CN113535943A - Medical record classification method and device and data record classification method and device - Google Patents

Medical record classification method and device and data record classification method and device

Info

Publication number
CN113535943A
Authority
CN
China
Prior art keywords
medical
record
target
data
medical record
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010291522.7A
Other languages
Chinese (zh)
Inventor
陈漠沙
仇伟
谭传奇
黄非
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN202010291522.7A
Publication of CN113535943A
Legal status: Pending

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31: Indexing; Data structures therefor; Storage structures
    • G: PHYSICS
    • G16: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H: HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00: ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60: ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Public Health (AREA)
  • Primary Health Care (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

Embodiments of this specification provide a method for classifying medical records. The method comprises: first, obtaining a target medical record to be classified, the target medical record comprising a plurality of medical information fields; next, matching the medical information fields in the target medical record against a preset medical information base, the medical information base comprising a plurality of medical record types and their corresponding standard medical information fields; then, obtaining hit ratios of the medical information fields in the target medical record for the plurality of medical record types, and determining the type of the target medical record according to the hit ratios. In this way, medical records can be classified efficiently and accurately.

Description

Medical record classification method and device and data record classification method and device
Technical Field
Embodiments of this specification relate to the technical field of data processing, and in particular to a medical record classification method and device and a data record classification method and device.
Background
With the spread of medical informatization, medical record systems are used by more and more medical institutions and play an important role in the field of medical data. Medical records come in different types, so medical record systems inevitably have to classify them. For example, for electronic medical records, the record types may include admission records, discharge records, doctor ward round records, operation records, and the like, and uploaded electronic medical records need to be classified to support subsequent management, analysis, viewing, and so on.
However, current classification methods are too limited to meet the varied requirements of practical applications. A sound and broadly applicable classification scheme is therefore urgently needed, one that classifies medical records at low cost, quickly, efficiently, and accurately while protecting the private data in the medical records.
Disclosure of Invention
This specification describes a medical record classification method and device that can classify medical records at low cost, quickly, efficiently, accurately, and with broad applicability, while protecting the private data in the medical records.
According to a first aspect, a method of classifying a medical record is provided. The method comprises the following steps: acquiring a target medical record to be classified, wherein the target medical record comprises a plurality of medical information fields; matching medical information fields in the target medical record with a preset medical information base, wherein the medical information base comprises a plurality of medical record types and corresponding standard medical information fields; obtaining hit ratios of medical information fields in the target medical record in the plurality of medical record types; and determining the type of the target medical record according to the hit ratio.
In one embodiment, the method further comprises: pre-training the medical information base; the pre-training process comprises the following steps: obtaining a medical record template corresponding to at least one medical record type; and extracting standard medical information fields corresponding to the medical record types from the medical record template.
In one embodiment, the method further comprises: pre-training the medical information base; the pre-training process comprises the following steps: obtaining a plurality of historical medical records corresponding to at least one medical record type; extracting and counting medical information fields in the plurality of historical medical records; and taking the medical information field after statistics as a standard medical information field corresponding to the medical record type.
In a specific embodiment, extracting and counting the medical information fields in the plurality of historical medical records includes: splitting the plurality of historical medical records at punctuation marks to obtain a plurality of phrases; and taking the phrases whose number of occurrences exceeds a preset threshold as standard medical information fields.
In one embodiment, obtaining the hit ratios of the medical information fields in the target medical record for the plurality of medical record types includes: for a first medical record type comprising N standard medical information fields, counting the number M of those fields that are hit by the medical information fields in the target medical record; and calculating the hit ratio of the target medical record for the first medical record type from M and N.
In a specific embodiment, determining the type of the target medical record according to the hit ratios includes: selecting the medical record type with the largest hit ratio as the type of the target medical record.
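As a worked form of the two embodiments above, and assuming that calculating the hit ratio "through M and N" means taking the quotient M/N (the text does not state the formula explicitly), the hit ratio for the q-th of Q medical record types and the resulting decision could be written as follows. The subscript q and the symbols r_q and \hat{q} are notation introduced here for illustration only.

```latex
r_q = \frac{M_q}{N_q}, \qquad \hat{q} = \arg\max_{q \in \{1, \dots, Q\}} r_q
```

Under this assumption, a medical record type with N = 6 standard medical information fields, of which M = 4 are hit, would have a hit ratio of 4/6, or about 0.67.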
In one embodiment, the target medical record is an electronic medical record, and the medical record type includes at least one of: physical examination record, admission record, first disease course record, doctor ward round record, operation record, informed consent form, and discharge record.
In one embodiment, the target medical record is a care record, and the medical record types include patient care records and/or critical patient care records.
According to a second aspect, a method of classifying data records is provided. The method comprises the following steps: acquiring a target data record to be classified, wherein the target data record comprises a plurality of data information fields; matching data information fields in the target data records with a preset data information base, wherein the data information base comprises a plurality of data record types and corresponding standard data information fields; obtaining the hit ratio of the data information field in the target data record in the plurality of data record types; and determining the type of the target data record according to the hit ratio.
In one embodiment, the target data record is a user data record or an operational data record.
According to a third aspect, an apparatus for classifying medical records is provided. The apparatus includes: a record acquisition unit configured to acquire a target medical record to be classified, wherein the target medical record comprises a plurality of medical information fields; a record matching unit configured to match the medical information fields in the target medical record with a preset medical information base, wherein the medical information base comprises a plurality of medical record types and corresponding standard medical information fields; a ratio obtaining unit configured to obtain hit ratios of the medical information fields in the target medical record for the plurality of medical record types; and a type determination unit configured to determine the type of the target medical record according to the hit ratios.
According to a fourth aspect, an apparatus for classifying data records is provided. The apparatus includes: a record acquisition unit configured to acquire a target data record to be classified, wherein the target data record comprises a plurality of data information fields; a record matching unit configured to match the data information fields in the target data record with a preset data information base, wherein the data information base comprises a plurality of data record types and corresponding standard data information fields; a ratio obtaining unit configured to obtain hit ratios of the data information fields in the target data record for the plurality of data record types; and a type determination unit configured to determine the type of the target data record according to the hit ratios.
According to a fifth aspect, there is provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of the first or second aspect.
According to a sixth aspect, there is provided a computing device comprising a memory and a processor, wherein the memory has stored therein executable code, and wherein the processor, when executing the executable code, implements the method of the first or second aspect.
In summary, the medical record classification method disclosed in the embodiment of the present specification can realize efficient and accurate classification of medical records, and the method has good versatility and can perfectly solve the problem of classification of cross-institution medical records.
In addition, by adopting the data record classification method disclosed by the embodiment of the specification, the data records can be efficiently and accurately classified.
Drawings
In order to illustrate the technical solutions of the embodiments disclosed in this specification more clearly, the drawings used in the description of the embodiments are briefly introduced below. The drawings described below are merely embodiments disclosed in this specification, and those skilled in the art can obtain other drawings from them without creative effort.
FIG. 1 is a schematic diagram illustrating an interface change caused by tampering with a medical record template according to an embodiment of the present disclosure;
FIG. 2 is a block diagram of an embodiment of a method for classifying medical records according to the present disclosure;
FIG. 3 is a flowchart of a method for classifying medical records according to an embodiment of the present disclosure;
FIG. 4 is a flowchart of a method for classifying data records according to an embodiment of the present disclosure;
FIG. 5 is a block diagram of a medical record classification device according to an embodiment of the present disclosure;
FIG. 6 is a structural diagram of a data record classification device disclosed in the embodiments of the present specification.
Detailed Description
Embodiments disclosed in the present specification are described below with reference to the accompanying drawings.
As mentioned above, medical records need to be classified to support subsequent management, analysis, and review. Some medical record systems provide the user with a corresponding template for each medical record type. However, the user may tamper with the template during data entry, causing parsing to fail when the record is later read. For example, different types of medical records for the same patient may share part of their content. When writing a medical record of type B, a medical staff member may therefore copy an already completed medical record of type A and edit it, starting from template A, until it matches the type B record that corresponds to template B. When the record is saved, however, the system still identifies it as type A (corresponding to template A) rather than the correct type B. When the saved record is subsequently parsed, the system attempts to apply the parsing rules for template A, and parsing fails. It is therefore necessary to determine the type of a medical record correctly before processing its content, so that subsequent parsing algorithms can work correctly.
In one particular scenario, the medical record may be an electronic medical record. With the spread of hospital informatization, electronic medical record systems are being adopted by more and more hospitals and play an important role in many medical artificial intelligence systems, such as patient health management systems and hospital scientific research systems. Electronic medical records use different templates for different types (such as admission records, discharge records, doctor ward round records, and operation records). During actual writing, the doctor may tamper with the template; in a typical scenario, the doctor writes the content of medical record type B under the template for medical record type A.
In one example, FIG. 1 is a schematic diagram of an interface change caused by tampering with a medical record template, as disclosed in an embodiment of the present specification. As shown in FIG. 1, the type of the pre-imported medical record is "admission record". A typical admission record includes sections such as "patient basic information", "chief complaint", "physical examination", "preliminary diagnosis", "diagnosis and treatment plan", and "doctor signature". A typical discharge record includes sections such as "patient basic information", "condition on admission", "admission diagnosis", "diagnosis and treatment course", "discharge condition", and "doctor signature". If the doctor wants to write a "discharge record", the doctor may tamper with the sections of the "admission record", for example by deleting the "physical examination" section and adding the "diagnosis and treatment course" and "discharge condition" sections.
In this way, the content of a type B medical record is produced from the type A template, but the template type stored for the record in the database is still A (the pre-imported template type). When the computer reads the record, the recorded type of the electronic medical record does not match its content, which may cause subsequent parsing to fail. Therefore, the type of the medical record needs to be determined correctly before its content is processed, so that subsequent parsing algorithms can work correctly.
From the above, correctly classifying medical records is both necessary and important. In one classification scheme, machine learning algorithms may be applied. Specifically, a classification model for medical records may be trained, and a medical record to be classified is then input into the model to obtain a classification result. However, training a classification model usually requires a large data set, so this approach cannot be used when only a small amount of data is available. In addition, a trained classification model often loses performance when transferred (for example, when a medical institution wishes to apply transfer learning to a classification model trained by another medical institution in order to classify the medical records in its own system). Moreover, healthcare is a scenario with high security requirements, and applying machine learning algorithms may lead to leakage of medical data, that is, of users' private data, which is a potential security risk.
Furthermore, the inventors have observed that different types of medical records typically include different key information fields (hereinafter referred to as standard medical information fields). In one example, for medical record types included in electronic medical records, such as admission records and discharge records, the standard medical information fields of an admission record include patient basic information, preliminary diagnosis, diagnosis and treatment plan, and the like, while the standard medical information fields of a discharge record include patient basic information, diagnosis and treatment course, discharge condition, and the like.
Based on the above, the inventors propose a method for classifying medical records based on standard medical information fields. Specifically, a medical information base is preset, which includes a plurality of medical record types and the standard medical information fields corresponding to each type, and medical records are then classified on the basis of this medical information base. In an embodiment, FIG. 2 is a block diagram of a method for classifying medical records disclosed in an embodiment of the present specification. As shown in FIG. 2, a target medical record to be classified is first matched against the standard medical information fields corresponding to the Q medical record types (Q being a positive integer greater than 1) included in the medical information base, yielding Q hit ratios of the medical information fields in the target medical record for the Q medical record types, from which the type of the target medical record is determined. In this way, medical records can be classified efficiently and accurately, and the method generalizes well and addresses the problem of classifying medical records across institutions.
The steps of the method are described below with reference to specific examples.
Specifically, FIG. 3 is a flowchart of a method for classifying medical records disclosed in an embodiment of the present specification. The method may be executed by any server, system, or device with processing capability, such as a medical record system. As shown in FIG. 3, the method includes the following steps:
Step S310: obtaining a target medical record to be classified, wherein the target medical record comprises a plurality of medical information fields;
Step S320: matching the medical information fields in the target medical record with a preset medical information base, wherein the medical information base comprises a plurality of medical record types and corresponding standard medical information fields;
Step S330: obtaining hit ratios of the medical information fields in the target medical record for the plurality of medical record types;
Step S340: determining the type of the target medical record according to the hit ratios.
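Steps S310 to S340 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation described by the specification: the names `MEDICAL_INFO_BASE`, `hit_ratios`, and `classify` are hypothetical, matching is assumed to be exact string equality, the hit ratio is assumed to be M/N, and the field lists are toy examples.

```python
from typing import Dict, List, Set

# Hypothetical medical information base: record type -> standard fields.
# The field names are illustrative only.
MEDICAL_INFO_BASE: Dict[str, Set[str]] = {
    "admission record": {
        "patient basic information", "chief complaint", "physical examination",
        "preliminary diagnosis", "diagnosis and treatment plan", "doctor signature",
    },
    "discharge record": {
        "patient basic information", "condition on admission", "admission diagnosis",
        "diagnosis and treatment course", "discharge condition", "doctor signature",
    },
}


def hit_ratios(fields: List[str], base: Dict[str, Set[str]]) -> Dict[str, float]:
    """S320 and S330: match the record's fields against each type, then compute M / N."""
    present = set(fields)
    ratios: Dict[str, float] = {}
    for record_type, standard_fields in base.items():
        m = len(present & standard_fields)  # M: standard fields found in the record
        n = len(standard_fields)            # N: number of standard fields for this type
        ratios[record_type] = m / n if n else 0.0
    return ratios


def classify(fields: List[str], base: Dict[str, Set[str]]) -> str:
    """S340: pick the type with the largest hit ratio (ties go to the first type listed)."""
    ratios = hit_ratios(fields, base)
    return max(ratios, key=ratios.get)


# S310: a hypothetical target record, already split into fields.
target_fields = [
    "patient basic information", "admission diagnosis",
    "diagnosis and treatment course", "discharge condition", "doctor signature",
]
print(classify(target_fields, MEDICAL_INFO_BASE))  # discharge record
```

Here the target hits 5 of the 6 discharge fields and 2 of the 6 admission fields, so the discharge type is selected. Real records would likely need a looser matching rule than exact equality, which this sketch leaves open.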
The steps are as follows:
First, in step S310, a target medical record to be classified is acquired, wherein the target medical record comprises a plurality of medical information fields. In one embodiment, the target medical record may be an electronic medical record. An Electronic Medical Record (EMR) is also called a computerized medical record system or a Computer-Based Patient Record (CPR). Electronic medical records are digitized medical records that are stored, managed, transmitted, and reproduced by electronic devices (computers, health cards, etc.) in place of handwritten paper medical records, and their content includes all the information of the paper medical records. In general, electronic medical records give users access to complete and accurate data, alerts, reminders, and clinical decision support systems.
In one embodiment, the target medical record may be a care record. It is understood that the care records include information from the patient, the caregiver, treatment, care, research, teaching, and administration, and from various drugs, equipment, and devices.
In one embodiment, the target medical record may be a medical image record. Medical imaging refers to the techniques and procedures for obtaining images of the internal tissues of a human body, or of a part of it, in a non-invasive manner for medical treatment or medical research. Accordingly, the content of a medical image record may include the relevant tissue images and the associated technical processing.
As can be seen from the above, the target medical record may be an electronic medical record, a care record, a medical image record, or the like, and it includes a plurality of medical information fields. In one embodiment, the plurality of medical information fields may be obtained by performing word segmentation on the content of the target medical record. In another embodiment, the text content of the target medical record may be split at punctuation marks, spaces, or the like to obtain the plurality of medical information fields.
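The punctuation-based splitting mentioned above might look like the following sketch. The function name `split_fields` and the separator set are assumptions made for illustration. Because the sample text is English, the sketch splits at punctuation and line breaks only; for Chinese text, spaces could be added to the separator set as the paragraph above suggests.

```python
import re
from typing import List

# Assumed separator set: common Chinese and English punctuation plus line breaks.
_SEPARATORS = re.compile(r"[,.;:!?，。；：！？、\r\n]+")


def split_fields(text: str) -> List[str]:
    """Split record text into candidate medical information fields."""
    pieces = (piece.strip() for piece in _SEPARATORS.split(text))
    return [piece for piece in pieces if piece]


sample = "Chief complaint: abdominal pain for 3 days.\nPreliminary diagnosis: cholecystitis"
print(split_fields(sample))
```

Both section titles and free-text content come out as candidate fields; only the ones that match standard medical information fields matter in the later matching step.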
After the target medical record to be classified is obtained, in step S320, the medical information field in the target medical record is matched with the preset medical information library.
For ease of understanding and description, the preset medical information library will be described below. Specifically, the medical information base includes a plurality of medical record types and corresponding standard medical information fields.
Regarding the plurality of medical record types, it should be noted that, on the one hand, medical records may be classified at multiple levels, and the classification method may be applied to any one of those levels. On the other hand, the plurality of medical record types may be determined by medical personnel based on medical experience. In one embodiment, the target medical record is an electronic medical record, and accordingly the plurality of medical record types may include physical examination records, admission records, first disease course records, doctor ward round records, operation records, informed consent forms, discharge records, and the like. The physical examination record may be a record made on the basis of a health examination. A health examination is a physical examination centered on health; it is generally understood in medicine as a comprehensive examination of the body carried out before any obvious disease has appeared, in order to learn the person's physical condition and screen for disease. Examining healthy people by physical examination means is called "health examination" or "preventive health examination". The medical means and methods used in a health examination may include basic examinations by clinical departments, including examinations with medical equipment such as ultrasound, electrocardiography, and radiology, and may also include laboratory tests of samples such as blood, urine, and feces.
In another embodiment, the target medical record is a care record, and accordingly the plurality of medical record types may include a patient care record, a critical patient care record, and the like. In yet another embodiment, the target medical record is a medical image record, and accordingly the plurality of medical record types may include an angiography record, a computed tomography record, a magnetic resonance imaging record, a medical ultrasound record, and the like.
Further, each medical record type has a respective standard medical information field, and the standard medical information fields corresponding to the plurality of medical record types may be determined in the following various ways.
Specifically, in one embodiment, the medical information base may be pre-trained to obtain the standard medical information fields corresponding to at least one of its medical record types. On the one hand, since a medical record template generally contains the key information fields of its record type, in a specific embodiment the pre-training process may include: first, obtaining a medical record template corresponding to at least one medical record type; then, extracting the standard medical information fields corresponding to that medical record type from the medical record template. In a more specific embodiment, a number of section titles may be extracted from the medical record template as the standard medical information fields of the corresponding medical record type. In another more specific embodiment, fields with a predetermined format (e.g., a font size greater than 4, bold display, no first-line indentation, etc.) may be extracted from the medical record template as the standard medical information fields. Thus, where a medical record template exists, the standard medical information fields can be derived from the template.
On the other hand, the standard medical information fields of a given medical record type usually appear as high-frequency phrases in the historical medical records of that type, so the standard medical information fields can be determined by word frequency statistics. Based on this, in another specific embodiment, the pre-training process may further include: first, obtaining a plurality of historical medical records corresponding to at least one medical record type; then, extracting and counting the medical information fields in the plurality of historical medical records; and then, taking the counted medical information fields as the standard medical information fields corresponding to the medical record type. Further, a standard medical information field is generally short but longer than an atomic word, where an atomic word is a minimal, indivisible word; for example, "reason for admission" in FIG. 1 combines the two atomic words "admission" and "reason". The historical medical records may therefore be split at punctuation marks, the high-frequency phrases among the resulting phrases counted, and those high-frequency phrases added to the set of standard medical information fields for the corresponding medical record type. In this way, the standard medical information fields of each type can be determined simply, quickly, and accurately.
Specifically, extracting and counting the medical information fields in the plurality of historical medical records may include: splitting the plurality of historical medical records at punctuation marks to obtain a plurality of phrases; and taking the phrases whose number of occurrences exceeds a preset threshold as standard medical information fields. In a more specific embodiment, the punctuation marks may include common Chinese and English punctuation marks, as well as spaces, carriage returns, and the like. In a more specific embodiment, the preset threshold may be set by staff according to practical experience and the number of historical medical records; for example, when there are 100 historical medical records, the preset threshold may be set to 90. Further, in one example, the phrases whose number of occurrences exceeds the preset threshold may be manually checked, and the phrases that pass the check are then added to the set of standard medical information fields.
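The frequency-threshold procedure above can be sketched as follows. The names `split_phrases` and `standard_fields` are hypothetical, and two details are assumptions not fixed by the text: each phrase is counted at most once per historical record, and the sketch splits at punctuation and line breaks only because the sample records are English. The manual check mentioned above is outside the scope of the sketch.

```python
import re
from collections import Counter
from typing import Iterable, List, Set

# Assumed separator set: common Chinese and English punctuation plus line breaks.
_PUNCT = re.compile(r"[,.;:!?，。；：！？、\r\n]+")


def split_phrases(text: str) -> List[str]:
    """Split one historical record into phrases at punctuation marks."""
    return [p.strip() for p in _PUNCT.split(text) if p.strip()]


def standard_fields(records: Iterable[str], threshold: int) -> Set[str]:
    """Return phrases that occur in more than `threshold` records of one type."""
    counts: Counter = Counter()
    for text in records:
        # Assumption: count a phrase once per record, however often it repeats there.
        counts.update(set(split_phrases(text)))
    return {phrase for phrase, n in counts.items() if n > threshold}


# Three toy admission records (fabricated purely for illustration).
history = [
    "Chief complaint: cough. Doctor signature: A",
    "Chief complaint: fever. Doctor signature: B",
    "Chief complaint: pain",
]
print(standard_fields(history, threshold=2))  # {'Chief complaint'}
```

With 100 historical records and a threshold of 90, as in the example above, the same call would keep only phrases present in more than 90 of the records.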
In a specific example, assume the medical record is an electronic medical record. Referring to FIG. 1, the electronic medical record shown there includes the following text: "Reason for admission: 3 days ago, the patient sought medical treatment for severe pain in the right upper abdomen with nausea and vomiting, examination found abdominal tenderness, pain in the gallbladder area, the pain in the left upper abdomen was relieved after piperacillin-tazobactam anti-infection treatment, and the patient was admitted through the emergency department for further diagnosis and treatment." Splitting this text at punctuation marks yields the following phrases: "reason for admission"; "3 days ago"; "the patient sought medical treatment for severe pain in the right upper abdomen with nausea and vomiting"; "examination found abdominal tenderness"; "pain in the gallbladder area"; "the pain in the left upper abdomen was relieved after piperacillin-tazobactam anti-infection treatment"; and "the patient was admitted through the emergency department for further diagnosis and treatment".
Furthermore, because standard medical information fields occur frequently, appearing in nearly every medical record of a given type, phrase statistics are computed over the medical records of a specific medical record type to obtain high-frequency phrases (for example, those with an occurrence frequency above 90%). These are manually checked, and the phrases that pass the check are used as the standard medical information fields for medical records of that type. For example, assuming that the medical record types of the electronic medical record include an admission record and a discharge record, the standard medical information fields determined for the admission record type may include: patient basic information, chief complaint, physical examination, preliminary diagnosis, diagnosis and treatment plan, and doctor signature. The standard medical information fields determined for the discharge record type may include: patient basic information, diagnosis and treatment course, discharge condition, and doctor signature. In this way, the standard medical information fields of each type can be determined from a sufficient number of historical medical records.
It should be noted that, in the above, high-frequency words among the phrases included in the historical medical records are determined as standard medical information fields. In another specific embodiment, the TF-IDF (term frequency-inverse document frequency) statistical technique may also be used to determine the high-frequency words and further determine the header field set. Here, TF means term frequency and IDF means inverse document frequency. The importance of a word increases with the number of times it appears in a document, but decreases with the frequency with which it appears across the corpus; TF-IDF can therefore be used to assess how important a word is to a document in a document set or corpus.
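A TF-IDF computation over segmented phrases might look like the sketch below. The text does not fix a particular formula, so this assumes one common variant, tf = count / document length and idf = log(N / df); the function name `tf_idf` and the sample phrase lists are hypothetical.

```python
import math
from collections import Counter


def tf_idf(documents):
    """Compute TF-IDF scores per document.

    documents: list of phrase lists. Returns one dict per document mapping
    phrase -> score, with tf = count / len(doc) and idf = log(N / df).
    """
    n_docs = len(documents)
    df = Counter()
    for doc in documents:
        # Document frequency: number of documents containing the phrase.
        df.update(set(doc))
    scores = []
    for doc in documents:
        counts = Counter(doc)
        scores.append({
            term: (count / len(doc)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        })
    return scores


# Hypothetical phrase lists, invented for illustration only.
docs = [
    ["chief complaint", "physical examination", "preliminary diagnosis"],
    ["chief complaint", "discharge condition"],
]
scores = tf_idf(docs)
```

Under this variant a phrase that appears in every document, such as "chief complaint" here, scores zero, so the scores emphasize phrases that distinguish one record from the others. How the scores are then turned into a field set is not specified in the text and would be a further design choice.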
The above describes pre-training the medical information base to determine the standard medical information fields by using a medical record template and by using a sufficient number of historical medical records, respectively. In another embodiment, in the case that no medical record template can be obtained and the number of historical medical records is small, an experienced practitioner (such as a doctor or a medical artificial intelligence practitioner) can review and summarize a small number of medical records to determine the mapping relationship between the medical record types and the standard medical information fields, and store the mapping relationship in the medical information base. Thus, even if only a small number of historical medical records exist, an accurate mapping relationship can be obtained.
The above describes the plurality of medical record types in the medical information base, as well as the meaning of the corresponding standard medical information fields and how they are determined. Based on the preset medical information base, the medical information fields in the target medical record can therefore be matched to obtain the hit fields for each of the plurality of medical record types.
In one embodiment, this step may include: matching the plurality of medical information fields included in the target medical record against the standard medical information fields corresponding to each medical record type in turn, to obtain the hit fields corresponding to each medical record type. In one example, assume that the plurality of medical information fields include field A, field B, field C, and field D, and that the medical information base includes the standard medical information fields corresponding to medical record type a (field A, field E, field F, and field D) and the standard medical information fields corresponding to medical record type b (field A, field B, field C, and field D). The hit fields corresponding to medical record type a are then field A and field D, and the hit fields corresponding to medical record type b are field A, field B, field C, and field D.
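The per-type matching in this example can be sketched as a set intersection. The names `MEDICAL_INFO_BASE` and `match_per_type` are hypothetical, and exact string equality between fields is an assumption; the text does not say how two fields are compared.

```python
# Standard fields per medical record type, taken from the example above.
MEDICAL_INFO_BASE = {
    "a": {"A", "E", "F", "D"},
    "b": {"A", "B", "C", "D"},
}


def match_per_type(target_fields, info_base):
    """Match the target fields against each type's standard fields in turn."""
    target = set(target_fields)
    return {rtype: target & std for rtype, std in info_base.items()}


hits = match_per_type(["A", "B", "C", "D"], MEDICAL_INFO_BASE)
for rtype, fields in hits.items():
    print(rtype, sorted(fields))
```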
In another embodiment, this step may include: matching the plurality of medical information fields included in the target medical record against the full set of standard medical information fields corresponding to the plurality of medical record types, to obtain a full set of hit fields; and determining the medical record type to which each hit field in the full set belongs, thereby determining the hit fields corresponding to each medical record type. In one example, assume that the plurality of medical information fields include field A, field B, field C, and field D, and that the full set of standard medical information fields included in the medical information base is field A, field B, field C, field D, field E, and field F. The full set of hit fields is then field A, field B, field C, and field D. Further, it can be determined that field A and field D belong to both medical record type a and medical record type b, while field B and field C belong only to medical record type b. The hit fields corresponding to medical record type a are therefore field A and field D, and the hit fields corresponding to medical record type b are field A, field B, field C, and field D.
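One way to realize this two-stage matching is an inverted index from each standard field to the types that contain it. The sketch below is an assumption about how the lookup could be organized; `build_field_index` and `match_full_set` are hypothetical names.

```python
from collections import defaultdict


def build_field_index(info_base):
    """Invert the information base: field -> set of types that contain it."""
    index = defaultdict(set)
    for rtype, std_fields in info_base.items():
        for field in std_fields:
            index[field].add(rtype)
    return index


def match_full_set(target_fields, info_base):
    index = build_field_index(info_base)
    # Stage 1: full set of hit fields against all standard fields.
    full_hits = [f for f in target_fields if f in index]
    # Stage 2: attribute each hit field to the type(s) it belongs to.
    hits = {rtype: set() for rtype in info_base}
    for field in full_hits:
        for rtype in index[field]:
            hits[rtype].add(field)
    return full_hits, hits


info_base = {"a": {"A", "E", "F", "D"}, "b": {"A", "B", "C", "D"}}
full_hits, hits_by_type = match_full_set(["A", "B", "C", "D"], info_base)
```

Both matching variants yield the same per-type hit fields; the index form avoids scanning every type for every target field once the index has been built.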
In this way, by matching the medical information fields in the target medical record with a preset medical information library, the hit fields in the target medical record matching with each medical record type can be determined. Then, in step S330, the hit ratios of the medical information fields in the target medical record in the plurality of medical record types are obtained, and in step S340, the type of the target medical record is determined according to the hit ratios.
In one embodiment, step S330 may include: for a first medical record type that includes N standard medical information fields, counting the number M of medical information fields in the target medical record that hit; and calculating the hit ratio of the target medical record for the first medical record type from M and N. Here, N is a positive integer, M is a natural number, and the first medical record type may be any one of the plurality of medical record types described above. On this basis, step S340 may include: selecting the first medical record type with the largest hit ratio as the type of the target medical record. In one example, assume that the determined hit ratios include 0.4, 0.9, and 0.3; the medical record type corresponding to 0.9 may then be taken as the type of the target medical record.
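Steps S330 and S340 can be sketched as below. The text only says the hit ratio is calculated "from M and N"; taking it as M / N is an assumption, and `hit_ratio` and `classify` are hypothetical names.

```python
def hit_ratio(target_fields, standard_fields):
    """Return M / N: M hit fields out of the N standard fields of one type."""
    n = len(standard_fields)
    if n == 0:
        raise ValueError("a record type needs at least one standard field")
    m = len(set(target_fields) & set(standard_fields))
    return m / n


def classify(target_fields, info_base):
    """Return the type with the largest hit ratio, plus all ratios."""
    ratios = {t: hit_ratio(target_fields, std) for t, std in info_base.items()}
    best = max(ratios, key=ratios.get)
    return best, ratios


# Types a and b reuse the field sets from the matching example.
info_base = {"a": {"A", "E", "F", "D"}, "b": {"A", "B", "C", "D"}}
best_type, ratios = classify(["A", "B", "C", "D"], info_base)
print(best_type, ratios)
```

Here type a scores 2/4 and type b scores 4/4, so type b is selected. Dividing by N normalizes for types that define many standard fields, which a raw hit count would favor.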
On the other hand, in an embodiment, in the case that the determined maximum hit ratio is not unique, manual intervention may be introduced, and the medical record type selected by a staff member may be used as the type of the target medical record. Alternatively, among the tied types, the medical record type with the largest number of hit fields may be selected as the type of the target medical record.
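The tie handling can be sketched as follows. The text presents hit count and manual intervention as alternatives; chaining them (hit count first, then a `None` result that signals manual review) is an assumption of this sketch, as is the name `classify_with_tie_break`. `Fraction` is used so that equal ratios compare exactly.

```python
from fractions import Fraction


def classify_with_tie_break(target_fields, info_base):
    """Pick the type with the largest hit ratio; break ties by hit count.

    Returns None when the tie persists, signalling manual review.
    Assumes every type has at least one standard field (N > 0).
    """
    target = set(target_fields)
    stats = {}
    for rtype, std in info_base.items():
        m = len(target & set(std))
        stats[rtype] = (Fraction(m, len(std)), m)
    best_ratio = max(ratio for ratio, _ in stats.values())
    tied = [t for t, (ratio, _) in stats.items() if ratio == best_ratio]
    if len(tied) == 1:
        return tied[0]
    best_hits = max(stats[t][1] for t in tied)
    finalists = [t for t in tied if stats[t][1] == best_hits]
    return finalists[0] if len(finalists) == 1 else None


# Hypothetical information bases, invented for illustration only.
tie_base = {"x": {"A", "B"}, "y": {"A", "B", "C", "D"}, "z": {"E"}}
resolved = classify_with_tie_break(["A", "B", "C", "D"], tie_base)
unresolved = classify_with_tie_break(["A", "B"], {"x": {"A"}, "y": {"B"}})
```

In the first call types x and y both reach a ratio of 1, and y wins with four hit fields against two; in the second call the tie persists and the record would go to a staff member.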
In summary, the medical record classification method disclosed in the embodiments of the present specification can realize efficient and accurate classification of medical records, and the method has good versatility and can perfectly solve the problem of classification of cross-institution medical records.
The above description mainly concerns the classification method for medical records. According to another aspect, the embodiments of the present specification also disclose a method for classifying data records. In one embodiment, the data records may include the aforementioned medical records. In another embodiment, the data records may further include operational data records (e.g., experimental operation records). In yet another embodiment, the data records may include law enforcement records, penalty records, and the like, which are applied in the judicial field. This method is described below.
Specifically, fig. 4 is a flowchart of a classification method for data records disclosed in an embodiment of the present specification, and an execution subject of the method may be any device with processing capability, such as a server, a system, or a device, for example a record management system. As shown in fig. 4, the method flow includes the following steps:
step S410, obtaining a target data record to be classified, wherein the target data record comprises a plurality of data information fields; step S420, matching the data information field in the target data record with a preset data information base, wherein the data information base comprises a plurality of data record types and corresponding standard data information fields; step S430, obtaining the hit ratio of the data information field in the target data record in the plurality of data record types; step S440, determining the type of the target data record according to the hit ratio.
The steps are as follows:
first, in step S410, a target data record to be classified is obtained, the target data record including a plurality of data information fields.
In one embodiment, the target data record may be an operational data record. In a specific embodiment, the operational data record may be an experimental operation data record, an equipment operation data record, or a demonstration operation data record. In another embodiment, the target data record may be a user data record. In a specific embodiment, the user data record may be a daily life data record or a personal health data record, among others.
In one embodiment, the target data records to be classified may be newly generated in the record management system, such as data records newly uploaded by a user. In a specific embodiment, these data records may be retrieved periodically as the target data records to be classified. In another specific embodiment, in response to the successful upload of a data record, acquisition of the uploaded record may be triggered automatically, and the uploaded record is taken as the target data record to be classified.
In this way, the target data record to be classified is acquired. Next, in step S420, the data information fields in the target data record are matched with a preset data information base, where the data information base includes a plurality of data record types and standard data information fields corresponding to the data record types. This results in hit fields corresponding to each data record type.
In one embodiment, the target data record is assumed to be an experimental operation text, and accordingly, the plurality of data record types may include a chemical experiment type, a physical experiment type, a biological experiment type, and the like. In another embodiment, assuming that the target data record is a daily life data record, the plurality of data record types may include a learning class, a friend-making class, a work class, a sport class, and the like. In a specific embodiment, the standard data information field corresponding to the learning class includes: data name, location, time period, learning companion, learning summary, recent planning.
Further, based on the standard data information field, the target data records are matched, and hit fields for each data record type can be obtained. Then, in step S430, the hit ratios of the data information fields in the target data record in the multiple data record types are obtained, and in step S440, the type of the target data record is determined according to the hit ratios.
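For data records, steps S420 to S440 can be combined into one small classifier, sketched below. The class name `DataRecordClassifier`, the M / N form of the hit ratio, and the fields of the sport class are assumptions made for illustration; only the fields of the learning class come from the text.

```python
from dataclasses import dataclass


@dataclass
class DataRecordClassifier:
    # Data information base: record type -> standard data information fields.
    info_base: dict

    def match(self, fields):
        """Step S420: hit fields for each data record type."""
        return {t: set(fields) & set(std) for t, std in self.info_base.items()}

    def hit_ratios(self, hits):
        """Step S430: hit ratio per type, assumed here to be M / N."""
        return {t: len(h) / len(self.info_base[t]) for t, h in hits.items()}

    def classify(self, fields):
        """Step S440: type with the largest hit ratio."""
        ratios = self.hit_ratios(self.match(fields))
        return max(ratios, key=ratios.get)


classifier = DataRecordClassifier({
    # Fields listed in the text for the learning class.
    "learning": ["data name", "location", "time period",
                 "learning companion", "learning summary", "recent planning"],
    # Hypothetical fields, invented for illustration only.
    "sport": ["data name", "location", "time period", "sport type", "duration"],
})
# Step S410: fields of a hypothetical target data record.
record_fields = ["data name", "location", "time period",
                 "learning summary", "recent planning"]
result = classifier.classify(record_fields)
print(result)
```

The record hits five of the six learning fields and three of the five sport fields, so it is assigned to the learning class.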
It should be noted that, for the description of the above steps S410 to S440, reference may also be made to the related description in the foregoing embodiments.
In summary, the data records can be classified efficiently and accurately by the data record classification method disclosed in the embodiments of the present specification.
Corresponding to the classification method described in the above embodiment, the embodiment of the present specification further discloses a classification apparatus, which is specifically as follows:
Fig. 5 is a structural diagram of a medical record classification device disclosed in an embodiment of the present specification. As shown in fig. 5, the apparatus 500 includes:
a record obtaining unit 510 configured to obtain a target medical record to be classified, where the target medical record includes a plurality of medical information fields; a record matching unit 520, configured to match medical information fields in the target medical record with a preset medical information base, where the medical information base includes a plurality of medical record types and standard medical information fields corresponding to the medical record types; a ratio obtaining unit 530 configured to obtain hit ratios of medical information fields in the target medical record in the plurality of medical record types; a type determining unit 540 configured to determine the type of the target medical record according to the hit ratio.
In one embodiment, the apparatus 500 further comprises: a pre-training unit 550 configured to pre-train the medical information base; the pre-training unit 550 is specifically configured to: obtain a medical record template corresponding to at least one medical record type; and extract standard medical information fields corresponding to the medical record type from the medical record template.
In one embodiment, the apparatus 500 further comprises: a pre-training unit 550 configured to pre-train the medical information base; the pre-training unit 550 specifically comprises: a record obtaining subunit 551 configured to obtain a plurality of historical medical records corresponding to at least one medical record type; a field statistics subunit 552 configured to extract and count the medical information fields in the plurality of historical medical records; a field classification subunit 553 configured to take the counted medical information fields as standard medical information fields corresponding to the medical record type.
In a specific embodiment, the field statistics subunit 552 is specifically configured to: obtain a plurality of phrases from the plurality of historical medical records by using punctuation marks as separators; and take the phrases whose number of occurrences is greater than a preset threshold as standard medical information fields.
In an embodiment, the ratio obtaining unit 530 is specifically configured to: counting the hit number M of medical information fields in the target medical record aiming at a first medical record type comprising N standard medical information fields; calculating the hit ratio of the target medical record in the first medical record type through M and N. In a specific embodiment, the type determining unit 540 is specifically configured to: and selecting the first medical record type with the largest hit ratio as the type of the target medical record.
In one embodiment, the target medical record is a medical record, and the medical record type includes at least one of: admission record, first disease course record, doctor ward round record, operation record, informed consent and discharge record.
In one embodiment, the target medical record is a care record, and the medical record types include patient care records and/or critical patient care records.
In summary, the medical record classification device disclosed in the embodiment of the present specification can realize efficient and accurate classification of medical records, and the device has good versatility and can perfectly solve the problem of classification of cross-institution medical records.
Fig. 6 is a structural diagram of a data record classification device disclosed in the embodiments of the present specification. As shown in fig. 6, the apparatus 600 includes:
a record obtaining unit 610 configured to obtain a target data record to be classified, where the target data record includes a plurality of data information fields; a record matching unit 620, configured to match data information fields in the target data records with a preset data information base, where the data information base includes multiple data record types and standard data information fields corresponding to the data record types; a ratio obtaining unit 630, configured to obtain hit ratios of data information fields in the target data record in the plurality of data record types; a type determining unit 640 configured to determine the type of the target data record according to the hit ratio.
In one embodiment, the apparatus 600 further comprises: a pre-training unit 650 configured to pre-train the data information base; the pre-training unit 650 is specifically configured to: obtain a data record template corresponding to at least one data record type; and extract standard data information fields corresponding to the data record type from the data record template.
In one embodiment, the apparatus 600 further comprises: a pre-training unit 650 configured to pre-train the data information base; the pre-training unit 650 specifically comprises: a record obtaining subunit 651 configured to obtain a plurality of historical data records corresponding to at least one data record type; a field statistics subunit 652 configured to extract and count data information fields in the plurality of historical data records; a field classification subunit 653 configured to take the counted data information fields as standard data information fields corresponding to the data record type.
In a specific embodiment, the field statistics subunit 652 is specifically configured to: obtain a plurality of phrases from the plurality of historical data records by using punctuation marks as separators; and take the phrases whose number of occurrences is greater than a preset threshold as standard data information fields.
In an embodiment, the ratio obtaining unit 630 is specifically configured to: counting the hit number M of the data information fields in the target data record aiming at a first data record type comprising N standard data information fields; and calculating the hit ratio of the target data record in the first data record type through M and N. In a specific embodiment, the type determining unit 640 is specifically configured to: and selecting the first data record type with the largest hit ratio as the type of the target data record.
In summary, the data record classification device disclosed in the embodiments of the present specification can realize efficient and accurate data record classification, and the device has good versatility and can perfectly solve the problem of classification of cross-institution data records.
As above, according to an embodiment of a further aspect, there is also provided a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method described in connection with fig. 3 or fig. 4.
According to an embodiment of yet another aspect, there is also provided a computing device comprising a memory having stored therein executable code, and a processor that, when executing the executable code, implements the method described in connection with fig. 3 or fig. 4.
Those skilled in the art will recognize that, in one or more of the examples described above, the functions described in the embodiments disclosed herein may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, the functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The above specific embodiments further describe in detail the objects, technical solutions, and advantages of the embodiments disclosed in the present specification. It should be understood that the above are only specific embodiments of the embodiments disclosed in the present specification and are not intended to limit their scope; any modifications, equivalent substitutions, improvements, and the like made on the basis of the technical solutions of the embodiments disclosed in the present specification shall fall within the scope of those embodiments.

Claims (14)

1. A method of classifying a medical record, comprising:
acquiring a target medical record to be classified, wherein the target medical record comprises a plurality of medical information fields;
matching medical information fields in the target medical record with a preset medical information base, wherein the medical information base comprises a plurality of medical record types and corresponding standard medical information fields;
obtaining hit ratios of medical information fields in the target medical record in the plurality of medical record types;
and determining the type of the target medical record according to the hit ratio.
2. The method of claim 1, wherein the method further comprises: pre-training the medical information base; the pre-training process comprises the following steps:
obtaining a medical record template corresponding to at least one medical record type;
and extracting standard medical information fields corresponding to the medical record types from the medical record template.
3. The method of claim 1, wherein the method further comprises: pre-training the medical information base; the pre-training process comprises the following steps:
obtaining a plurality of historical medical records corresponding to at least one medical record type;
extracting and counting medical information fields in the plurality of historical medical records;
and taking the medical information field after statistics as a standard medical information field corresponding to the medical record type.
4. The method of claim 3, wherein said extracting and counting medical information fields in said plurality of historical medical records comprises:
obtaining a plurality of phrases from the plurality of historical medical records by using punctuation marks as separators;
and taking the phrases whose number of occurrences is greater than a preset threshold as standard medical information fields.
5. The method of claim 1, wherein the obtaining hit ratios of medical information fields in the target medical record in the plurality of medical record types comprises:
counting the hit number M of medical information fields in the target medical record aiming at a first medical record type comprising N standard medical information fields;
calculating the hit ratio of the target medical record in the first medical record type through M and N.
6. The method of claim 5, wherein said determining a type of said target medical record from said hit ratio comprises:
and selecting the first medical record type with the largest hit ratio as the type of the target medical record.
7. The method of any of claims 1-6, wherein the target medical record is a medical record, the medical record type including at least one of: physical examination record, admission record, first disease course record, doctor ward round record, operation record, informed consent and discharge record.
8. The method of any of claims 1-6, wherein the target medical record is a care record, the medical record type including a patient care record and/or a critical patient care record.
9. A method of classifying data records, comprising:
acquiring a target data record to be classified, wherein the target data record comprises a plurality of data information fields;
matching data information fields in the target data records with a preset data information base, wherein the data information base comprises a plurality of data record types and corresponding standard data information fields;
obtaining the hit ratio of the data information field in the target data record in the plurality of data record types;
and determining the type of the target data record according to the hit ratio.
10. The method of claim 9, wherein the target data record is a user data record or an operational data record.
11. An apparatus for classifying medical records, comprising:
a record acquisition unit configured to acquire a target medical record to be classified, wherein the target medical record comprises a plurality of medical information fields;
a record matching unit configured to match medical information fields in the target medical record with a preset medical information base, wherein the medical information base comprises a plurality of medical record types and corresponding standard medical information fields;
a ratio obtaining unit configured to obtain hit ratios of medical information fields in the target medical record in the plurality of medical record types;
a type determination unit configured to determine a type of the target medical record according to the hit ratio.
12. An apparatus for classifying data records, comprising:
a record acquisition unit configured to acquire a target data record to be classified, wherein the target data record comprises a plurality of data information fields;
a record matching unit configured to match data information fields in the target data record with a preset data information base, wherein the data information base comprises a plurality of data record types and corresponding standard data information fields;
a ratio obtaining unit configured to obtain hit ratios of data information fields in the target data record in the plurality of data record types;
a type determination unit configured to determine a type of the target data record according to the hit ratio.
13. A computer-readable storage medium, on which a computer program is stored, wherein the computer program, when executed in a computer, causes the computer to perform the method of any of claims 1-10.
14. A computing device comprising a memory and a processor, wherein the memory has stored therein executable code that when executed by the processor implements the method of any of claims 1-10.
CN202010291522.7A 2020-04-14 2020-04-14 Medical record classification method and device and data record classification method and device Pending CN113535943A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010291522.7A CN113535943A (en) 2020-04-14 2020-04-14 Medical record classification method and device and data record classification method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010291522.7A CN113535943A (en) 2020-04-14 2020-04-14 Medical record classification method and device and data record classification method and device

Publications (1)

Publication Number Publication Date
CN113535943A true CN113535943A (en) 2021-10-22

Family

ID=78119961

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010291522.7A Pending CN113535943A (en) 2020-04-14 2020-04-14 Medical record classification method and device and data record classification method and device

Country Status (1)

Country Link
CN (1) CN113535943A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination