CN113887204A - Coding method for clinical examination medical text - Google Patents

Info

Publication number
CN113887204A
Authority
CN
China
Prior art keywords
term
medical
segmentation
clinical examination
participle
Legal status
Withdrawn
Application number
CN202111147404.XA
Other languages
Chinese (zh)
Inventor
刘靳波
孔鑫
何文伟
Current Assignee
Affiliated Hospital of Southwest Medical University
Original Assignee
Affiliated Hospital of Southwest Medical University
Application filed by Affiliated Hospital of Southwest Medical University
Priority to CN202111147404.XA
Publication of CN113887204A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/10 Text processing
    • G06F40/12 Use of codes for handling textual entities
    • G06F40/126 Character encoding
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/20 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for electronic clinical trials or questionnaires

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Epidemiology (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Medical Treatment And Welfare Office Work (AREA)

Abstract

The invention provides a coding method for clinical examination medical text and relates to the field of clinical examination medicine. The method analyzes and processes the clinical examination medical text to obtain its content structure and applies structured coding to each part of that structure. Before structured coding, the similarity between the term to be coded and each clinical examination medical term in the clinical examination medical term library is calculated, which effectively reduces repeated and near-duplicate terms. Clinical examination medical terms are stored in a participle-based storage structure: participles are the basic unit of coding, and different participles in the coding participle library are combined to form the corresponding term, which greatly saves storage space. Word segmentation combines a word segmentation dictionary with a machine learning word segmenter, which reduces the manual review workload and improves segmentation efficiency. Three mapping modes (full mapping, basic mapping and main-participle mapping) are provided, which improves generality.

Description

Coding method for clinical examination medical text
Technical Field
The invention relates to the field of clinical examination medicine, in particular to a coding method for a clinical examination medicine text.
Background
Clinical laboratory medicine is a bridge discipline between basic medicine and clinical medicine and draws on knowledge from many fields of medicine. It comprises basic subjects such as hematology, biochemistry, human parasitology, microbiology and immunology, and is an important part of medical and health work. It is a comprehensive applied discipline in which multiple subjects interpenetrate and intersect on the basis of laboratory medicine, and it involves natural sciences such as chemistry, physics, biology, optics, statistics, artificial intelligence, immunology, microbiology, genetics and molecular biology. In the early 1990s the clinical laboratory medicine profession developed rapidly and discipline building was unprecedentedly active; the field evolved from medical testing into clinical laboratory medicine and became an independent discipline. The main professional courses at the undergraduate stage of clinical laboratory medicine include: fundamentals of molecular biology, fundamentals of clinical laboratory medicine, clinical biochemistry, clinical hematology, clinical transfusion, clinical microbiology, clinical immunology, human parasitology, practical diagnostics and clinical laboratory quality management. The professional knowledge involved is therefore broad and the knowledge structure is complex. Without structured coding, clinical examination medical text is difficult to use and analyze. In addition, because structuring rules differ, there is currently no unified structuring standard for clinical examination medical text, so data barriers arise and such text cannot be exchanged or shared.
To address this, the invention application with publication number CN112131868A discloses a clinical trial medical coding method, which comprises: uploading a source file of a clinical trial study and specifying, at upload time, the column of words to be coded; matching the words to be coded against the corresponding standard dictionary, based on the domain to which the clinical trial study belongs, to obtain a coding result; and matching the coding result with the words to be coded in the source file and exporting the whole as a complete result. By performing source file upload, medical coding and export of the coding result in sequence, that invention achieves closed-loop medical coding and improves the efficiency of medical coding. Because coding results are obtained per domain, studies can be grouped, studies in the same domain share one set of coding results, and no copying between studies is needed, which reduces the coding workload.
However, that application does not structure the content to be coded, so texts that use different coding rules cannot be converted into one another directly. A coding method for clinical examination medical text is therefore needed to solve the above technical problem.
Disclosure of Invention
To solve the above technical problem, the invention provides a coding method for clinical examination medical text. Text analysis and processing are performed on the clinical examination medical text that needs structured coding to obtain a term to be coded; the similarity between the term to be coded and each clinical examination medical term in the clinical examination medical term library is calculated with a similarity calculation formula; a similarity threshold is used to judge whether the term to be coded belongs to any clinical examination medical term in the library; if it does, the term to be coded is represented by the code of the term it belongs to; if not, it is judged whether to add it to the library as a new clinical examination medical term.
As a more specific solution, the clinical examination medical term library comprises a mother library and an extension library. The mother library holds clinical examination medical terms that clinical examination medical professionals extract manually from authoritative classification and coding works in the field and store in structured form; the extension library holds new clinical examination medical terms in structured form.
As a more specific solution, the clinical examination medical terms are classified by term semantics into test-object terms and test-means terms, and are stored in structured form in the clinical examination medical term library.
Further, the storage structure comprises: term ID, term CODE, term NAME, term SOURCE, term SOURCE_CODE, term DATE, term CLASS, term main participle and term DATA. The term ID numbers the clinical examination medical terms sequentially; the term CODE is the unique reference code of a clinical examination medical term; the term NAME holds the name of the term; the term SOURCE records the source of the term; the term SOURCE_CODE records the code of the term in that source; the term DATE records time data such as creation time and modification time; the term CLASS records the class of the term, either test-object term or test-means term; the term main participle describes the principal content of the term; the term DATA stores the CODE of each participle of the term.
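The term storage structure described above can be pictured as a single record type. The following is a minimal sketch, assuming Python field names and types that the text does not specify (only the field meanings and the 0/1 class values come from the description); the sample values are hypothetical.

```python
from dataclasses import dataclass, field
from datetime import datetime
from typing import List


@dataclass
class TermRecord:
    """One entry of the clinical examination medical term library (illustrative only)."""
    term_id: int           # term ID: sequential number
    code: str              # term CODE: unique reference code
    name: str              # term NAME: name of the term
    source: str            # term SOURCE: where the term came from
    source_code: str       # term SOURCE_CODE: code of the term in that source
    date: datetime         # term DATE: creation/modification time data
    term_class: int        # term CLASS: 0 = test-object term, 1 = test-means term
    main_participle: str   # term main participle: principal content of the term
    data: List[str] = field(default_factory=list)  # term DATA: participle CODEs


# Hypothetical sample values, not taken from any real term library.
example = TermRecord(
    term_id=1,
    code="T0001",
    name="protein S antigen",
    source="hypothetical source work",
    source_code="X-001",
    date=datetime(2021, 9, 23, 13, 32, 23),
    term_class=0,
    main_participle="protein S antigen",
    data=["P0001", "P0002"],
)
```

Only the participle CODEs are kept in `data`, which matches the stated idea that a term is stored by reference to its participles rather than by its full text.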
As a more specific solution, a word segmenter performs word segmentation on each clinical examination medical term, yielding a term composed of participles (word segments). The participles are classified by their semantics into main participles, attribute participles and expansion participles, and the clinical examination medical term is stored according to the coding structure: main participle CODE + multiple attribute participle CODEs + expansion participle CODE.
Further, the participles are stored in a coding participle library, and the storage structure comprises: participle ID, participle CODE, participle KIND, participle DATE, participle NAME and participle DATA. The participle ID numbers the participles sequentially; the participle CODE is the unique reference code of a participle; the participle KIND records the classification of the participle (main participle, attribute participle or expansion participle); the participle DATE records time data such as creation time and modification time; the participle NAME holds the name of the participle; the participle DATA stores the text content of the participle.
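To illustrate how a term can be rebuilt from participle CODEs alone, here is a minimal sketch. The `Participle` type, the `render_term` helper, the code values and the space-joined output are all assumptions made for illustration; the text only names the stored fields.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Participle:
    """One entry of the coding participle library (illustrative only)."""
    participle_id: int  # participle ID: sequential number
    code: str           # participle CODE: unique reference code
    kind: str           # participle KIND: "main", "attribute" or "expansion"
    date: str           # participle DATE: time data, kept as text here
    name: str           # participle NAME
    data: str           # participle DATA: text content of the participle


def render_term(codes: List[str], library: Dict[str, Participle]) -> str:
    """Rebuild the readable term text from the participle CODEs stored for a term."""
    return " ".join(library[code].data for code in codes)


# Hypothetical library content.
library = {
    "P0001": Participle(1, "P0001", "main", "2021-09-23", "protein S antigen", "protein S antigen"),
    "P0002": Participle(2, "P0002", "attribute", "2021-09-23", "blood", "blood"),
}
```

For example, `render_term(["P0001", "P0002"], library)` yields `"protein S antigen blood"`. Two terms that share a participle both point at the same library entry, which is the source of the storage saving the text describes.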
As a more specific solution, the clinical examination medical terms include test-object terms and test-means terms. The coding structure of a test-object term is: analysis component (main participle) + component description, analysis sample, test dimension, test precision, test means, test time (multiple attribute participles) + remark and supplement participles (expansion participles). The coding structure of a test-means term is: test mode (main participle) + mode description, test area, test object, testability description, use description, posture description (multiple attribute participles) + remark and supplement participles (expansion participles).
As a more specific solution, the word segmenter labels each participle with a corresponding weight, and the similarity between the term to be coded and each clinical examination medical term in the clinical examination medical term library is calculated with a similarity calculation formula. The formula is a participle similarity formula based on cosine similarity and participle weights:
[Equation image not reproduced in this text: Sam(A, B), defined from the cosine similarity of the participle vectors and the participle weights]
where A and B denote term A and term B, respectively; each term has a participle vector and a set of participle weights; a_i is the i-th participle weight of term A; b_i is the i-th participle weight of term B; and Sam(A, B) denotes the similarity of term A and term B.
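Because the formula itself is only available as an image, the following is a sketch of one common way to combine cosine similarity with per-participle weights, not a reproduction of the patented formula. It assumes each term is represented as a mapping from participle text to weight, and that a participle missing from a term contributes a weight of zero; the function name `sam` simply mirrors the Sam(A, B) notation.

```python
import math
from typing import Dict


def sam(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Weighted cosine similarity between two terms (assumed form).

    a and b map each participle of a term to its weight. The dot product
    runs over participles present in both terms; the norms run over all
    participles of each term.
    """
    shared = set(a) & set(b)
    dot = sum(a[p] * b[p] for p in shared)
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty term is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Under this assumed form, two terms with no participle in common score 0, and a term compared with itself scores 1 up to floating-point rounding.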
As a more specific solution, the word segmenter combines a machine learning word segmenter with a word segmentation dictionary. Preliminary segmentation is performed with the word segmentation dictionary, further segmentation is performed by the machine learning word segmenter, and finally the output of the machine learning word segmenter is reviewed manually to verify segmentation accuracy.
Further, the word segmentation dictionary comprises a common-word segmentation dictionary and a clinical examination medical segmentation dictionary. The common-word segmentation dictionary stores common participles; the clinical examination medical segmentation dictionary stores clinical examination medical participles and is updated automatically from the clinical examination medical term library. For each participle, the dictionary records its part of speech, its semantics (main participle, attribute participle or expansion participle) and its weight.
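The preliminary dictionary pass can be sketched as follows. The text does not say which matching strategy the dictionary stage uses; forward maximum matching is assumed here because it is a common choice for Chinese text. The function name and the dictionary entries are hypothetical. Spans the dictionary cannot match are returned unsegmented so that a later stage (the machine learning word segmenter in the text) can handle them.

```python
from typing import List, Set, Tuple


def dictionary_segment(text: str, dictionary: Set[str]) -> List[Tuple[str, bool]]:
    """Forward maximum matching against a dictionary (assumed strategy).

    Returns (span, matched) pairs. matched is True for dictionary words and
    False for leftover runs of characters that no dictionary word covers.
    """
    max_len = max((len(word) for word in dictionary), default=1)
    result: List[Tuple[str, bool]] = []
    pending = ""  # characters not yet covered by any dictionary word
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                match = candidate
                break
        if match is not None:
            if pending:
                result.append((pending, False))
                pending = ""
            result.append((match, True))
            i += len(match)
        else:
            pending += text[i]
            i += 1
    if pending:
        result.append((pending, False))
    return result
```

With the illustrative dictionary `{"血液", "蛋白S抗原", "抗原"}`, the text `"血液蛋白S抗原测定"` is split into two dictionary words followed by one unmatched span, `"测定"`, which would be passed on for further segmentation.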
Furthermore, the machine learning word segmenter is trained with a bidirectional LSTM + CRF machine learning algorithm and consists, in order, of a Look-up layer, a Forward LSTM layer, a Backward LSTM layer and a CRF layer. Existing clinical examination medical terms are segmented and labeled manually, the labels comprising part of speech, participle semantics and participle weight. The labeled terms are used as training data to train and test the bidirectional LSTM + CRF neural network model, and a model that meets the required segmentation accuracy is output as the machine learning word segmenter.
As a more specific solution, a similarity threshold is used to judge whether the term to be coded belongs to any clinical examination medical term in the clinical examination medical term library. For this judgment, the mapping modes are full mapping, basic mapping and main-participle mapping. Full mapping calculates the similarity of the main participle, the attribute participles and the expansion participles, respectively, between the term to be coded and the clinical examination medical terms in the library; basic mapping calculates the similarity of the main participle and the attribute participles only; main-participle mapping calculates the similarity of the main participle only.
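The three mapping modes differ only in which kinds of participle take part in the comparison, which can be sketched as a simple filter. The mode keys, the kind labels and the tuple layout below are assumptions made for illustration.

```python
from typing import Dict, List, Tuple

# Which participle kinds each mapping mode compares (labels are illustrative).
KINDS_BY_MODE = {
    "full": ("main", "attribute", "expansion"),  # full mapping
    "basic": ("main", "attribute"),              # basic mapping
    "main": ("main",),                           # main-participle mapping
}


def select_participles(term: List[Tuple[str, str, float]], mode: str) -> Dict[str, float]:
    """Keep only the participles that the chosen mapping mode compares.

    term is a list of (participle text, kind, weight) triples; the result
    maps participle text to weight and can be fed to a similarity function.
    """
    kinds = KINDS_BY_MODE[mode]
    return {name: weight for name, kind, weight in term if kind in kinds}


# Hypothetical term with one participle of each kind.
term = [
    ("protein S antigen", "main", 0.6),
    ("blood", "attribute", 0.3),
    ("remark", "expansion", 0.1),
]
```

Here `select_participles(term, "basic")` drops the expansion participle, and `select_participles(term, "main")` keeps only the main participle, so a coarser mode tolerates more variation in the descriptive parts of a term.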
As a more specific solution, the terms to be coded in clinical examination medical text are coded in structured form by the following steps:
S1, acquiring the clinical examination medical text that needs structured coding;
S2, preprocessing the clinical examination medical text, including de-duplication, de-noising and text vectorization;
S3, performing word segmentation on the preprocessed clinical examination medical text with the word segmenter, and labeling the part of speech, semantics and weight of each participle;
S4, deciding with a participle weight threshold whether each participle is discarded: participles below the threshold are regarded as useless and discarded, yielding the term to be coded;
S5, calculating, according to the selected mapping mode and the similarity calculation formula, the similarity between the participles of the term to be coded and the participles of each clinical examination medical term in the clinical examination medical term library;
S6, judging with a similarity threshold whether the term to be coded belongs to any clinical examination medical term in the library;
S7, if it does, representing the term to be coded by the code of the term it belongs to;
S8, if it does not, screening out the participles of the term to be coded that do not meet the similarity threshold, and judging manually whether to add them to the coding participle library as new participles;
S9, if it is judged that new participles are to be added, updating the coding participle library, expressing the term to be coded through its participles to obtain a new clinical examination medical term, adding the new term to the clinical examination medical term library, and updating the word segmentation dictionary;
S10, if it is judged that no new participle is to be added, discarding the term to be coded as a meaningless term.
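The automatic part of steps S4 to S8 can be sketched as one decision function. This is a simplified illustration under several assumptions: terms are mappings from participle text to weight, the similarity is a plain weighted cosine (the patented formula is not reproduced in this text), the best-scoring library term is the one compared against the threshold, and the manual review of S8 to S10 is represented only by a returned status. All names and threshold values are hypothetical.

```python
import math
from typing import Dict, Optional, Tuple

Weights = Dict[str, float]  # participle text -> weight


def cosine(a: Weights, b: Weights) -> float:
    """Weighted cosine similarity (assumed form of the similarity formula)."""
    dot = sum(a[p] * b[p] for p in set(a) & set(b))
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def encode_term(
    participles: Weights,
    library: Dict[str, Weights],
    weight_threshold: float,
    sim_threshold: float,
) -> Tuple[str, Optional[str]]:
    """Return (status, code) for one candidate term."""
    # S4: drop participles whose weight is below the weight threshold.
    kept = {p: w for p, w in participles.items() if w >= weight_threshold}
    if not kept:
        return ("discarded", None)
    # S5: score the term against every term in the library.
    best_code, best_sim = None, 0.0
    for code, reference in library.items():
        score = cosine(kept, reference)
        if score > best_sim:
            best_code, best_sim = code, score
    # S6/S7: reuse the existing code when the threshold is met.
    if best_code is not None and best_sim >= sim_threshold:
        return ("matched", best_code)
    # S8: otherwise hand the term over for manual review.
    return ("needs_review", None)
```

A term that matches reuses an existing code, so near-duplicate wording does not create a new library entry; only terms that fail the threshold reach a human reviewer.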
As a more specific solution, the method can also convert structured codes between clinical examination medical term libraries, dictionary libraries or mapping libraries that use different coding rules. The structured code conversion is performed by the following steps:
D1, coding in structured form, through steps S1 to S10, the clinical examination medical texts in the term libraries, dictionary libraries or mapping libraries that use different coding rules;
D2, directly establishing a mapping relation between clinical examination medical texts that have the same structured code;
D3, establishing a participle-level mapping relation between clinical examination medical texts that share some participles;
D4, mapping completely different clinical examination medical texts manually.
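Steps D2 to D4 amount to sorting term pairs into three buckets. The sketch below assumes that, after D1, each term in a library is reduced to the set of its participles, and that "the same structured code" can be approximated by identical participle sets; both are simplifications, and all names and sample data are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

Library = Dict[str, FrozenSet[str]]  # term code -> set of participles


def build_mappings(
    lib_a: Library, lib_b: Library
) -> Tuple[Dict[str, str], Dict[str, Dict[str, FrozenSet[str]]], List[str]]:
    """Classify each term of lib_a against lib_b.

    Returns (direct, partial, manual):
      direct  - D2: code in A -> code in B with an identical participle set
      partial - D3: code in A -> {code in B: shared participles}
      manual  - D4: codes in A that share nothing with B
    """
    direct: Dict[str, str] = {}
    partial: Dict[str, Dict[str, FrozenSet[str]]] = {}
    manual: List[str] = []
    for code_a, parts_a in lib_a.items():
        exact = [code_b for code_b, parts_b in lib_b.items() if parts_b == parts_a]
        if exact:
            direct[code_a] = exact[0]
            continue
        overlaps = {
            code_b: parts_a & parts_b
            for code_b, parts_b in lib_b.items()
            if parts_a & parts_b
        }
        if overlaps:
            partial[code_a] = overlaps
        else:
            manual.append(code_a)
    return direct, partial, manual
```

Only the terms left in the `manual` bucket need human attention, which is where the conversion saves effort compared with mapping every pair by hand.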
Compared with the related art, the coding method for clinical examination medical text provided by the invention has the following beneficial effects:
1. The invention analyzes and processes the clinical examination medical text to obtain its content structure and applies structured coding to each part of that structure. Before structured coding, the similarity between the term to be coded and each clinical examination medical term in the clinical examination medical term library is calculated, which effectively reduces repeated and near-duplicate terms; calculating the degree of similarity also allows similar content under different codes to be converted.
2. Clinical examination medical terms are stored in a participle-based storage structure: participles are the basic unit of coding, and different participles in the coding participle library are combined to form the corresponding term. Because many participles are shared among terms, a great deal of storage space is saved and management is more integrated; a whole term can be stored by recording only the CODE of each participle it involves, and its structure and source are clearer.
3. The invention segments words by combining a word segmentation dictionary with a machine learning word segmenter. Preliminary segmentation with the dictionary strips off participles that are clearly determined, which reduces the workload of the machine learning word segmenter; secondary segmentation by the machine learning word segmenter then reduces the manual review workload and improves segmentation efficiency.
4. The invention takes into account that terms to be coded differ in structural level, so mapping needs to be carried out in layers. Three mapping modes (full mapping, basic mapping and main-participle mapping) are therefore provided, which improves generality.
Drawings
Fig. 1 is a flowchart illustrating a method for encoding medical texts for clinical examinations according to a preferred embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and embodiments.
As shown in Fig. 1, the coding method for clinical examination medical text provided by the invention performs text analysis and processing on the clinical examination medical text that needs structured coding to obtain a term to be coded; calculates the similarity between the term to be coded and each clinical examination medical term in the clinical examination medical term library with a similarity calculation formula; judges with a similarity threshold whether the term to be coded belongs to any clinical examination medical term in the library; if it does, represents the term to be coded by the code of the term it belongs to; and if not, judges whether to add it to the library as a new clinical examination medical term.
It should be noted that current coding methods for clinical examination medical text are imperfect. They usually code chronologically, assigning each text a unique serial number, which cannot reflect the content structure of the text. Clinical examination medical texts also contain many repeated fields, and some texts are identical or nearly identical; numbering each text independently makes the workload and the management difficulty very large. In this embodiment, the clinical examination medical text is analyzed and processed to obtain its content structure, and each part of the structure is coded in structured form. Before structured coding, the similarity between the term to be coded and each clinical examination medical term in the library is calculated. If the term to be coded belongs to a term in the library, it is represented by that term's code; if not, it is judged whether to add it to the library as a new term. This effectively reduces repeated and near-duplicate terms, and calculating the degree of similarity also allows similar content under different codes to be converted.
As a more specific solution, the clinical examination medical term library comprises a mother library and an extension library. The mother library holds clinical examination medical terms that clinical examination medical professionals extract manually from authoritative classification and coding works in the field and store in structured form; the extension library holds new clinical examination medical terms in structured form.
It should be noted that the mother library serves mainly as the trusted reference source for similarity matching, so it must be constructed rigorously by professionals. It is compiled from authoritative clinical examination medical works such as 'Chinese medical subject word list', 'clinical examination item classification and code', 'clinical examination medical basic noun term collection', 'Chinese examination medical term standard' and 'international examination medical term standard'.
As a more specific solution, the clinical examination medical terms are classified by term semantics into test-object terms and test-means terms, and are stored in structured form in the clinical examination medical term library.
It should be noted that in clinical examination medical text, the most common and most important terms are test-object terms and test-means terms.
Further, the storage structure comprises: term ID, term CODE, term NAME, term SOURCE, term SOURCE_CODE, term DATE, term CLASS, term main participle and term DATA. The term ID numbers the clinical examination medical terms sequentially; the term CODE is the unique reference code of a clinical examination medical term; the term NAME holds the name of the term; the term SOURCE records the source of the term; the term SOURCE_CODE records the code of the term in that source; the term DATE records time data such as creation time and modification time; the term CLASS records the class of the term, either test-object term or test-means term; the term main participle describes the principal content of the term; the term DATA stores the CODE of each participle of the term.
It should be noted that clinical examination medical terms are stored in a participle-based storage structure: participles are the basic unit of coding, and different participles in the coding participle library are combined to form the corresponding term. Because many participles are shared among terms, a great deal of storage space is saved and management is more integrated; a whole term can be stored by recording only the CODE of each participle it involves, and its structure and source are clearer. In a specific embodiment, the term ID is a sequence number starting from 1; the term CODE is a unique reference code that the system generates from the combined arrangement of the participle CODEs; and the term CLASS is denoted by 0 or 1 (0: test-object term; 1: test-means term).
As a more specific solution, a word segmenter performs word segmentation on each clinical examination medical term, yielding a term composed of participles (word segments). The participles are classified by their semantics into main participles, attribute participles and expansion participles, and the clinical examination medical term is stored according to the coding structure: main participle CODE + multiple attribute participle CODEs + expansion participle CODE.
It should be noted that in this embodiment, participles that occur frequently in clinical examination medical terms, have a clear meaning and carry the main meaning are taken as main participles; participles that describe and refine the main participle are taken as attribute participles; and participles that extend or explain are taken as expansion participles. Main participles and attribute participles occupy fixed positions in the structured code of a term, each with a fixed number of code digits; expansion participles have no fixed number of digits and are appended to the end of the code only when needed.
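The fixed-position layout described above can be sketched as a small code builder. The slot width of four characters, the all-zero filler for an empty attribute slot and the function name are assumptions chosen for illustration; the text only states that main and attribute participles have fixed positions and digit counts while expansion participles are appended at the end.

```python
from typing import Optional, Sequence

SLOT_WIDTH = 4                  # assumed number of digits per fixed slot
EMPTY_SLOT = "0" * SLOT_WIDTH   # assumed filler for an unused attribute slot


def build_term_code(
    main: str,
    attributes: Sequence[Optional[str]],
    expansions: Sequence[str] = (),
) -> str:
    """Assemble a structured term code from participle CODEs.

    main and each attribute occupy one fixed-width slot, in order; None marks
    an attribute slot that is empty for this term. Expansion codes have no
    fixed width and are appended after the fixed slots.
    """
    slots = [main] + [EMPTY_SLOT if a is None else a for a in attributes]
    for slot in slots:
        if len(slot) != SLOT_WIDTH:
            raise ValueError(f"fixed slot must have {SLOT_WIDTH} characters: {slot!r}")
    return "".join(slots) + "".join(expansions)
```

Because every fixed slot has the same width, the main participle and any attribute can be read back from the code by position alone, and two terms can be compared slot by slot.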
Further, the participles are stored in a coding participle library, and the storage structure comprises: participle ID, participle CODE, participle KIND, participle DATE, participle NAME and participle DATA. The participle ID numbers the participles sequentially; the participle CODE is the unique reference code of a participle; the participle KIND records the classification of the participle (main participle, attribute participle or expansion participle); the participle DATE records time data such as creation time and modification time; the participle NAME holds the name of the participle; the participle DATA stores the text content of the participle.
As a more specific solution, the clinical examination medical terms include test-object terms and test-means terms. The coding structure of a test-object term is: analysis component (main participle) + component description, analysis sample, test dimension, test precision, test means, test time (multiple attribute participles) + remark and supplement participles (expansion participles). The coding structure of a test-means term is: test mode (main participle) + mode description, test area, test object, testability description, use description, posture description (multiple attribute participles) + remark and supplement participles (expansion participles).
It should be noted that, in this embodiment, the coding structures of test object terms and test means terms differ; some fields may be empty and some may be extended. In a specific embodiment, the "test object" coding structure is ordered as follows:
Analysis component (main): protein S antigen
Component description: antigen
Analysis sample: blood
Test dimension: μg/ml
Test precision: 0.1 μg/ml
Test means: clinical blood examination
Test time: 21/09/23 13:32:23
Person tested: Du Xi
The "test means" coding structure is ordered as follows:
Test mode (main): clinical blood examination
Mode description: protein analyzer
Test area: superficial vein of the arm
Test object: protein S antigen
Testability description: left side
Use description: examination for bleeding and coagulation
Posture description: sitting upright
Inspector: King
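The fixed-position layout described above (main and attribute participles in fixed slots, expansion participles appended only when present) can be sketched as follows. The slot width of four characters, the all-zero placeholder for an empty slot, the "-" separator and the function name `build_structured_code` are assumptions chosen for illustration; the patent does not state the number of digits or the separator.

```python
from typing import Dict, Sequence

# Slot order for a "test object" term: main participle first, then attributes.
TEST_OBJECT_SLOTS = [
    "analysis component",     # main participle
    "component description",
    "analysis sample",
    "test dimension",
    "test precision",
    "test means",
    "test time",
]

SLOT_WIDTH = 4            # assumed number of digits per fixed slot
EMPTY = "0" * SLOT_WIDTH  # assumed placeholder for a slot left empty


def build_structured_code(slots: Sequence[str],
                          participle_codes: Dict[str, str],
                          expansion_codes: Sequence[str] = ()) -> str:
    """Join participle CODEs in fixed slot order, then append expansions."""
    unknown = set(participle_codes) - set(slots)
    if unknown:
        raise KeyError(f"not a slot of this structure: {sorted(unknown)}")
    parts = []
    for slot in slots:
        code = participle_codes.get(slot, EMPTY)  # empty slots keep position
        if len(code) != SLOT_WIDTH:
            raise ValueError(f"slot {slot!r} needs a {SLOT_WIDTH}-digit code")
        parts.append(code)
    parts.extend(expansion_codes)  # no fixed digits; added only when needed
    return "-".join(parts)
```

Because every fixed slot is always emitted, two codes can be compared position by position even when some attributes are missing.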
As a more specific solution, the word segmenter labels each participle segment with a corresponding weight, and the similarity between the term to be encoded and each clinical examination medical term in the clinical examination medical term library is calculated by a similarity calculation formula, which is a participle similarity formula based on cosine similarity and participle weights:
[Formula image: similarity calculation formula for Sam(A, B)]
wherein: A and B denote term A and term B, respectively; the formula uses the participle vector of term A and the participle vector of term B; the set of participle weights of term A, in which $a_i$ is the i-th participle weight of term A; and the set of participle weights of term B, in which $b_i$ is the i-th participle weight of term B. Sam(A, B) denotes the similarity of term A and term B.
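The formula itself appears only as an image, so its exact expression cannot be confirmed from this text. As a minimal sketch, assuming each term is represented as a mapping from participle to weight and that the similarity is the ordinary cosine of the two weight vectors, a weight-based similarity could look like the following. The function name `sam` and the dictionary representation are illustrative assumptions, not the patent's confirmed formula.

```python
import math
from typing import Dict


def sam(term_a: Dict[str, float], term_b: Dict[str, float]) -> float:
    """Cosine similarity of two terms given as {participle: weight} maps.

    Assumption: a participle missing from a term contributes weight 0,
    so only participles shared by both terms add to the dot product.
    """
    shared = term_a.keys() & term_b.keys()
    dot = sum(term_a[p] * term_b[p] for p in shared)
    norm_a = math.sqrt(sum(w * w for w in term_a.values()))
    norm_b = math.sqrt(sum(w * w for w in term_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty term is treated as similar to nothing
    return dot / (norm_a * norm_b)
```

Under this assumption a heavily weighted main participle dominates the score, so two terms that share the main participle but differ in a low-weight attribute still score close to 1.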
As a more specific solution, the word segmenter combines a machine-learning word segmenter with a word segmentation dictionary: preliminary segmentation is performed with the dictionary, further segmentation is performed by the machine-learning segmenter, and the machine-learning output is finally checked manually for segmentation accuracy.
Further, the word segmentation dictionary comprises a common-term dictionary and a clinical examination medical dictionary. The common-term dictionary stores common participles and the clinical examination medical dictionary stores clinical examination medical participles; the latter is updated automatically from the clinical examination medical term library. The dictionary records, for each participle, its part of speech, its semantic class (main participle, attribute participle or expansion participle) and its weight.
Furthermore, the machine-learning word segmenter is trained with a bidirectional LSTM + CRF algorithm and comprises, in order, a Look-up layer, a Forward LSTM layer, a Backward LSTM layer and a CRF layer. Existing clinical examination medical terms are segmented and labeled manually, the labels comprising part of speech, semantic class and weight; the labeled terms are used as training data to train and test the bidirectional LSTM + CRF neural network model, and a model that meets the required segmentation accuracy is output as the machine-learning word segmenter.
It should be noted that using only the word segmentation dictionary can omit content, while using only the machine-learning segmenter is inefficient and leaves more results to be reviewed manually. This embodiment therefore combines the two: the dictionary performs a preliminary pass that strips off clearly determined participles, which reduces the workload of the machine-learning segmenter; the machine-learning segmenter then performs a second pass, which reduces the manual review workload and improves segmentation efficiency.
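The two-stage idea can be sketched as a dictionary pass that peels off known participles and hands only the leftover spans to a second-stage segmenter. The dictionary pass below uses forward maximum matching, which is a common technique chosen here for illustration and is not named in the patent; `fallback_segmenter` is a hypothetical callable standing in for the trained machine-learning segmenter.

```python
from typing import Callable, List, Set, Tuple


def dictionary_pass(text: str, dictionary: Set[str],
                    max_len: int = 8) -> List[Tuple[str, bool]]:
    """Forward maximum matching: return (chunk, matched_by_dictionary) pairs."""
    result: List[Tuple[str, bool]] = []
    pending = ""  # characters not covered by any dictionary entry so far
    i = 0
    while i < len(text):
        match = None
        # Try the longest candidate first, then shrink.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                match = candidate
                break
        if match is None:
            pending += text[i]
            i += 1
            continue
        if pending:
            result.append((pending, False))
            pending = ""
        result.append((match, True))
        i += len(match)
    if pending:
        result.append((pending, False))
    return result


def segment(text: str, dictionary: Set[str],
            fallback_segmenter: Callable[[str], List[str]]) -> List[str]:
    """Keep dictionary matches; send only unmatched spans to the second stage."""
    tokens: List[str] = []
    for chunk, matched in dictionary_pass(text, dictionary):
        tokens.extend([chunk] if matched else fallback_segmenter(chunk))
    return tokens
```

Only the unmatched spans reach the second stage, which is where the reduction in machine-learning and manual review workload described above would come from.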
As a more specific solution, a similarity threshold is used to judge whether the term to be encoded belongs to any clinical examination medical term in the clinical examination medical term library. For the threshold judgment, the mapping modes comprise full mapping, basic mapping and main participle mapping: full mapping calculates the similarity between the term to be encoded and the clinical examination medical terms in the library over the main, attribute and expansion participles; basic mapping calculates it over the main and attribute participles; main participle mapping calculates it over the main participle only.
It should be noted that, in this embodiment, because terms to be encoded differ in structural level, mapping must be hierarchical; the three mapping modes (full mapping, basic mapping and main participle mapping) are therefore provided, which improves generality.
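The three mapping modes amount to choosing which participle classes take part in the similarity calculation. A minimal sketch, assuming a term is held as a list of (kind, text) pairs; the mode keys "full", "basic" and "main" and the function name `select_participles` are illustrative names, not identifiers from the patent.

```python
from typing import Dict, List, Tuple

# Which participle classes each mapping mode compares.
KINDS_BY_MODE: Dict[str, Tuple[str, ...]] = {
    "full": ("main", "attribute", "expansion"),
    "basic": ("main", "attribute"),
    "main": ("main",),
}


def select_participles(term: List[Tuple[str, str]], mode: str) -> List[str]:
    """Return the participle texts that the given mapping mode compares."""
    if mode not in KINDS_BY_MODE:
        raise ValueError(f"unknown mapping mode: {mode}")
    kinds = KINDS_BY_MODE[mode]
    return [text for kind, text in term if kind in kinds]
```

Applying the same selection to both the term to be encoded and each library term before computing similarity keeps the comparison at one structural level.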
As a more specific solution, the terms to be encoded of a clinical examination medical text are structurally encoded by the following steps:
S1, acquiring the clinical examination medical text that requires structured coding;
S2, preprocessing the clinical examination medical text, including de-duplication, de-noising and text vectorization;
S3, segmenting the preprocessed clinical examination medical text with the word segmenter, and labeling the part of speech, semantic class and weight of each participle;
S4, determining by a participle weight threshold whether each participle is discarded, wherein participles below the threshold are regarded as useless and discarded, to obtain the term to be encoded;
S5, calculating, according to the set mapping mode and the similarity calculation formula, the similarity between the participles of the term to be encoded and the participles of each clinical examination medical term in the clinical examination medical term library;
S6, judging by a similarity threshold whether the term to be encoded belongs to any clinical examination medical term in the clinical examination medical term library;
S7, if it does, representing the term to be encoded by the code of the clinical examination medical term to which it belongs;
S8, if it does not, screening out the participle segments of the term to be encoded that do not meet the similarity threshold, and judging manually whether to add them to the coding participle library as new participles;
S9, if it is judged that a new participle is added to the coding participle library, updating the coding participle library, expressing the term to be encoded through participle segments to obtain a new clinical examination medical term, adding the new term to the clinical examination medical term library, and updating the word segmentation dictionary;
S10, if it is judged that no new participle is added to the coding participle library, discarding the term to be encoded as a meaningless term.
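The decision flow of steps S4 to S10 can be sketched as below. This is an illustration under several assumptions: the similarity is a plain weighted cosine, the "segments that do not meet the similarity threshold" in S8 are simplified to participles absent from every library term, the manual judgment is a hypothetical callback `approve_new`, and new codes of the form "NEW0002" are invented for the example.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple

Term = Dict[str, float]  # {participle: weight}


def cosine(a: Term, b: Term) -> float:
    dot = sum(a[p] * b[p] for p in a.keys() & b.keys())
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def encode(weighted: Term, term_library: Dict[str, Term],
           weight_threshold: float, similarity_threshold: float,
           approve_new: Callable[[List[str]], bool]
           ) -> Tuple[str, Optional[str]]:
    # S4: drop participles whose weight is below the threshold.
    term = {p: w for p, w in weighted.items() if w >= weight_threshold}
    if not term:
        return ("discarded", None)
    # S5: score the term against every term in the library.
    best_code, best_score = None, 0.0
    for code, entry in term_library.items():
        score = cosine(term, entry)
        if score > best_score:
            best_code, best_score = code, score
    # S6/S7: a sufficiently similar library term supplies the code.
    if best_code is not None and best_score >= similarity_threshold:
        return ("matched", best_code)
    # S8 (simplified): participles not present in any library term.
    known = set().union(*term_library.values())
    unmatched = [p for p in term if p not in known]
    # S9: reviewer accepts the new participles, so a new term is stored.
    if unmatched and approve_new(unmatched):
        new_code = f"NEW{len(term_library) + 1:04d}"
        term_library[new_code] = term
        return ("added", new_code)
    # S10: nothing accepted, so the term is discarded as meaningless.
    return ("discarded", None)
```

The callback keeps the human review step explicit: the sketch never adds a term to the library without it returning true.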
As a more specific solution, the method can also convert structured codes between clinical examination medical term libraries, dictionary libraries or mapping libraries that use different coding rules; the structured code conversion is performed by the following steps:
D1, structurally encoding, through steps S1 to S10, the clinical examination medical texts in the clinical examination medical term libraries, dictionary libraries or mapping libraries that use different coding rules;
D2, directly establishing a mapping relation between clinical examination medical texts that have the same structured code;
D3, establishing a participle-level mapping relation between clinical examination medical texts that share some participles;
D4, establishing mapping relations manually for clinical examination medical texts that are completely different.
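Steps D2 to D4 can be sketched as a three-way classification once both libraries have been structurally encoded. In this illustration each library maps its own source code to a tuple of participle CODEs; that representation, the function name `build_code_mapping` and the example codes are assumptions made for the sketch.

```python
from typing import Dict, List, Tuple

Library = Dict[str, Tuple[str, ...]]  # {source code: participle CODEs}


def build_code_mapping(library_a: Library, library_b: Library):
    """Classify each entry of library_a against library_b (D2, D3, D4)."""
    direct: Dict[str, str] = {}                    # D2: same structured code
    partial: Dict[str, Dict[str, List[str]]] = {}  # D3: shared participles
    manual: List[str] = []                         # D4: left for manual mapping
    for code_a, parts_a in library_a.items():
        exact = [cb for cb, pb in library_b.items() if pb == parts_a]
        if exact:
            direct[code_a] = exact[0]
            continue
        overlaps = {
            cb: sorted(set(parts_a) & set(pb))
            for cb, pb in library_b.items()
            if set(parts_a) & set(pb)
        }
        if overlaps:
            partial[code_a] = overlaps
        else:
            manual.append(code_a)
    return direct, partial, manual
```

Entries returned in `manual` are exactly those with no participle in common with the other library, which matches the cases the description leaves to manual mapping.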
The above description is only an embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A coding method for clinical examination medical texts, characterized in that: text analysis and processing are performed on the clinical examination medical text that requires structured coding to obtain a term to be encoded; the similarity between the term to be encoded and each clinical examination medical term in a clinical examination medical term library is calculated by a similarity calculation formula; a similarity threshold is used to judge whether the term to be encoded belongs to any clinical examination medical term in the clinical examination medical term library; if it does, the term to be encoded is represented by the code of the clinical examination medical term to which it belongs; if it does not, it is judged whether to add the term to the clinical examination medical term library as a new clinical examination medical term.
2. The coding method for clinical examination medical texts according to claim 1, characterized in that the clinical examination medical term library comprises a mother library and an expansion library; for the mother library, clinical examination medical professionals manually extract clinical examination medical terms from authoritative clinical examination medical classification and coding works and store them in structured form; the expansion library is used for structured storage of new clinical examination medical terms.
3. The coding method for clinical examination medical texts according to claim 2, characterized in that the clinical examination medical terms are classified, according to their semantic meaning, into test object terms and test means terms; the clinical examination medical terms are stored in the clinical examination medical term library in structured form;
the storage structure includes: term ID, term CODE, term NAME, term SOURCE, term SOURCE_CODE, term DATE, term CLASS, term main participle, and term DATA; the term ID numbers the clinical examination medical terms sequentially; the term CODE is the unique reference code of a clinical examination medical term; the term NAME names the clinical examination medical term; the term SOURCE labels the source of the clinical examination medical term; the term SOURCE_CODE labels the code of the clinical examination medical term in its source; the term DATE records the creation time, the modification time and other time data; the term CLASS labels the class of the term, the classes being test object terms and test means terms; the term main participle describes the principal content of the corresponding clinical examination medical term; the term DATA stores the participle CODEs of the clinical examination medical term.
4. The coding method for clinical examination medical texts according to claim 3, characterized in that each clinical examination medical term is segmented by a word segmenter to obtain a clinical examination medical term composed of participle segments; the participle segments are classified, according to their semantic meaning, into main participles, attribute participles and expansion participles, and the clinical examination medical terms are stored according to the coding structure of main participle CODE + multiple attribute participle CODEs + expansion participle CODE;
the participles are stored in a coding participle library, and the storage structure includes: participle ID, participle CODE, participle KIND, participle DATE, participle NAME and participle DATA; the participle ID numbers the participle segments sequentially; the participle CODE is the unique reference code of the participle segment; the participle KIND labels the classification of the participle segment, the classifications being main participle, attribute participle and expansion participle; the participle DATE records the creation time, the modification time and other time data; the participle NAME names the participle segment; the participle DATA stores the text content of the participle segment.
5. The coding method for clinical examination medical texts according to claim 4, characterized in that the clinical examination medical terms include test object terms and test means terms;
the coding structure of a test object term is: analysis component (main participle) + component description, analysis sample, test dimension, test precision, test means, test time (multiple attribute participles) + remark supplement participle (expansion participle);
the coding structure of a test means term is: test mode (main participle) + mode description, test area, test object, testability description, use description, posture description (multiple attribute participles) + remark supplement participle (expansion participle).
6. The coding method for clinical examination medical texts according to claim 5, characterized in that the word segmenter labels each participle segment with a corresponding weight, and the similarity between the term to be encoded and each clinical examination medical term in the clinical examination medical term library is calculated by a similarity calculation formula, which is a participle similarity formula based on cosine similarity and participle weights:
[Formula image: similarity calculation formula for Sam(A, B)]
wherein: A and B denote term A and term B, respectively; the formula uses the participle vector of term A and the participle vector of term B; the set of participle weights of term A, in which $a_i$ is the i-th participle weight of term A; and the set of participle weights of term B, in which $b_i$ is the i-th participle weight of term B. Sam(A, B) denotes the similarity of term A and term B.
7. The coding method for clinical examination medical texts according to claim 6, characterized in that the word segmenter combines a machine-learning word segmenter with a word segmentation dictionary; preliminary segmentation is performed with the dictionary, further segmentation is performed by the machine-learning segmenter, and the machine-learning output is finally checked manually for segmentation accuracy;
the word segmentation dictionary comprises a common-term dictionary and a clinical examination medical dictionary; the common-term dictionary stores common participles and the clinical examination medical dictionary stores clinical examination medical participles, wherein the clinical examination medical dictionary is updated automatically from the clinical examination medical term library, and the dictionary records the part of speech, the semantic class (main participle, attribute participle or expansion participle) and the weight of each participle;
the machine-learning word segmenter is trained with a bidirectional LSTM + CRF algorithm and comprises, in order, a Look-up layer, a Forward LSTM layer, a Backward LSTM layer and a CRF layer;
existing clinical examination medical terms are segmented and labeled manually, the labels comprising part of speech, semantic class and weight; the labeled clinical examination medical terms are used as training data to train and test the bidirectional LSTM + CRF neural network model, and a model that meets the required segmentation accuracy is output as the machine-learning word segmenter.
8. The coding method for clinical examination medical texts according to claim 7, characterized in that a similarity threshold is used to judge whether the term to be encoded belongs to any clinical examination medical term in the clinical examination medical term library; for the threshold judgment, the mapping modes comprise full mapping, basic mapping and main participle mapping; full mapping calculates the similarity between the term to be encoded and the clinical examination medical terms in the library over the main, attribute and expansion participles; basic mapping calculates it over the main and attribute participles; main participle mapping calculates it over the main participle only.
9. The coding method for clinical examination medical texts according to claim 8, characterized in that the terms to be encoded of the clinical examination medical text are structurally encoded by the following steps:
S1, acquiring the clinical examination medical text that requires structured coding;
S2, preprocessing the clinical examination medical text, including de-duplication, de-noising and text vectorization;
S3, segmenting the preprocessed clinical examination medical text with the word segmenter, and labeling the part of speech, semantic class and weight of each participle;
S4, determining by a participle weight threshold whether each participle is discarded, wherein participles below the threshold are regarded as useless and discarded, to obtain the term to be encoded;
S5, calculating, according to the set mapping mode and the similarity calculation formula, the similarity between the participles of the term to be encoded and the participles of each clinical examination medical term in the clinical examination medical term library;
S6, judging by a similarity threshold whether the term to be encoded belongs to any clinical examination medical term in the clinical examination medical term library;
S7, if it does, representing the term to be encoded by the code of the clinical examination medical term to which it belongs;
S8, if it does not, screening out the participle segments of the term to be encoded that do not meet the similarity threshold, and judging manually whether to add them to the coding participle library as new participles;
S9, if it is judged that a new participle is added to the coding participle library, updating the coding participle library, expressing the term to be encoded through participle segments to obtain a new clinical examination medical term, adding the new term to the clinical examination medical term library, and updating the word segmentation dictionary;
S10, if it is judged that no new participle is added to the coding participle library, discarding the term to be encoded as a meaningless term.
10. The coding method for clinical examination medical texts according to claim 9, characterized in that structured codes can be converted between clinical examination medical term libraries, dictionary libraries or mapping libraries that use different coding rules; the structured code conversion is performed by the following steps:
D1, structurally encoding, through steps S1 to S10, the clinical examination medical texts in the clinical examination medical term libraries, dictionary libraries or mapping libraries that use different coding rules;
D2, directly establishing a mapping relation between clinical examination medical texts that have the same structured code;
D3, establishing a participle-level mapping relation between clinical examination medical texts that share some participles;
D4, establishing mapping relations manually for clinical examination medical texts that are completely different.
CN202111147404.XA 2021-09-29 2021-09-29 Coding method for clinical examination medical text Withdrawn CN113887204A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111147404.XA CN113887204A (en) 2021-09-29 2021-09-29 Coding method for clinical examination medical text

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111147404.XA CN113887204A (en) 2021-09-29 2021-09-29 Coding method for clinical examination medical text

Publications (1)

Publication Number Publication Date
CN113887204A true CN113887204A (en) 2022-01-04

Family

ID=79007739

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111147404.XA Withdrawn CN113887204A (en) 2021-09-29 2021-09-29 Coding method for clinical examination medical text

Country Status (1)

Country Link
CN (1) CN113887204A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116682519A (en) * 2023-08-03 2023-09-01 广东杰纳医药科技有限公司 Clinical experiment data unit analysis method

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116682519A (en) * 2023-08-03 2023-09-01 广东杰纳医药科技有限公司 Clinical experiment data unit analysis method
CN116682519B (en) * 2023-08-03 2024-03-19 广东杰纳医药科技有限公司 Clinical experiment data unit analysis method

Similar Documents

Publication Publication Date Title
CN109065157B (en) Disease diagnosis standardized code recommendation list determination method and system
CN108831559B (en) Chinese electronic medical record text analysis method and system
CN113871003B (en) Disease auxiliary differential diagnosis system based on causal medical knowledge graph
CN107247881B (en) Multi-mode intelligent analysis method and system
US20220254493A1 (en) Chronic disease prediction system based on multi-task learning model
CN111316281B (en) Semantic classification method and system for numerical data in natural language context based on machine learning
CN113241135B (en) Disease risk prediction method and system based on multi-modal fusion
CN106934235B (en) Patient's similarity measurement migratory system between a kind of disease areas based on transfer learning
CN108628824A (en) A kind of entity recognition method based on Chinese electronic health record
CN107463786A (en) Medical image Knowledge Base based on structured report template
CN110335653B (en) Non-standard medical record analysis method based on openEHR medical record format
CN109036553A (en) A kind of disease forecasting method based on automatic extraction Medical Technologist's knowledge
CN109378066A (en) A kind of control method and control device for realizing disease forecasting based on feature vector
CN111191415A (en) Operation classification coding method based on original operation data
CN111223539A (en) Method for extracting relation of Chinese electronic medical record
CN113435200A (en) Entity recognition model training and electronic medical record processing method, system and equipment
CN113887204A (en) Coding method for clinical examination medical text
Farruque et al. Explainable zero-shot modelling of clinical depression symptoms from text
CN111524570A (en) Ultrasonic follow-up patient screening method based on machine learning
CN114547303A (en) Text multi-feature classification method and device based on Bert-LSTM
CN112071431B (en) Clinical path automatic generation method and system based on deep learning and knowledge graph
CN116304114B (en) Intelligent data processing method and system based on surgical nursing
CN115312186B (en) Auxiliary screening system for diabetic retinopathy
CN110060749A (en) Electronic health record intelligent diagnosing method based on SEV-SDG-CNN
CN115841861A (en) Similar medical record recommendation method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication

Application publication date: 20220104