CN113887204A - Coding method for clinical examination medical text - Google Patents
- Publication number
- CN113887204A (application CN202111147404.XA)
- Authority
- CN
- China
- Prior art keywords
- term
- medical
- segmentation
- clinical examination
- participle
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Withdrawn
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/10—Text processing
- G06F40/12—Use of codes for handling textual entities
- G06F40/126—Character encoding
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H10/00—ICT specially adapted for the handling or processing of patient-related medical or healthcare data
- G16H10/20—ICT specially adapted for the handling or processing of patient-related medical or healthcare data for electronic clinical trials or questionnaires
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Epidemiology (AREA)
- Medical Informatics (AREA)
- Primary Health Care (AREA)
- Public Health (AREA)
- Medical Treatment And Welfare Office Work (AREA)
Abstract
The invention provides a coding method for clinical examination medical text and relates to the field of clinical examination medicine. The method analyzes and processes a clinical examination medical text to obtain its content structure and applies structured coding to each part of that structure. Before structured coding, the similarity between the term to be coded and each clinical examination medical term in the clinical examination medical term library is calculated, which effectively reduces repeated and near-duplicate terms. Clinical examination medical terms are stored in a participle-based structure: the participle is the basic unit of coding, and different participles from the coding participle library are combined to form the corresponding term, which greatly saves storage space. Word segmentation combines a word segmentation dictionary with a machine learning word segmenter, which reduces the manual review workload and improves segmentation efficiency. Three mapping modes (full mapping, basic mapping and principal participle mapping) are provided, which gives the method better generality.
Description
Technical Field
The invention relates to the field of clinical examination medicine, in particular to a coding method for a clinical examination medicine text.
Background
Clinical laboratory medicine is a bridging discipline between basic medicine and clinical medicine and draws on knowledge from many fields of medicine. It comprises basic subjects such as hematology, biochemistry, human parasitology, microbiology and immunology, and is an important part of medical and health work. It is a comprehensive applied discipline in which multiple subjects interpenetrate and intersect on the basis of laboratory medicine, and it involves natural sciences such as chemistry, physics, biology, optics, statistics, artificial intelligence, immunology, microbiology, genetics and molecular biology. In the early 1990s, clinical laboratory medicine developed rapidly and discipline building was unprecedentedly active; the field evolved from medical testing into clinical laboratory medicine and became an independent discipline. The main specialized courses in clinical laboratory medicine programs include: fundamentals of molecular biology, fundamentals of clinical laboratory medicine, clinical biochemistry, clinical hematology, clinical transfusion, clinical microbiology, clinical immunology, human parasitology, practical diagnostics, clinical laboratory quality management, and so on. Clearly, the professional knowledge involved in clinical laboratory medicine is broad and its knowledge structure is complex. Without structured coding, clinical examination medical text is difficult to use and analyze. In addition, because structuring rules differ, there is at present no unified structuring standard for clinical examination medical text, so data barriers arise and such texts cannot be exchanged or shared.
To address this, the invention application with publication number CN112131868A discloses a clinical trial medical coding method, which comprises: uploading a source file of a clinical trial study and specifying, at upload time, the column of words to be coded; matching the words to be coded against the corresponding standard dictionary, based on the domain to which the clinical trial study belongs, to obtain a coding result; and matching the coding result with the words to be coded in the source file and exporting them together as a complete result. In that application, the upload of the source file, the medical coding and the export of the coding result are carried out in sequence, so that closed-loop medical coding is realized and the efficiency of medical coding is effectively improved. Because the coding results are obtained per domain, studies can be grouped, studies in the same domain share one set of coding results, and nothing needs to be copied between studies, which reduces the medical coding workload and greatly improves efficiency.
However, that application does not structure the content to be coded, so texts that adopt different coding rules cannot be converted into one another directly. Therefore, a coding method for clinical examination medical text is needed to solve the above technical problems.
Disclosure of Invention
To solve the above technical problems, the invention provides a coding method for clinical examination medical text. Text analysis and processing are performed on the clinical examination medical text that requires structured coding to obtain a term to be coded; the similarity between the term to be coded and each clinical examination medical term in the clinical examination medical term library is calculated with a similarity calculation formula; a similarity threshold is used to judge whether the term to be coded belongs to any clinical examination medical term in the library; if it does, the term to be coded is represented by the code of the clinical examination medical term to which it belongs; if it does not, it is judged whether to add the term to the library as a new clinical examination medical term.
As a more specific solution, the clinical examination medical term library includes a mother library and an expansion library. The mother library is built by clinical examination medical professionals, who manually extract clinical examination medical terms from authoritative classification and coding works in clinical examination medicine and store them in structured form; the expansion library stores new clinical examination medical terms in structured form.
As a more specific solution, the clinical examination medical terms are classified, according to their semantics, into test object terms and test means terms, and are stored in the clinical examination medical term library in a structured storage format.
Further, the storage structure includes: term ID, term CODE, term NAME, term SOURCE, term SOURCE_CODE, term DATE, term CLASS, term main participle, and term DATA. The term ID numbers the clinical examination medical terms sequentially; the term CODE is the unique reference code of a clinical examination medical term; the term NAME holds the name of the term; the term SOURCE records the source of the term; the term SOURCE_CODE records the code that the term carries in its source; the term DATE records the creation time, the modification time and other time data; the term CLASS records the class of the term, which is either test object term or test means term; the term main participle describes the principal content of the term; the term DATA stores the CODEs of the participles that make up the term.
As a more specific solution, a word segmentation operation is performed on each clinical examination medical term by a word segmenter, yielding a clinical examination medical term composed of participle segments. According to their semantics, the participle segments are classified into main participles, attribute participles and expansion participles, and the clinical examination medical term is stored with the coding structure: main participle CODE + multiple attribute participle CODEs + expansion participle CODE.
Further, the participle segments are stored in a coding participle library whose storage structure includes: participle ID, participle CODE, participle KIND, participle DATE, participle NAME and participle DATA. The participle ID numbers the participle segments sequentially; the participle CODE is the unique reference code of a participle segment; the participle KIND records the classification of the participle segment (main participle, attribute participle or expansion participle); the participle DATE records the creation time, the modification time and other time data; the participle NAME holds the name of the participle segment; the participle DATA stores the text content of the participle segment.
As a more specific solution, the clinical examination medical terms include test object terms and test means terms. The coding structure of a test object term is: analysis component (main participle) + component description, analysis sample, test dimension, test precision, test means, test time (multiple attribute participles) + remark supplement participle (expansion participle). The coding structure of a test means term is: test mode (main participle) + mode description, test area, test object, testability description, use description, posture description (multiple attribute participles) + remark supplement participle (expansion participle).
As a more specific solution, the word segmenter labels each participle segment with a corresponding weight, and the similarity between the term to be coded and each clinical examination medical term in the clinical examination medical term library is calculated with a similarity calculation formula, which is a participle similarity formula based on cosine similarity and participle weights:
wherein: a, B represent the term A and the term B, respectively,a word-segmentation vector representing the term a,a participle vector representing term B;a set of segmentation weights representing the term a,airepresents the ith participle weight of the term A;set of participles representing term BbiRepresents the ith participle weight of term B; sam (A, B) denotes the similarity of term A and term B.
As a more specific solution, the word segmenter combines a machine learning word segmenter with a word segmentation dictionary: preliminary segmentation is performed with the dictionary, further segmentation is then performed by the machine learning word segmenter, and finally the output of the machine learning word segmenter is checked manually to verify segmentation accuracy.
Further, the word segmentation dictionary comprises a common-term segmentation dictionary and a clinical examination medical segmentation dictionary. The common-term segmentation dictionary stores participles of common terms; the clinical examination medical segmentation dictionary stores clinical examination medical participles and is updated automatically from the clinical examination medical term library. For each participle, the dictionary records its part of speech, its semantic class (main participle, attribute participle or expansion participle) and its weight.
Furthermore, the machine learning word segmenter is trained with a bidirectional LSTM + CRF machine learning algorithm and consists, in order, of a look-up layer, a forward LSTM layer, a backward LSTM layer and a CRF layer. Existing clinical examination medical terms are segmented and labeled manually, the labels covering the part of speech, semantic class and weight of each participle; the labeled terms serve as training data for training and testing the bidirectional LSTM + CRF neural network model, and a model that meets the required segmentation accuracy is output as the machine learning word segmenter.
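The two-stage segmentation described above can be sketched as follows. This is an illustration under stated assumptions, not the patent's implementation: the dictionary stage uses forward maximum matching (the text does not name a matching strategy), and `fallback` is a hypothetical stand-in for the trained bidirectional LSTM + CRF segmenter, which receives only the runs the dictionary could not resolve.

```python
def segment(text, dictionary, fallback, max_len=8):
    """Dictionary-first segmentation; unmatched runs are passed to `fallback`."""
    segments, pending, i = [], "", 0
    while i < len(text):
        match = None
        # Try the longest dictionary entry that starts at position i.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                match = candidate
                break
        if match is None:
            pending += text[i]  # collect characters the dictionary cannot place
            i += 1
            continue
        if pending:
            segments.extend(fallback(pending))
            pending = ""
        segments.append(match)
        i += len(match)
    if pending:
        segments.extend(fallback(pending))
    return segments

# Hypothetical dictionary: "protein S antigen" and "blood".
dictionary = {"蛋白S抗原", "血液"}
fallback = lambda run: [run]  # placeholder for the machine learning segmenter
result = segment("蛋白S抗原血液检查", dictionary, fallback)
# result == ["蛋白S抗原", "血液", "检查"]; only "检查" (examination) reached the fallback
```

Only the residue left by the dictionary reaches the second stage, which is the workload reduction the text attributes to combining the two segmenters.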
As a more specific solution, a similarity threshold is used to judge whether the term to be coded belongs to any clinical examination medical term in the clinical examination medical term library, and the judgment supports three mapping modes: full mapping, basic mapping and principal participle mapping. Full mapping calculates the similarity over the main participles, attribute participles and expansion participles of the term to be coded and of the clinical examination medical terms in the library; basic mapping calculates the similarity over the main participles and attribute participles only; principal participle mapping calculates the similarity over the main participles only.
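One way to picture the three mapping modes is as a filter applied before the similarity is computed. The sketch below assumes each participle is a `(text, kind, weight)` tuple; the names `MAPPING_KINDS` and `select_participles` and the sample data are hypothetical.

```python
# Which participle kinds take part in the comparison under each mapping mode.
MAPPING_KINDS = {
    "full": {"main", "attribute", "expansion"},
    "basic": {"main", "attribute"},
    "principal": {"main"},
}

def select_participles(participles, mode):
    """Return {text: weight} for the participles compared under `mode`."""
    kinds = MAPPING_KINDS[mode]
    return {text: weight for text, kind, weight in participles if kind in kinds}

term = [
    ("protein S antigen", "main", 0.6),
    ("blood", "attribute", 0.3),
    ("repeat sample", "expansion", 0.1),
]
basic_view = select_participles(term, "basic")
# basic_view == {"protein S antigen": 0.6, "blood": 0.3}
```

The filtered map can then be passed to whatever similarity function is in use, so the mode changes what is compared rather than how.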
As a more specific solution, the terms to be coded in a clinical examination medical text are structurally coded by the following steps:
S1, acquiring the clinical examination medical text that requires structured coding;
S2, preprocessing the clinical examination medical text, including de-duplication, de-noising and text vectorization;
S3, performing word segmentation on the preprocessed text with the word segmenter, and labeling the part of speech, semantic class and weight of each participle;
S4, deciding by a participle weight threshold whether each participle is discarded: participles below the threshold are regarded as useless and discarded, yielding the term to be coded;
S5, calculating, according to the selected mapping mode and the similarity calculation formula, the similarity between the participles of the term to be coded and the participles of each clinical examination medical term in the clinical examination medical term library;
S6, judging by the similarity threshold whether the term to be coded belongs to any clinical examination medical term in the library;
S7, if it does, representing the term to be coded by the code of the clinical examination medical term to which it belongs;
S8, if it does not, screening out the participle segments of the term to be coded that do not meet the similarity threshold, and judging manually whether to add them to the coding participle library as new participles;
S9, if new participles are added to the coding participle library, updating the coding participle library, expressing the term to be coded through its participle segments to obtain a new clinical examination medical term, adding the new term to the clinical examination medical term library, and updating the word segmentation dictionary;
S10, if no new participle is added to the coding participle library, discarding the term to be coded as a meaningless term.
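The decision flow of steps S4 to S8 can be sketched as below. This is a simplified illustration: both thresholds are arbitrary assumed values, `cosine` is a plain weighted cosine standing in for the patent's formula, and step S8 is approximated by listing the participles that occur nowhere in the library, leaving the S9/S10 decision to a human reviewer.

```python
import math

def cosine(a, b):
    """Weighted cosine of two {participle: weight} maps (stand-in formula)."""
    dot = sum(w * b[p] for p, w in a.items() if p in b)
    na = math.sqrt(sum(w * w for w in a.values()))
    nb = math.sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def encode_term(participles, library, weight_threshold=0.05,
                similarity_threshold=0.9):
    """Return ("matched", code) or ("review", unknown_participles)."""
    # S4: drop participles whose weight is below the threshold.
    kept = {p: w for p, w in participles.items() if w >= weight_threshold}
    # S5: compare against every term in the library.
    best_code, best_score = None, 0.0
    for code, reference in library.items():
        score = cosine(kept, reference)
        if score > best_score:
            best_code, best_score = code, score
    # S6/S7: reuse the code of the matching term.
    if best_code is not None and best_score >= similarity_threshold:
        return "matched", best_code
    # S8 (simplified): hand unknown participles to manual review.
    known = set().union(*library.values())
    return "review", sorted(p for p in kept if p not in known)

library = {"T0001": {"protein S antigen": 0.6, "blood": 0.3}}
hit = encode_term({"protein S antigen": 0.6, "blood": 0.3, "the": 0.01}, library)
miss = encode_term({"glucose": 0.7, "urine": 0.3}, library)
```

Here `hit` reuses the existing code because the low-weight participle is discarded before matching, while `miss` returns its unknown participles for review.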
As a more specific solution, the method can also perform mutual structured code conversion between clinical examination medical term libraries, dictionary libraries or mapping libraries that adopt different coding rules. The structured code conversion is performed by:
D1, structurally coding, through steps S1 to S10, the clinical examination medical texts in the term libraries, dictionary libraries or mapping libraries that adopt different coding rules;
D2, directly establishing a mapping relation between clinical examination medical texts that have the same structured code;
D3, establishing a participle-level mapping relation between clinical examination medical texts that share some of their participles;
D4, mapping completely different clinical examination medical texts manually.
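Steps D2 to D4 amount to a three-way partition once both libraries have been structurally coded. The sketch below assumes each library maps a local code to a tuple of participle CODEs (the result of D1); `build_mapping` and all codes shown are hypothetical.

```python
def build_mapping(library_a, library_b):
    """Partition the terms of library A by how they relate to library B."""
    index_b = {codes: local for local, codes in library_b.items()}
    exact, partial, manual = {}, {}, []
    for local_a, codes_a in library_a.items():
        if codes_a in index_b:
            # D2: identical structured code, map directly.
            exact[local_a] = index_b[codes_a]
            continue
        shared = {}
        for local_b, codes_b in library_b.items():
            common = sorted(set(codes_a) & set(codes_b))
            if common:
                shared[local_b] = common
        if shared:
            # D3: some participles in common, keep a participle-level mapping.
            partial[local_a] = shared
        else:
            # D4: nothing in common, leave for manual mapping.
            manual.append(local_a)
    return exact, partial, manual

library_a = {"A1": ("P001", "P010"), "A2": ("P002", "P011"), "A3": ("P003",)}
library_b = {"B7": ("P001", "P010"), "B8": ("P002", "P012")}
exact, partial, manual = build_mapping(library_a, library_b)
```

In this example `A1` maps directly to `B7`, `A2` is linked to `B8` through the shared participle `P002`, and `A3` is left for manual mapping.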
Compared with the related art, the coding method for clinical examination medical text provided by the invention has the following beneficial effects:
1. The invention analyzes and processes the clinical examination medical text to obtain its content structure and applies structured coding to each part of that structure. Before structured coding, the similarity between the term to be coded and each clinical examination medical term in the clinical examination medical term library is calculated, which effectively reduces repeated and near-duplicate terms; in addition, calculating the degree of similarity makes it possible to convert between different codes for similar content.
2. When a clinical examination medical term is stored, a participle-based structure is adopted: the participle is the basic unit of coding, and different participles from the coding participle library are combined to form the corresponding term. Because many participles are shared among terms, storage space is greatly saved and management is more integrated; a whole term can be stored simply by recording the CODE of each participle it contains, and its structure and source are clearer.
3. The invention segments words by combining a word segmentation dictionary with a machine learning word segmenter. Preliminary segmentation is first performed with the dictionary, which strips off words that can be determined with certainty and so reduces the workload of the machine learning word segmenter; secondary segmentation is then performed by the machine learning word segmenter, which reduces the manual review workload and improves segmentation efficiency.
4. Because terms to be coded differ in structural level, mapping needs to be performed in layers; the invention therefore provides three mapping modes (full mapping, basic mapping and principal participle mapping), which gives the method better generality.
Drawings
Fig. 1 is a flowchart of the coding method for clinical examination medical text according to a preferred embodiment of the present invention.
Detailed Description
The invention is further described with reference to the following figures and embodiments.
As shown in Fig. 1, the coding method for clinical examination medical text provided by the invention performs text analysis and processing on the clinical examination medical text that requires structured coding to obtain a term to be coded; the similarity between the term to be coded and each clinical examination medical term in the clinical examination medical term library is calculated with a similarity calculation formula; a similarity threshold is used to judge whether the term to be coded belongs to any clinical examination medical term in the library; if it does, the term to be coded is represented by the code of the clinical examination medical term to which it belongs; if it does not, it is judged whether to add the term to the library as a new clinical examination medical term.
It should be noted that current coding methods for clinical examination medical text are imperfect. They usually code texts in chronological order and assign each clinical examination medical text a unique code number. This approach cannot reflect the content structure of the text. In addition, clinical examination medical texts contain many repeated fields, and some texts have largely identical or even nearly identical content; numbering every text independently makes the workload and the management difficulty very large. In this embodiment, the content structure is obtained by analyzing and processing the clinical examination medical text, and each part of the structure is structurally coded. Before structured coding, the similarity between the term to be coded and each clinical examination medical term in the clinical examination medical term library is calculated. If the term to be coded belongs to a term in the library, it is represented by the code of that term; if not, it is judged whether to add it to the library as a new clinical examination medical term. This effectively reduces repeated and near-duplicate terms, and calculating the degree of similarity also makes it possible to convert between different codes for similar content.
As a more specific solution, the clinical examination medical term library includes a mother library and an expansion library. The mother library is built by clinical examination medical professionals, who manually extract clinical examination medical terms from authoritative classification and coding works in clinical examination medicine and store them in structured form; the expansion library stores new clinical examination medical terms in structured form.
It should be noted that the mother library of the clinical examination medical term library serves mainly as the trusted reference source for similarity matching, so it must be constructed rigorously by professionals. It is compiled from authoritative clinical examination medical works such as 'Chinese medical subject word list', 'clinical examination item classification and code', 'clinical examination medical basic noun term collection', 'Chinese examination medical term standard', 'international examination medical term standard', and the like.
As a more specific solution, the clinical examination medical terms are classified, according to their semantics, into test object terms and test means terms, and are stored in the clinical examination medical term library in a structured storage format.
It should be noted that, in clinical examination medical texts, the most common and most important terms fall into two kinds: test object terms and test means terms.
Further, the storage structure includes: term ID, term CODE, term NAME, term SOURCE, term SOURCE_CODE, term DATE, term CLASS, term main participle, and term DATA. The term ID numbers the clinical examination medical terms sequentially; the term CODE is the unique reference code of a clinical examination medical term; the term NAME holds the name of the term; the term SOURCE records the source of the term; the term SOURCE_CODE records the code that the term carries in its source; the term DATE records the creation time, the modification time and other time data; the term CLASS records the class of the term, which is either test object term or test means term; the term main participle describes the principal content of the term; the term DATA stores the CODEs of the participles that make up the term.
It should be noted that, when a clinical examination medical term is stored, a participle-based structure is adopted: the participle is the basic unit of coding, and different participles from the coding participle library are combined to form the corresponding term. Because many participles are shared among terms, storage space is greatly saved and management is more integrated; a whole term can be stored simply by recording the CODE of each participle it contains, and its structure and source are clearer. In a specific embodiment, the term ID is a sequential number starting from 1; the term CODE is a unique designated code that the system forms from the combined arrangement of the participle CODEs; and the term CLASS is denoted by 0 or 1 (0: test object term; 1: test means term).
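The storage idea above, in which a term record holds participle CODEs rather than text, might look like the following sketch. The field names mirror the storage structure in the text; the `-` separator used to form the term CODE, the choice of the first participle as the main participle, the helper names and every sample value are assumptions made for illustration.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class TermRecord:
    term_id: int          # sequential number
    code: str             # unique code formed from the participle CODEs
    name: str
    source: str
    source_code: str      # code of the term in its source
    date: str             # creation or modification time
    term_class: int       # 0 = test object term, 1 = test means term
    main_participle: str  # CODE of the main participle (assumed: a CODE, not text)
    data: List[str]       # participle CODEs that make up the term

def make_term(term_id, name, source, source_code, date, term_class, participle_codes):
    codes = list(participle_codes)
    # Assumption: the first CODE is the main participle; CODEs are joined with "-".
    return TermRecord(term_id, "-".join(codes), name, source, source_code,
                      date, term_class, codes[0], codes)

def render(term: TermRecord, word_bank: Dict[str, str]) -> str:
    """Rebuild the term text from the coding participle library."""
    return " ".join(word_bank[code] for code in term.data)

# Hypothetical coding participle library and term.
word_bank = {"P001": "protein S antigen", "P010": "blood", "P020": "ug/ml"}
term = make_term(1, "protein S antigen in blood", "hypothetical source",
                 "X-001", "2021-09-23", 0, ["P001", "P010", "P020"])
text = render(term, word_bank)  # "protein S antigen blood ug/ml"
```

Because the record carries only CODEs, a participle that many terms share is stored once in the participle library, which is the storage saving the text describes.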
As a more specific solution, a word segmentation operation is performed on each clinical examination medical term by a word segmenter, yielding a clinical examination medical term composed of participle segments. According to their semantics, the participle segments are classified into main participles, attribute participles and expansion participles, and the clinical examination medical term is stored with the coding structure: main participle CODE + multiple attribute participle CODEs + expansion participle CODE.
It should be noted that, in this embodiment, participles that occur frequently in clinical examination medical terms, have a clear meaning and carry the main sense are taken as main participles; participles that describe and refine a main participle are taken as attribute participles; and participles that extend or explain are taken as expansion participles. Main participles and attribute participles occupy fixed positions, with a fixed number of code digits, in the structured code of a clinical examination medical term; expansion participles have no fixed number of digits and are appended to the end of the code only when needed.
Further, the participle segments are stored in a coding participle library whose storage structure includes: participle ID, participle CODE, participle KIND, participle DATE, participle NAME and participle DATA. The participle ID numbers the participle segments sequentially; the participle CODE is the unique reference code of a participle segment; the participle KIND records the classification of the participle segment (main participle, attribute participle or expansion participle); the participle DATE records the creation time, the modification time and other time data; the participle NAME holds the name of the participle segment; the participle DATA stores the text content of the participle segment.
As a more specific solution, the clinical examination medical terms include test object terms and test means terms. The coding structure of a test object term is: analysis component (main participle) + component description, analysis sample, test dimension, test precision, test means, test time (multiple attribute participles) + remark supplement participle (expansion participle). The coding structure of a test means term is: test mode (main participle) + mode description, test area, test object, testability description, use description, posture description (multiple attribute participles) + remark supplement participle (expansion participle).
It should be noted that, in this embodiment, the coding structures of individual test object terms and test means terms are not all the same: some fields may be empty and some may be extended. In a specific embodiment, the coding structure of a test object term is, in order:
Analysis component (main): protein S antigen
Component description: antigen
Analysis sample: blood
Test dimension: μg/ml
Test precision: 0.1 μg/ml
Test means: clinical blood examination
Test time: 21/09/23 13:32:23
Person tested: Du xi
The coding structure of a test means term is, in order:
Test mode (main): clinical blood examination
Mode description: protein analyzer
Test area: superficial vein of the arm
Test object: protein S antigen
Testability description: left side
Use description: examination for bleeding and coagulation
Posture description: sitting upright
Inspector: King
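The fixed-position layout described in this section (main and attribute participles in fixed slots, expansion participles appended only when needed, empty slots allowed) can be sketched as follows. The slot width of 4 digits, the all-zero filler for an empty slot, the slot names and the sample CODEs are all assumptions; the text does not specify them.

```python
# Slot order for a test object term, following the coding structure in the text.
OBJECT_SLOTS = ("component", "component_description", "sample",
                "dimension", "precision", "means", "time")
WIDTH = 4              # assumed number of digits per fixed slot
EMPTY = "0" * WIDTH    # assumed filler for a slot left empty

def build_code(values, expansions=(), slots=OBJECT_SLOTS):
    """Concatenate fixed-width slot CODEs, then append expansion CODEs."""
    parts = []
    for slot in slots:
        part = values.get(slot, EMPTY)
        if len(part) != WIDTH:
            raise ValueError(f"{slot}: expected {WIDTH} digits, got {part!r}")
        parts.append(part)
    return "".join(parts) + "".join(expansions)

def split_code(code, slots=OBJECT_SLOTS):
    """Recover the slot CODEs and the trailing expansion part."""
    fixed = len(slots) * WIDTH
    values = {slot: code[i * WIDTH:(i + 1) * WIDTH] for i, slot in enumerate(slots)}
    return values, code[fixed:]

# Hypothetical participle CODEs: only two slots are filled, one expansion is added.
code = build_code({"component": "0001", "sample": "0103"}, expansions=["9001"])
values, tail = split_code(code)
```

Because every fixed slot has the same width, a slot can be read back by position alone, and anything after the last fixed slot is known to be expansion content.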
As a more specific solution, the word segmenter labels each participle segment with a corresponding weight, and the similarity between the term to be coded and each clinical examination medical term in the clinical examination medical term library is calculated with a similarity calculation formula, which is a participle similarity formula based on cosine similarity and participle weights:
wherein: a, B represent the term A and the term B, respectively,a word-segmentation vector representing the term a,a participle vector representing term B;a set of segmentation weights representing the term a,airepresents the ith participle weight of the term A;set of participles representing term BbiRepresents the ith participle weight of term B; sam (A, B) denotes the similarity of term A and term B.
As a more specific solution, the word segmenter combines a machine learning segmenter with a segmentation dictionary: preliminary segmentation is performed with the segmentation dictionary, further segmentation is performed by the machine learning segmenter, and finally the machine-learned segmentation is reviewed manually to verify segmentation accuracy.
Further, the segmentation dictionary comprises a common-word segmentation dictionary and a clinical examination medical segmentation dictionary. The common-word segmentation dictionary stores common participles, and the clinical examination medical segmentation dictionary stores clinical examination medical participles; the latter is updated automatically from the clinical examination medical term library. The segmentation dictionary records, for each participle, its part of speech, its semantic role (main participle, attribute participle or expansion participle) and its weight.
Furthermore, the machine learning segmenter is obtained by training with a bidirectional LSTM + CRF machine learning algorithm and consists, in order, of a Look-up layer, a Forward LSTM layer, a Backward LSTM layer and a CRF layer. Existing clinical examination medical terms are segmented and labeled manually, the labels comprising part of speech, semantic role and weight; the labeled terms are used as training data to train and test the bidirectional LSTM + CRF neural network model, and a model that meets the required segmentation accuracy is output as the machine learning segmenter.
It should be noted that using only the segmentation dictionary can cause content to be missed, while using only the machine learning segmenter leads to low segmentation efficiency and requires manual review of the results. This embodiment therefore combines the two: preliminary segmentation with the dictionary strips off participles that are clearly determined, which reduces the workload of the machine learning segmenter; the machine learning segmenter then performs secondary segmentation, which reduces the workload of manual review and improves segmentation efficiency.
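The dictionary-first, model-second arrangement described above can be sketched as follows. This is an illustration under assumptions: dictionary lookup is done by forward maximum matching (the text does not name a matching strategy), and the machine learning segmenter is represented by a caller-supplied `fallback` function rather than a real bidirectional LSTM + CRF model. All names are hypothetical.

```python
def dictionary_first_segment(text, dictionary, fallback):
    """Segment `text`: dictionary matches first, everything else goes to `fallback`.

    Returns a list of (segment, source) pairs, where source is "dict" or "ml".
    """
    max_len = max((len(word) for word in dictionary), default=0)
    result = []
    pending = []  # characters the dictionary could not account for

    def flush():
        # Hand the unmatched run to the (stand-in) machine learning segmenter.
        if pending:
            chunk = "".join(pending)
            result.extend((seg, "ml") for seg in fallback(chunk))
            pending.clear()

    i = 0
    while i < len(text):
        match = None
        # Forward maximum matching: try the longest candidate first.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if candidate in dictionary:
                match = candidate
                break
        if match:
            flush()
            result.append((match, "dict"))
            i += len(match)
        else:
            pending.append(text[i])
            i += 1
    flush()
    return result


# Whitespace splitting stands in for the trained model in this example.
segments = dictionary_first_segment(
    "protein S antigen in blood",
    {"protein S antigen", "blood"},
    fallback=lambda chunk: chunk.split(),
)
```

Only the residue left by the dictionary pass reaches the fallback, which is the workload reduction the paragraph above describes; only segments tagged "ml" would then need manual review.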
As a more specific solution, a similarity threshold is used to judge whether the term to be encoded belongs to any clinical examination medical term in the clinical examination medical term library. For the threshold judgment, the mapping modes comprise full mapping, basic mapping and main participle mapping. Full mapping calculates the similarity between the main participle, attribute participles and expansion participles of the term to be encoded and those of the clinical examination medical terms in the library; basic mapping calculates the similarity for the main participle and attribute participles only; main participle mapping calculates the similarity for the main participle only.
It should be noted that, because terms to be encoded differ in structural level, mapping must be hierarchical; this embodiment therefore provides three mapping modes (full mapping, basic mapping and main participle mapping), which gives better generality.
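The three mapping modes amount to choosing which kinds of participle take part in the comparison. A minimal sketch, assuming a term is represented as a list of (kind, text) pairs; the mode keys and the function name are hypothetical:

```python
# Which participle kinds each mapping mode compares.
MAPPING_KINDS = {
    "full": ("main", "attribute", "expansion"),
    "basic": ("main", "attribute"),
    "main": ("main",),
}


def select_participles(term, mode):
    """Keep only the participles of `term` that the given mapping mode compares."""
    kinds = MAPPING_KINDS[mode]
    return [(kind, text) for kind, text in term if kind in kinds]


term = [
    ("main", "protein S antigen"),
    ("attribute", "blood"),
    ("expansion", "Du Xi"),
]
```

The filtered lists for the term to be encoded and for a library term would then be passed to whatever similarity function is in use.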
As a more specific solution, terms to be encoded of the clinical examination medical text are structurally encoded by the following steps:
S1, acquiring the clinical examination medical text that requires structured coding;
S2, preprocessing the clinical examination medical text, including de-duplication, de-noising and text vectorization;
S3, segmenting the preprocessed clinical examination medical text with the word segmenter, and labeling the part of speech, semantic role and weight of each participle;
S4, deciding by a participle weight threshold whether each participle is discarded, wherein participles below the threshold are regarded as useless and discarded, to obtain the term to be encoded;
S5, calculating, according to the configured mapping mode and the similarity calculation formula, the similarity between each participle of the term to be encoded and each participle of the clinical examination medical terms in the clinical examination medical term library;
S6, judging by a similarity threshold whether the term to be encoded belongs to any clinical examination medical term in the clinical examination medical term library;
S7, if it belongs, representing the term to be encoded by the code of the clinical examination medical term to which it belongs;
S8, if it does not belong, screening out the participle segments of the term to be encoded that do not meet the similarity threshold, and manually judging whether those segments are added to the coding participle library as new participles;
S9, if it is judged that new participles are added to the coding participle library, updating the coding participle library, expressing the term to be encoded by its participle segments to obtain a new clinical examination medical term, adding the new term to the clinical examination medical term library, and updating the segmentation dictionary;
S10, if it is judged that no new participle is added to the coding participle library, discarding the term to be encoded as a meaningless term.
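The decision flow of steps S4 to S10 can be sketched as a single function. This is a simplified illustration, not the patented procedure: the similarity function is passed in by the caller (a Jaccard overlap is used below purely as a stand-in), the manual review of S8 is modelled by an `approve_new` callback, and the `NEW0001`-style codes are invented for the example.

```python
def encode_term(participles, term_library, weight_threshold,
                sim_threshold, similarity, approve_new):
    """Return (status, code) for one candidate term.

    participles: list of (text, weight) pairs from the segmenter (S3).
    term_library: dict mapping term code -> list of participle texts.
    """
    # S4: drop participles whose weight is below the threshold.
    kept = [text for text, weight in participles if weight >= weight_threshold]
    if not kept:
        return ("discarded", None)

    # S5: find the most similar library term.
    best_code, best_score = None, 0.0
    for code, reference in term_library.items():
        score = similarity(kept, reference)
        if score > best_score:
            best_code, best_score = code, score

    # S6/S7: a match above the threshold reuses the existing code.
    if best_code is not None and best_score >= sim_threshold:
        return ("matched", best_code)

    # S8/S9: reviewer accepts the new participles, so the library grows.
    if approve_new(kept):
        new_code = f"NEW{len(term_library) + 1:04d}"  # hypothetical code scheme
        term_library[new_code] = kept
        return ("added", new_code)

    # S10: otherwise the term is discarded as meaningless.
    return ("discarded", None)


def jaccard(a, b):
    """Stand-in similarity: overlap of the two participle sets."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb) if sa | sb else 0.0
```

For example, with a library `{"T0001": ["protein S antigen", "blood"]}`, a candidate whose surviving participles are the same two strings is returned as `("matched", "T0001")`, while an unseen term is either added under a new code or discarded depending on the reviewer callback.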
As a more specific solution, the method can also perform structured code conversion between clinical examination medical term libraries/dictionary libraries/mapping libraries that use different coding rules; the structured code conversion is performed by the following steps:
D1, structurally encoding, through steps S1 to S10, the clinical examination medical texts in the clinical examination medical term libraries/dictionary libraries/mapping libraries that use different coding rules;
D2, directly establishing a mapping relation between clinical examination medical texts that have the same structured code;
D3, establishing a participle mapping relation between clinical examination medical texts that share some participles;
D4, manually establishing mapping relations for clinical examination medical texts that are completely different.
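Steps D2 to D4 sort each entry of one library into three buckets relative to another library. A minimal sketch, assuming both libraries have already been structurally encoded (D1) into tuples of participle codes; the function name `build_crosswalk` and the sample codes are hypothetical:

```python
def build_crosswalk(lib_a, lib_b):
    """Map entries of lib_a onto lib_b.

    lib_a, lib_b: dict mapping a source code -> tuple of participle codes.
    Returns (exact, partial, manual):
      exact   - D2: codes whose structured encodings are identical
      partial - D3: codes sharing some participles, with the shared ones listed
      manual  - D4: codes with nothing in common, left for manual mapping
    """
    by_structure = {}
    for code_b, struct_b in lib_b.items():
        by_structure.setdefault(tuple(struct_b), []).append(code_b)

    exact, partial, manual = {}, {}, []
    for code_a, struct_a in lib_a.items():
        same = by_structure.get(tuple(struct_a))
        if same:
            exact[code_a] = same
            continue
        shared = {
            code_b: sorted(set(struct_a) & set(struct_b))
            for code_b, struct_b in lib_b.items()
            if set(struct_a) & set(struct_b)
        }
        if shared:
            partial[code_a] = shared
        else:
            manual.append(code_a)
    return exact, partial, manual


lib_a = {"A1": ("P01", "P02"), "A2": ("P01", "P09"), "A3": ("P77",)}
lib_b = {"B1": ("P01", "P02"), "B2": ("P03",)}
exact, partial, manual = build_crosswalk(lib_a, lib_b)
```

Only the `manual` bucket requires human effort, which is the point of encoding both libraries into a common participle space first.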
The above description is only an embodiment of the present invention and is not intended to limit its scope; all equivalent structures or equivalent processes derived from the contents of this specification and the accompanying drawings, whether applied directly or indirectly in other related technical fields, fall within the scope of protection of the present invention.
Claims (10)
1. A coding method for clinical examination medical text, characterized in that: text analysis and processing are performed on the clinical examination medical text requiring structured coding to obtain a term to be encoded; the similarity between the term to be encoded and each clinical examination medical term in a clinical examination medical term library is calculated by a similarity calculation formula; whether the term to be encoded belongs to any clinical examination medical term in the clinical examination medical term library is judged by a similarity threshold; if it belongs, the term to be encoded is represented by the code of the clinical examination medical term to which it belongs; if it does not belong, it is judged whether to add the term to the clinical examination medical term library as a new clinical examination medical term.
2. The coding method for clinical examination medical text according to claim 1, wherein the clinical examination medical term library comprises a mother library and an extended library; for the mother library, clinical examination medical professionals manually extract clinical examination medical terms from authoritative clinical examination medical classification and coding works and store them in structured form; the extended library is used for structured storage of new clinical examination medical terms.
3. The coding method for clinical examination medical text according to claim 2, wherein the clinical examination medical terms are classified by term semantics into examination object terms and examination means terms; the clinical examination medical terms are stored in the clinical examination medical term library in structured form;
the storage structure comprises: term ID, term CODE, term NAME, term SOURCE, term SOURCE_CODE, term DATE, term CLASS, term main participle, and term DATA; the term ID is used to number the clinical examination medical terms sequentially; the term CODE serves as the unique reference code of the corresponding clinical examination medical term; the term NAME is used to name the clinical examination medical term; the term SOURCE is used to label the source of the clinical examination medical term; the term SOURCE_CODE is used to label the code of the clinical examination medical term in its source; the term DATE is used to label the creation time, modification time and other time data; the term CLASS is used to label the class of the term, the classes comprising examination object terms and examination means terms; the term main participle is used to describe the most important content of the corresponding clinical examination medical term; the term DATA is used to store the participle CODEs of the clinical examination medical term.
4. The coding method for clinical examination medical text according to claim 3, wherein each clinical examination medical term is segmented by a word segmenter to obtain a clinical examination medical term composed of participle segments; the participle segments are classified by semantic role into main participles, attribute participles and expansion participles, and the clinical examination medical terms are stored according to the coding structure: main participle CODE + multiple attribute participle CODEs + expansion participle CODE;
the participle segments are stored in a coding participle library, whose storage structure comprises: participle ID, participle CODE, participle KIND, participle DATE, participle NAME and participle DATA; the participle ID is used to number the participle segments sequentially; the participle CODE serves as the unique reference code of the corresponding participle segment; the participle KIND is used to label the class of the participle segment, the classes comprising main participle, attribute participle and expansion participle; the participle DATE is used to label the creation time, modification time and other time data; the participle NAME is used to name the participle segment; the participle DATA is used to store the text content of the participle segment.
5. The coding method for clinical examination medical text according to claim 4, wherein the clinical examination medical terms include examination object terms and examination means terms;
the coding structure of an examination object term is: analysis component (main participle) + component description, analysis sample, inspection dimension, inspection precision, inspection means, inspection time (multiple attribute participles) + remark supplement participle (expansion participle);
the coding structure of an examination means term is: inspection mode (main participle) + mode description, inspection area, inspection object, testability description, use description, posture description (multiple attribute participles) + remark supplement participle (expansion participle).
6. The coding method for clinical examination medical text according to claim 5, wherein each participle segment is labeled with a corresponding weight by the word segmenter, and the similarity between the term to be encoded and each clinical examination medical term in the clinical examination medical term library is calculated by a similarity calculation formula, the similarity calculation formula being a participle similarity formula based on cosine similarity and participle weights:
[Formula not recovered from the source.]
wherein: A and B denote term A and term B, respectively; each term has a participle vector and a set of participle weights; a_i denotes the i-th participle weight of term A; b_i denotes the i-th participle weight of term B; Sam(A, B) denotes the similarity of term A and term B.
7. The coding method for clinical examination medical text according to claim 6, wherein the word segmenter combines a machine learning segmenter with a segmentation dictionary; preliminary segmentation is performed with the segmentation dictionary, further segmentation is performed by the machine learning segmenter, and finally the machine-learned segmentation is reviewed manually to verify segmentation accuracy;
the segmentation dictionary comprises a common-word segmentation dictionary and a clinical examination medical segmentation dictionary; the common-word segmentation dictionary is used to store common participles, and the clinical examination medical segmentation dictionary is used to store clinical examination medical participles, wherein the clinical examination medical segmentation dictionary is updated automatically from the clinical examination medical term library, and the segmentation dictionary records the part of speech, semantic role (main participle, attribute participle or expansion participle) and weight of each participle;
the machine learning segmenter is obtained by training with a bidirectional LSTM + CRF machine learning algorithm and consists, in order, of a Look-up layer, a Forward LSTM layer, a Backward LSTM layer and a CRF layer;
existing clinical examination medical terms are segmented and labeled manually, the labels comprising part of speech, semantic role and weight; the labeled clinical examination medical terms are used as training data to train and test the bidirectional LSTM + CRF neural network model, and a model that meets the required segmentation accuracy is output as the machine learning segmenter.
8. The coding method for clinical examination medical text according to claim 7, wherein whether the term to be encoded belongs to any clinical examination medical term in the clinical examination medical term library is judged by a similarity threshold; for the threshold judgment, the mapping modes comprise full mapping, basic mapping and main participle mapping; the full mapping calculates the similarity between the main participle, attribute participles and expansion participles of the term to be encoded and those of the clinical examination medical terms in the clinical examination medical term library; the basic mapping calculates the similarity for the main participle and attribute participles only; the main participle mapping calculates the similarity for the main participle only.
9. The coding method for clinical examination medical text according to claim 8, characterized in that the terms to be encoded of the clinical examination medical text are structurally encoded by the following steps:
S1, acquiring the clinical examination medical text that requires structured coding;
S2, preprocessing the clinical examination medical text, including de-duplication, de-noising and text vectorization;
S3, segmenting the preprocessed clinical examination medical text with the word segmenter, and labeling the part of speech, semantic role and weight of each participle;
S4, deciding by a participle weight threshold whether each participle is discarded, wherein participles below the threshold are regarded as useless and discarded, to obtain the term to be encoded;
S5, calculating, according to the configured mapping mode and the similarity calculation formula, the similarity between each participle of the term to be encoded and each participle of the clinical examination medical terms in the clinical examination medical term library;
S6, judging by a similarity threshold whether the term to be encoded belongs to any clinical examination medical term in the clinical examination medical term library;
S7, if it belongs, representing the term to be encoded by the code of the clinical examination medical term to which it belongs;
S8, if it does not belong, screening out the participle segments of the term to be encoded that do not meet the similarity threshold, and manually judging whether those segments are added to the coding participle library as new participles;
S9, if it is judged that new participles are added to the coding participle library, updating the coding participle library, expressing the term to be encoded by its participle segments to obtain a new clinical examination medical term, adding the new term to the clinical examination medical term library, and updating the segmentation dictionary;
S10, if it is judged that no new participle is added to the coding participle library, discarding the term to be encoded as a meaningless term.
10. The coding method for clinical examination medical text according to claim 9, wherein structured code conversion can be performed between clinical examination medical term libraries/dictionary libraries/mapping libraries that use different coding rules; the structured code conversion is performed by the following steps:
D1, structurally encoding, through steps S1 to S10, the clinical examination medical texts in the clinical examination medical term libraries/dictionary libraries/mapping libraries that use different coding rules;
D2, directly establishing a mapping relation between clinical examination medical texts that have the same structured code;
D3, establishing a participle mapping relation between clinical examination medical texts that share some participles;
D4, manually establishing mapping relations for clinical examination medical texts that are completely different.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202111147404.XA CN113887204A (en) | 2021-09-29 | 2021-09-29 | Coding method for clinical examination medical text |
Publications (1)
Publication Number | Publication Date |
---|---|
CN113887204A true CN113887204A (en) | 2022-01-04 |
Family
ID=79007739
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202111147404.XA Withdrawn CN113887204A (en) | 2021-09-29 | 2021-09-29 | Coding method for clinical examination medical text |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113887204A (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN116682519A (en) * | 2023-08-03 | 2023-09-01 | 广东杰纳医药科技有限公司 | Clinical experiment data unit analysis method |
CN116682519B (en) * | 2023-08-03 | 2024-03-19 | 广东杰纳医药科技有限公司 | Clinical experiment data unit analysis method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109065157B (en) | Disease diagnosis standardized code recommendation list determination method and system | |
CN108831559B (en) | Chinese electronic medical record text analysis method and system | |
CN113871003B (en) | Disease auxiliary differential diagnosis system based on causal medical knowledge graph | |
CN107247881B (en) | Multi-mode intelligent analysis method and system | |
US20220254493A1 (en) | Chronic disease prediction system based on multi-task learning model | |
CN111316281B (en) | Semantic classification method and system for numerical data in natural language context based on machine learning | |
CN113241135B (en) | Disease risk prediction method and system based on multi-modal fusion | |
CN106934235B (en) | Patient's similarity measurement migratory system between a kind of disease areas based on transfer learning | |
CN108628824A (en) | A kind of entity recognition method based on Chinese electronic health record | |
CN107463786A (en) | Medical image Knowledge Base based on structured report template | |
CN110335653B (en) | Non-standard medical record analysis method based on openEHR medical record format | |
CN109036553A (en) | A kind of disease forecasting method based on automatic extraction Medical Technologist's knowledge | |
CN109378066A (en) | A kind of control method and control device for realizing disease forecasting based on feature vector | |
CN111191415A (en) | Operation classification coding method based on original operation data | |
CN111223539A (en) | Method for extracting relation of Chinese electronic medical record | |
CN113435200A (en) | Entity recognition model training and electronic medical record processing method, system and equipment | |
CN113887204A (en) | Coding method for clinical examination medical text | |
Farruque et al. | Explainable zero-shot modelling of clinical depression symptoms from text | |
CN111524570A (en) | Ultrasonic follow-up patient screening method based on machine learning | |
CN114547303A (en) | Text multi-feature classification method and device based on Bert-LSTM | |
CN112071431B (en) | Clinical path automatic generation method and system based on deep learning and knowledge graph | |
CN116304114B (en) | Intelligent data processing method and system based on surgical nursing | |
CN115312186B (en) | Auxiliary screening system for diabetic retinopathy | |
CN110060749A (en) | Electronic health record intelligent diagnosing method based on SEV-SDG-CNN | |
CN115841861A (en) | Similar medical record recommendation method and system |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
WW01 | Invention patent application withdrawn after publication | Application publication date: 20220104 |