CN117034911B - Correction method and device for hospital diagnosis dictionary, server and storage medium - Google Patents
- Publication number
- CN117034911B (application CN202311264209.4A)
- Authority
- CN
- China
- Prior art keywords
- word
- word segmentation
- suspected
- diagnosis
- modifier
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/253—Grammatical analysis; Style critique
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02D—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
- Y02D10/00—Energy efficient computing, e.g. low power processors, power management or thermal management
Abstract
The embodiment of the invention discloses a correction method and device for a hospital diagnosis dictionary, a server and a storage medium. The correction method comprises the following steps: receiving a word segmentation result of a diagnostic sentence; determining the part of speech of a suspected word segment according to the word segmentation result; searching for the sentence in which the suspected word segment is located and, when the part of speech is a noun, adding a first modifier serving as an attributive to the suspected word segment in the sentence; when the part of speech of the suspected word segment is a verb, adding a second modifier serving as an adverbial to the suspected word segment in the sentence, to obtain a supplementary sentence; and performing grammar detection on the supplementary sentence, adjusting the suspected word segment when the detection result is an error, determining the part of speech of the adjusted word segment, and returning to the step of searching for the sentence in which the suspected word segment is located, until the detection result is correct. The method compensates for the inaccurate segmentation of medical records by conventional word segmentation models and corrects erroneous segmentation results to obtain correct ones.
Description
Technical Field
The present invention relates to the technical field of hospital diagnosis dictionaries, and in particular, to a method, an apparatus, a server, and a storage medium for correcting a hospital diagnosis dictionary.
Background
With the continued development of medical and health informatization, digital transformation has caused explosive growth in the medical data maintained by various medical information management systems. Medical informatization has been rolled out in most hospitals and has begun to show results, but the medical data of different hospitals remain isolated from one another and are difficult to exchange and share between systems, so the data-island phenomenon is increasingly serious. Data sharing therefore needs to be realized through a comprehensive hospital medical system, to provide a foundation for intelligent diagnosis and treatment.
Medical text data is a main component of a comprehensive hospital medical system. Converting medical record data into corresponding structured data facilitates later identification and extraction, and through data mining such data becomes an important resource for intelligent medical diagnosis assistance. As an important component of medical text data, the hospital diagnosis dictionary provides standardized word composition, interpretation and metadata description. An accurate hospital diagnosis dictionary is therefore essential to medical text data.
The hospital diagnosis dictionary is constructed from various text data by word segmentation. In the course of realizing the invention, the inventor found the following technical problem: an existing word segmentation model is generally used for segmentation, but in medical record data doctors often write in natural language and frequently use shorthand. In such cases the existing word segmentation model cannot segment accurately, so a large number of erroneous or redundant words appear in the hospital diagnosis dictionary.
Disclosure of Invention
The embodiment of the invention provides a correction method and device for a hospital diagnosis dictionary, a server and a storage medium, to solve the technical problem that prior-art word segmentation models segment medical record data with low accuracy.
In a first aspect, an embodiment of the present invention provides a method for correcting a hospital diagnostic dictionary, including:
receiving a word segmentation result of a diagnostic sentence;
determining the part of speech of a suspected word segment according to the word segmentation result;
searching for the sentence in which the suspected word segment is located and, when the part of speech is a noun, adding a first modifier serving as an attributive to the suspected word segment in the sentence;
when the part of speech of the suspected word segment is a verb, adding a second modifier serving as an adverbial to the suspected word segment in the sentence, to obtain a supplementary sentence;
and performing grammar detection on the supplementary sentence, adjusting the suspected word segment when the detection result is an error, determining the part of speech of the adjusted word segment, and returning to the step of searching for the sentence in which the suspected word segment is located, until the detection result is correct.
In a second aspect, an embodiment of the present invention further provides a correction device for a hospital diagnosis dictionary, including:
a receiving module, configured to receive a word segmentation result of a diagnostic sentence;
a determining module, configured to determine the part of speech of a suspected word segment according to the word segmentation result;
a searching module, configured to search for the sentence in which the suspected word segment is located and, when the part of speech is a noun, add a first modifier serving as an attributive to the suspected word segment in the sentence;
an adding module, configured to add a second modifier serving as an adverbial to the suspected word segment in the sentence when the part of speech of the suspected word segment is a verb, to obtain a supplementary sentence;
an adjusting module, configured to perform grammar detection on the supplementary sentence, adjust the suspected word segment when the detection result is an error, determine the part of speech of the adjusted word segment, and return to the step of searching for the sentence in which the suspected word segment is located, until the detection result is correct.
In a third aspect, an embodiment of the present invention further provides a server, including:
one or more processors;
storage means for storing one or more programs,
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the correction method for a hospital diagnostic dictionary as provided in the above embodiments.
In a fourth aspect, embodiments of the present invention also provide a storage medium containing computer executable instructions which, when executed by a computer processor, are used to perform a method of correcting a hospital diagnostic dictionary as provided by the above embodiments.
The correction method and device for a hospital diagnosis dictionary, the server and the storage medium provided by the embodiment of the invention receive a word segmentation result of a diagnostic sentence; determine the part of speech of a suspected word segment according to the word segmentation result; search for the sentence in which the suspected word segment is located and, when the part of speech is a noun, add a first modifier serving as an attributive to the suspected word segment in the sentence; when the part of speech of the suspected word segment is a verb, add a second modifier serving as an adverbial to the suspected word segment in the sentence, to obtain a supplementary sentence; and perform grammar detection on the supplementary sentence, adjust the suspected word segment when the detection result is an error, determine the part of speech of the adjusted word segment, and return to the step of searching for the sentence in which the suspected word segment is located, until the detection result is correct. Modifiers or short phrases are added to a questionable segmentation result to form a longer sentence with definite semantics, and grammar detection is performed on that sentence. If the grammar detection passes, the segmentation is determined to be error-free; otherwise, the segmentation result is adjusted until the grammar detection requirement is met. In this way the diagnostic sentence is expanded according to the characteristics of diagnostic sentences, and the plausibility of the expanded sentence is used to judge whether the segmentation is wrong. The method compensates for the inaccurate segmentation of medical records by conventional word segmentation models and corrects erroneous segmentation results to obtain correct ones.
Drawings
Other features, objects and advantages of the present invention will become more apparent upon reading of the detailed description of non-limiting embodiments, made with reference to the accompanying drawings in which:
FIG. 1 is a flow chart of a correction method for a hospital diagnostic dictionary according to an embodiment of the present invention;
FIG. 2 is a flow chart of a correction method for a hospital diagnostic dictionary according to a second embodiment of the present invention;
FIG. 3 is a schematic diagram of a correction device for a hospital diagnostic dictionary according to a third embodiment of the present invention;
FIG. 4 is a structural diagram of a server according to a fourth embodiment of the present invention.
Detailed Description
The invention is described in further detail below with reference to the drawings and examples. It is to be understood that the specific embodiments described herein are merely illustrative of the invention and are not limiting thereof. It should be further noted that, for convenience of description, only some, but not all of the structures related to the present invention are shown in the drawings.
Example 1
Fig. 1 is a flowchart of a correction method for a hospital diagnosis dictionary according to a first embodiment of the present invention. This embodiment is applicable to correcting erroneous words in the dictionary. The method may be executed by a correction device for a hospital diagnosis dictionary, which may be integrated in a hospital medical system server, and specifically includes the following steps:
Step 110, receiving a word segmentation result of a diagnostic sentence.
In the medical field, medical staff write diagnostic reports, i.e., medical record text, in the form of natural text; diagnostic records are a typical example. As the original record of a patient's entire course of diagnosis and treatment, medical record text contains whole-process information such as the record of the course of illness, examination and test results, doctor's orders, operation records and nursing records. It is one of the main sources of data in the medical field and contains a large amount of valuable but underused medical information. In this embodiment, an existing dictionary-based Chinese word segmentation method may first be used to segment the diagnostic sentences of the diagnostic records in the medical record text, to obtain the corresponding word segmentation result.
Step 120, determining the part of speech of a suspected word segment according to the word segmentation result.
In this embodiment, a suspected word segment is a segmentation result that has been detected as possibly incorrect. Optionally, segmentation results can be obtained from several different word segmentation models, and the segments on which they differ are taken as suspected word segments. For example, suspected word segments can be determined by comparing the results of a dictionary-based Chinese word segmentation method with those of a forward matching method. The part of speech of each suspected word segment is then determined according to the word segmentation result. Parts of speech fall into twelve major classes: nouns, verbs, adjectives, numerals, measure words, pronouns, adverbs, prepositions, conjunctions, auxiliary words, interjections and onomatopoeia. Since diagnostic sentences mainly involve nouns, verbs, adjectives and a small number of adverbs, it is only necessary to determine whether a suspected word segment belongs to one of these parts of speech.
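The comparison of two segmenters described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the helper names `to_spans` and `suspected_segments` are hypothetical, the two inputs are assumed to be segmentations of the same sentence, and a segment is flagged when its character span does not appear in the other segmentation.

```python
def to_spans(tokens):
    """Convert a token list into a set of (start, end) character spans."""
    spans, pos = set(), 0
    for tok in tokens:
        spans.add((pos, pos + len(tok)))
        pos += len(tok)
    return spans


def suspected_segments(seg_a, seg_b):
    """Return the tokens of seg_a whose spans do not occur in seg_b.

    seg_a and seg_b are segmentations of the same sentence produced by
    two different (hypothetical) segmenters.
    """
    if "".join(seg_a) != "".join(seg_b):
        raise ValueError("both segmentations must cover the same sentence")
    spans_b = to_spans(seg_b)
    suspected, pos = [], 0
    for tok in seg_a:
        span = (pos, pos + len(tok))
        if span not in spans_b:
            suspected.append(tok)  # the two segmenters disagree here
        pos += len(tok)
    return suspected
```

Segments on which both segmenters agree are left alone, so only the disputed part of a sentence goes through the later expansion and grammar-detection steps.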
Step 130, searching for the sentence in which the suspected word segment is located and, when the part of speech is a noun, adding a first modifier serving as an attributive to the suspected word segment in the sentence.
In diagnostic sentences, doctors often describe the condition in short phrases, which makes it difficult to determine whether the segmentation is accurate. In this embodiment, the sentence containing the suspected word segment can therefore be semantically filled in; that is, a corresponding modifier is added so that the accuracy of the segmentation can be judged. For example, when the part of speech of the suspected word segment is a noun, a first modifier may be added as an attributive. Optionally, the first modifier may be an adjective. For example, the phrase "abdominal distension and pain" can be segmented in more than one way. "Abdomen" may be judged to be a noun, and a customary attributive such as "lower" (giving "lower abdomen") may be added in front of it.
Step 140, adding a second modifier serving as an adverbial to the suspected word segment in the sentence when the part of speech of the suspected word segment is a verb, to obtain a supplementary sentence.
Accordingly, for a suspected word segment whose part of speech is a verb, a second modifier may be added. For example, for "distension and pain", a modifier such as "severe" may be added.
Optionally, a third modifier may be added to further modify the first modifier, and a fourth modifier may be added to further modify the second modifier. In some cases, adding a single modifier is not enough for a grammar problem to be detected; adding further modifiers on top of it makes grammar errors caused by segmentation errors easier to expose. In the prior art, a modifier for nouns or verbs is often added, but the inventor found that this approach cannot quickly and accurately determine whether a segmentation error exists. The third and fourth modifiers are therefore added to qualify the first and second modifiers respectively, so that segmentation errors are better exposed. Expanding the original diagnostic sentence in this way yields the supplementary sentence.
In addition, the positions of the first modifier and the second modifier are not limited; each may be placed before or after the suspected word segment. The modifiers can be selected from a pre-established dictionary.
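The expansion in steps 130 and 140 can be sketched as a small function that picks a modifier by part of speech and inserts it next to the suspected word segment. This is only an illustration: the `MODIFIERS` table, its entries and the function name `build_supplementary_sentence` are hypothetical stand-ins for the pre-established dictionary mentioned above, and English tokens are used purely for readability.

```python
# Hypothetical modifier table keyed by part of speech (an assumption,
# standing in for the pre-established modifier dictionary).
MODIFIERS = {
    "noun": "lower",   # first modifier, used as an attributive
    "verb": "severe",  # second modifier, used as an adverbial
}


def build_supplementary_sentence(tokens, index, pos_tag,
                                 modifiers=MODIFIERS, before=True):
    """Insert a modifier next to tokens[index] according to its part of speech.

    Returns a new token list. When no modifier is defined for pos_tag,
    a copy of the original tokens is returned unchanged.
    """
    modifier = modifiers.get(pos_tag)
    if modifier is None:
        return list(tokens)
    insert_at = index if before else index + 1
    return list(tokens[:insert_at]) + [modifier] + list(tokens[insert_at:])
```

The `before` flag reflects the statement that a modifier may sit either in front of or behind the suspected word segment. A third or fourth modifier could be added by calling the function again on the result.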
Step 150, performing grammar detection on the supplementary sentence, adjusting the suspected word segment when the detection result is an error, determining the part of speech of the adjusted word segment, and returning to the step of searching for the sentence in which the suspected word segment is located, until the detection result is correct.
The expanded supplementary sentence can be input into an existing natural-language grammar detection model, and whether the suspected word segment is a correct segmentation result is determined from the model's output. If the output indicates that the grammar is correct, the suspected word segment is determined to be a correct segmentation result; conversely, if the output indicates a grammar error, the suspected word segment is determined to be an erroneous segmentation result. After an erroneous result is determined, the method returns to the diagnostic sentence in which the suspected word segment is located and adjusts it. Illustratively, the suspected word segment may be split into its minimum elements and the resulting pieces taken as new suspected word segments; alternatively, the minimum elements may be merged with the preceding or following words to generate new suspected word segments. The new suspected word segments are expanded in the manner described above and grammar detection is performed again, until the grammar detection result is correct.
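The expand, detect and adjust loop of step 150 can be sketched as a bounded search. Everything here is an assumption made for illustration: `expand` and `is_grammatical` are caller-supplied callables standing in for the modifier expansion and the grammar detection model, `adjust` implements only single splits and merges with a neighbour, and `max_checks` is a hypothetical safeguard against endless iteration that the source does not mention.

```python
from collections import deque


def adjust(tokens, index):
    """Yield (new_tokens, new_index) pairs: every two-way split of the
    suspected token, then merges with its left and right neighbours."""
    tok = tokens[index]
    for cut in range(1, len(tok)):
        yield tokens[:index] + [tok[:cut], tok[cut:]] + tokens[index + 1:], index
    if index > 0:
        yield tokens[:index - 1] + [tokens[index - 1] + tok] + tokens[index + 1:], index - 1
    if index + 1 < len(tokens):
        yield tokens[:index] + [tok + tokens[index + 1]] + tokens[index + 2:], index


def correct_segmentation(tokens, index, expand, is_grammatical, max_checks=50):
    """Search for a segmentation whose expanded sentence passes grammar detection.

    Returns the accepted token list, or None if none is found within max_checks.
    """
    start = list(tokens)
    queue = deque([(start, index)])
    seen = {(tuple(start), index)}
    checks = 0
    while queue and checks < max_checks:
        current, i = queue.popleft()
        checks += 1
        if is_grammatical(expand(current, i)):
            return current
        for candidate, j in adjust(current, i):
            key = (tuple(candidate), j)
            if key not in seen:  # avoid re-checking the same adjustment
                seen.add(key)
                queue.append((candidate, j))
    return None
```

A breadth-first order is used so that segmentations requiring fewer adjustments are tried first; this is a design choice of the sketch, not something the source specifies.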
This embodiment receives a word segmentation result of a diagnostic sentence; determines the part of speech of a suspected word segment according to the word segmentation result; searches for the sentence in which the suspected word segment is located and, when the part of speech is a noun, adds a first modifier serving as an attributive to the suspected word segment in the sentence; when the part of speech of the suspected word segment is a verb, adds a second modifier serving as an adverbial to the suspected word segment in the sentence, to obtain a supplementary sentence; and performs grammar detection on the supplementary sentence, adjusts the suspected word segment when the detection result is an error, determines the part of speech of the adjusted word segment, and returns to the step of searching for the sentence in which the suspected word segment is located, until the detection result is correct. Modifiers or short phrases are added to a questionable segmentation result to form a longer sentence with definite semantics, and grammar detection is performed on that sentence. If the grammar detection passes, the segmentation is determined to be error-free; otherwise, the segmentation result is adjusted until the grammar detection requirement is met. In this way the diagnostic sentence is expanded according to the characteristics of diagnostic sentences, and the plausibility of the expanded sentence is used to judge whether the segmentation is wrong. The method compensates for the inaccurate segmentation of medical records by conventional word segmentation models and corrects erroneous segmentation results to obtain correct ones.
Example 2
Fig. 2 is a flow chart of a correction method for a hospital diagnosis dictionary according to a second embodiment of the present invention. This embodiment is optimized on the basis of the above embodiment; specifically, the method may further include the following steps: sorting, according to grammar rules, the phrases of the diagnosis schemes in which the word segments from dictionaries of different sources are located, to obtain a first diagnosis scheme and a second diagnosis scheme; expanding the diagnostic sentences in which the approximate words in the first and second diagnosis schemes are located by adding modifiers; determining the same keywords from the expanded first and second diagnosis schemes and, based on the keywords, creating a first vector matrix for the first approximate word and its modifiers and a second vector matrix for the second approximate word and its modifiers; and marking the approximate words as identical in the dictionary according to the degree of approximation between the first vector matrix and the second vector matrix.
Referring to Fig. 2, the correction method for the hospital diagnosis dictionary includes:
Step 210, receiving a word segmentation result of a diagnostic sentence, and determining the part of speech of a suspected word segment according to the word segmentation result.
Step 220, searching for the sentence in which the suspected word segment is located and, when the part of speech is a noun, adding a first modifier serving as an attributive to the suspected word segment in the sentence; when the part of speech of the suspected word segment is a verb, adding a second modifier serving as an adverbial to the suspected word segment in the sentence, to obtain a supplementary sentence.
Step 230, performing grammar detection on the supplementary sentence, adjusting the suspected word segment when the detection result is an error, determining the part of speech of the adjusted word segment, and returning to the step of searching for the sentence in which the suspected word segment is located, until the detection result is correct.
Step 240, sorting, according to grammar rules, the phrases of the diagnosis schemes in which the word segments from dictionaries of different sources are located, to obtain a first diagnosis scheme and a second diagnosis scheme.
After the correct word segmentation is determined, the same meaning may still be expressed by different words in different hospital systems, because hospitals differ in their language habits. Such variants can cause confusion in the dictionary of a hospital medical system, so they need to be sorted and unified, or marked accordingly in their metadata, to ensure that the words are consistent within the system. In this embodiment, word segments that may share the same semantics are processed, and whether their semantics are consistent is determined from the processing result.
In this embodiment, the diagnosis scheme in which each of the possibly synonymous word segments is located may be determined first, and the phrases in that diagnosis scheme are then rearranged and recombined. Illustratively, the phrases may be reordered to follow the order of presentation typical of medical papers, for example from external observations to internal observations and from primary factors to secondary factors.
Step 250, expanding the diagnostic sentences in which the approximate words in the first diagnosis scheme and the second diagnosis scheme are located by adding modifiers.
For example, the diagnostic sentence in which an approximate word is located may be expanded in the manner provided in the foregoing embodiment, with the corresponding modifier added according to the part of speech. The approximate word in the second diagnosis scheme is supplemented with the same modifiers.
Step 260, determining the same keywords from the expanded first and second diagnosis schemes and, based on the keywords, creating a first vector matrix for the first approximate word and its modifiers and a second vector matrix for the second approximate word and its modifiers.
A keyword may be a word that appears in both the first diagnosis scheme and the second diagnosis scheme and characterizes the diagnosis. Alternatively, authoritative sources such as literature, textbooks and papers corresponding to the first and second diagnosis schemes may be searched, and qualitative words that appear in similar paragraphs of those sources and also in both diagnosis schemes may be extracted.
After the keywords are determined, a first vector matrix and a second vector matrix may be created. The first vector matrix is created from the relationships among the keywords, the added modifiers and the approximate word in the first diagnosis scheme. For example, each row of the matrix corresponds to the approximate word or an added modifier, and each column to a keyword; an entry may be a binary value indicating whether the two co-occur, a raw co-occurrence count, a processed co-occurrence tf-idf value, or the distance between the word and the keyword. The second vector matrix is created in the same manner from the second diagnosis scheme.
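One of the matrix variants mentioned above, the raw co-occurrence count, can be sketched as follows. The layout is an assumption made for illustration: rows are taken to be the approximate word and its added modifiers, columns the shared keywords, and "co-occurrence" is taken to mean appearing in the same tokenized sentence. The function name `cooccurrence_matrix` is hypothetical.

```python
def cooccurrence_matrix(sentences, row_words, keywords):
    """Build a matrix of raw co-occurrence counts.

    sentences: list of token lists from one expanded diagnosis scheme.
    row_words: the approximate word followed by its added modifiers.
    keywords:  keywords shared by both diagnosis schemes.
    Entry [r][c] is the number of sentences containing both
    row_words[r] and keywords[c].
    """
    matrix = []
    for word in row_words:
        row = []
        for keyword in keywords:
            count = sum(1 for sent in sentences
                        if word in sent and keyword in sent)
            row.append(count)
        matrix.append(row)
    return matrix
```

Calling the function once per diagnosis scheme, with the same keyword list and the same modifier order, yields two matrices of identical shape that can then be compared entry by entry.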
Step 270, marking the approximate words as identical in the dictionary according to the degree of approximation between the first vector matrix and the second vector matrix.
Optionally, the Pearson correlation coefficient may be used as the metric that quantifies the degree of approximation between the first vector matrix and the second vector matrix. When the difference between the two matrices under this metric is smaller than a preset difference threshold, the first and second approximate words may be considered to have the same semantics and may be marked as the same term in the hospital diagnosis dictionary; otherwise, their semantics are considered to be different.
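The comparison in step 270 can be sketched with a plain Pearson correlation over the flattened matrices. Several points are assumptions, since the source does not define them: the matrices are flattened row by row, the "difference" is taken to be 1 - r, and the default threshold of 0.2 is an arbitrary placeholder. The names `pearson` and `same_semantics` are hypothetical.

```python
import math


def pearson(matrix_a, matrix_b):
    """Pearson correlation coefficient of two equally shaped matrices,
    computed over their row-wise flattened entries."""
    xs = [v for row in matrix_a for v in row]
    ys = [v for row in matrix_b for v in row]
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("matrices must have the same size and at least two entries")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant matrix")
    return cov / math.sqrt(var_x * var_y)


def same_semantics(matrix_a, matrix_b, diff_threshold=0.2):
    """Assumed decision rule: treat 1 - r as the difference and compare
    it with a preset difference threshold."""
    return (1.0 - pearson(matrix_a, matrix_b)) < diff_threshold
```

When `same_semantics` returns True, the two approximate words would be marked as the same term in the dictionary; a constant matrix is rejected explicitly because the coefficient is undefined in that case.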
This embodiment adds the following steps: sorting, according to grammar rules, the phrases of the diagnosis schemes in which the word segments from dictionaries of different sources are located, to obtain a first diagnosis scheme and a second diagnosis scheme; expanding the diagnostic sentences in which the approximate words in the first and second diagnosis schemes are located by adding modifiers; determining the same keywords from the expanded first and second diagnosis schemes and, based on the keywords, creating a first vector matrix for the first approximate word and its modifiers and a second vector matrix for the second approximate word and its modifiers; and marking the approximate words as identical in the dictionary according to the degree of approximation between the first vector matrix and the second vector matrix. Different words with the same semantics in the dictionary can thus be merged and organized, which facilitates later digital processing and supports artificial-intelligence-assisted diagnosis.
Example III
Fig. 3 is a schematic structural diagram of a device for correcting a hospital diagnosis dictionary according to a third embodiment of the present invention. As shown in fig. 3, the device includes:
a receiving module 310, configured to receive a word segmentation result of the diagnostic statement;
a determining module 320, configured to determine a part of speech of the suspected word according to the word segmentation result;
the searching module 330, configured to search for the sentence in which the suspected word is located, and to add an attributive first modifier to the suspected word in the sentence when the part of speech is a noun;
an adding module 340, configured to add an adverbial second modifier to the suspected word in the sentence when the part of speech of the suspected word is a verb, so as to obtain a supplementary sentence;
the adjustment module 350, configured to perform grammar detection on the supplementary sentence, adjust the suspected word when the detection result is incorrect, determine the part of speech of the suspected word according to the adjusted word segmentation, and return to the step of searching for the sentence in which the suspected word is located until the detection result is correct.
The correction device for a hospital diagnosis dictionary provided by this embodiment receives the word segmentation result of a diagnosis sentence; determines the part of speech of a suspected word according to the word segmentation result; searches for the sentence in which the suspected word is located and, when the part of speech is a noun, adds an attributive first modifier to the suspected word in the sentence; when the part of speech of the suspected word is a verb, adds an adverbial second modifier to the suspected word in the sentence, so as to obtain a supplementary sentence; and performs grammar detection on the supplementary sentence, adjusts the suspected word when the detection result is incorrect, determines the part of speech of the suspected word according to the adjusted word segmentation, and returns to the step of searching for the sentence in which the suspected word is located until the detection result is correct. In other words, modifiers or short phrases are added to a questionable word segmentation result to form a longer sentence with clear semantics, and grammar detection is performed on that sentence. If the grammar detection passes, the word segmentation is determined to be correct; otherwise, the word segmentation result is adjusted until the grammar detection requirement is met. In this way, diagnosis sentences are expanded in accordance with their characteristics, and the plausibility of the expanded sentence is used to judge whether the word segmentation is wrong. This makes up for the inaccuracy of conventional word segmentation models on medical records and allows incorrect word segmentation results to be corrected.
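The loop described above (expand with a modifier, run grammar detection, adjust, repeat) can be sketched as follows. The patent does not specify a part-of-speech tagger, a grammar checker, or an adjustment rule, so all four are passed in as callables; every name below, the `max_rounds` guard, and the toy English lexicon are hypothetical and serve only to make the control flow runnable.

```python
def correct_segmentation(tokens, suspect_index, pos_of, add_modifier,
                         grammar_ok, adjust, max_rounds=10):
    """Repeat: expand the sentence around the suspected word, check grammar,
    and adjust the segmentation until the expanded sentence passes."""
    tokens = list(tokens)
    for _ in range(max_rounds):
        pos = pos_of(tokens[suspect_index])
        if pos == "noun":
            expanded = add_modifier(tokens, suspect_index, "attributive")
        elif pos == "verb":
            expanded = add_modifier(tokens, suspect_index, "adverbial")
        else:
            expanded = list(tokens)  # no modifier rule given for other parts of speech
        if grammar_ok(expanded):
            return tokens
        tokens, suspect_index = adjust(tokens, suspect_index)
    raise RuntimeError("no grammatical segmentation found within max_rounds")


# --- Toy stand-ins (assumptions, not the patent's components) ---
TOY_LEXICON = {"abdominal": "adjective", "pain": "noun",
               "severe": "adjective", "frequently": "adverb"}


def toy_pos_of(token):
    # Unknown fragments are treated as nouns in this toy example.
    return TOY_LEXICON.get(token, "noun")


def toy_add_modifier(tokens, index, kind):
    modifier = "severe" if kind == "attributive" else "frequently"
    return tokens[:index] + [modifier] + tokens[index:]


def toy_grammar_ok(tokens):
    # Stand-in for grammar detection: every token must be a known word.
    return all(token in TOY_LEXICON for token in tokens)


def toy_adjust(tokens, index):
    # Stand-in adjustment: merge the suspected word with the following token.
    if index + 1 >= len(tokens):
        raise ValueError("nothing left to merge with")
    merged = tokens[index] + tokens[index + 1]
    return tokens[:index] + [merged] + tokens[index + 2:], index


result = correct_segmentation(["abdominal", "pa", "in"], 1, toy_pos_of,
                              toy_add_modifier, toy_grammar_ok, toy_adjust)
```

In the toy run, the first round fails grammar detection because "pa" is not a known word, the adjustment merges it with "in", and the second round passes.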
On the basis of the above embodiments, the device further includes:
a third modifier term addition module configured to add a third modifier term, where the third modifier term is used to further modify the first modifier term;
a fourth modifier term addition module configured to add a fourth modifier term, where the fourth modifier term is used to further modify the second modifier term.
On the basis of the above embodiments, the device further includes:
a suspected word determining module, configured to obtain word segmentation results using at least two word segmentation models and to determine the word segments on which the results differ as suspected words.
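One way to find the segments on which two segmentation results differ is to compare character spans, as in the sketch below. The span-based comparison and the function names are assumptions; the patent only states that differing segments are treated as suspected. The toy input uses unspaced English letters in place of Chinese text.

```python
def spans(tokens):
    """Return the set of (start, end) character spans covered by each token."""
    result = set()
    position = 0
    for token in tokens:
        result.add((position, position + len(token)))
        position += len(token)
    return result


def suspected_segments(segmentation_a, segmentation_b):
    """Return tokens whose spans appear in only one of the two segmentations."""
    text = "".join(segmentation_a)
    if text != "".join(segmentation_b):
        raise ValueError("both segmentations must cover the same text")
    differing = spans(segmentation_a) ^ spans(segmentation_b)
    return [text[start:end] for start, end in sorted(differing)]


# Two hypothetical models segment the same text "patientabdominalpain".
model_a = ["patient", "abdominal", "pain"]
model_b = ["patient", "abdomin", "alpain"]
suspects = suspected_segments(model_a, model_b)
```

Here both models agree on "patient", so only the segments covering the remainder of the text are returned as suspected.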
On the basis of the above embodiments, the correction module includes:
a splitting unit, configured to split the suspected word into its minimum elements.
On the basis of the above embodiments, the correction module further includes:
a recombination unit, configured to recombine the minimum elements of the suspected word with the preceding and following characters when the word segments obtained by splitting the suspected word into its minimum elements are still incorrect.
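The splitting and recombination units can be illustrated with the sketch below. It assumes that the "minimum element" is a single character and that recombination means attaching the leading characters of the suspected word to the preceding token or its trailing characters to the following token; both readings, and the function names, are assumptions made for the example.

```python
def split_to_minimum(tokens, index):
    """Replace the suspected token with its individual characters."""
    return tokens[:index] + list(tokens[index]) + tokens[index + 1:]


def recombination_candidates(tokens, index):
    """List candidate segmentations that move part of the suspected token
    onto the preceding token or onto the following token."""
    word = tokens[index]
    before, after = tokens[:index], tokens[index + 1:]
    candidates = []
    for cut in range(1, len(word)):
        head, tail = word[:cut], word[cut:]
        if before:
            # Attach the leading characters to the preceding token.
            candidates.append(before[:-1] + [before[-1] + head, tail] + after)
        if after:
            # Attach the trailing characters to the following token.
            candidates.append(before + [head, tail + after[0]] + after[1:])
    return candidates


tokens = ["ab", "cd", "e"]
split_result = split_to_minimum(tokens, 1)
candidates = recombination_candidates(tokens, 1)
```

Each candidate would then go back through grammar detection; the sketch only enumerates the alternatives and does not rank them.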
On the basis of the above embodiments, the device further includes:
the ordering module, configured to sort, according to grammar rules, the short sentences of the diagnosis schemes in which the word segments in dictionaries from different sources are located;
the filling module, configured to sort, according to grammar rules, the short sentences of the diagnosis schemes in which the word segments in dictionaries from different sources are located, to obtain a first diagnosis scheme and a second diagnosis scheme;
the expansion module, configured to expand the diagnosis sentences in which the approximate words in the first diagnosis scheme and the second diagnosis scheme are located, and to add modifiers;
the creation module, configured to determine the same keywords from the expanded first diagnosis scheme and the expanded second diagnosis scheme, and to create, based on the keywords, a first vector matrix and a second vector matrix for the first approximate word, the second approximate word, and the modifiers, respectively;
the marking module, configured to mark the approximate words as the same in the dictionary according to the degree of approximation between the first vector matrix and the second vector matrix.
On the basis of the above embodiments, the creating module includes:
a selecting unit for selecting a document fragment describing the approximate content;
and the extraction unit is used for extracting qualitative words from the document fragments as keywords.
The correction device of the hospital diagnosis dictionary provided by the embodiment of the invention can execute the correction method of the hospital diagnosis dictionary provided by any embodiment of the invention, and has the corresponding functional modules and beneficial effects of the execution method.
Example IV
Fig. 4 is a schematic structural diagram of a server according to a fourth embodiment of the present invention. Fig. 4 illustrates a block diagram of an exemplary server 12 suitable for use in implementing embodiments of the present invention. The server 12 shown in fig. 4 is merely an example, and should not be construed as limiting the functionality and scope of use of embodiments of the present invention.
As shown in fig. 4, the server 12 is in the form of a general purpose computing terminal. The components of server 12 may include, but are not limited to: one or more processors or processing units 16, a system memory 28, a bus 18 that connects the various system components, including the system memory 28 and the processing units 16.
Bus 18 represents one or more of several types of bus structures, including a memory bus or memory controller, a peripheral bus, an accelerated graphics port, and a processor or local bus using any of a variety of bus architectures. By way of example, and not limitation, such architectures include the Industry Standard Architecture (ISA) bus, the Micro Channel Architecture (MCA) bus, the Enhanced ISA (EISA) bus, the Video Electronics Standards Association (VESA) local bus, and the Peripheral Component Interconnect (PCI) bus.
Server 12 typically includes a variety of computer system readable media. Such media can be any available media that is accessible by server 12 and includes both volatile and nonvolatile media, removable and non-removable media.
The system memory 28 may include computer system readable media in the form of volatile memory, such as Random Access Memory (RAM) 30 and/or cache memory 32. The server 12 may further include other removable/non-removable, volatile/nonvolatile computer system storage media. By way of example only, storage system 34 may be used to read from or write to non-removable, nonvolatile magnetic media (not shown in FIG. 4, commonly referred to as a "hard disk drive"). Although not shown in fig. 4, a magnetic disk drive for reading from and writing to a removable non-volatile magnetic disk (e.g., a "floppy disk"), and an optical disk drive for reading from or writing to a removable non-volatile optical disk (e.g., a CD-ROM, DVD-ROM, or other optical media) may be provided. In such cases, each drive may be coupled to bus 18 through one or more data medium interfaces. The system memory 28 may include at least one program product having a set (e.g., at least one) of program modules configured to carry out the functions of the embodiments of the invention.
A program/utility 40 having a set (at least one) of program modules 42 may be stored in, for example, system memory 28, such program modules 42 including, but not limited to, an operating system, one or more application programs, other program modules, and program data, each or some combination of which may include an implementation of a network environment. Program modules 42 generally perform the functions and/or methods of the embodiments described herein.
The server 12 may also communicate with one or more external terminals 14 (e.g., a keyboard, a pointing device, a display 24, etc.), with one or more terminals that enable a user to interact with the server 12, and/or with any terminals (e.g., a network card, a modem, etc.) that enable the server 12 to communicate with one or more other computing terminals. Such communication may occur through an input/output (I/O) interface 22. In addition, the server 12 may communicate with one or more networks, such as a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet, via a network adapter 20. As shown, the network adapter 20 communicates with the other modules of the server 12 via the bus 18. It should be appreciated that, although not shown, other hardware and/or software modules may be used in connection with the server 12, including, but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, data backup storage systems, and the like.
The processing unit 16 executes various functional applications and data processing by running programs stored in the system memory 28, for example, implementing a correction method of a hospital diagnostic dictionary provided by an embodiment of the present invention.
Example V
A fifth embodiment of the present invention also provides a storage medium containing computer-executable instructions, which when executed by a computer processor, are configured to perform a correction method of a hospital diagnostic dictionary as provided in any of the above embodiments.
The computer storage media of embodiments of the invention may take the form of any combination of one or more computer-readable media. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples (a non-exhaustive list) of the computer-readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, either in baseband or as part of a carrier wave. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.
Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Computer program code for carrying out operations of the present invention may be written in any combination of one or more programming languages, including an object-oriented programming language such as Java, Smalltalk, or C++, and conventional procedural programming languages, such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
Note that the above is only a preferred embodiment of the present invention and the technical principle applied. It will be understood by those skilled in the art that the present invention is not limited to the particular embodiments described herein, but is capable of various obvious changes, rearrangements and substitutions as will now become apparent to those skilled in the art without departing from the scope of the invention. Therefore, while the invention has been described in connection with the above embodiments, the invention is not limited to the embodiments, but may be embodied in many other equivalent forms without departing from the spirit or scope of the invention, which is set forth in the following claims.
Claims (9)
1. A correction method for a hospital diagnosis dictionary, comprising:
receiving word segmentation results of the diagnosis sentences;
determining the part of speech of a suspected word segment according to the word segmentation result;
searching for the sentence in which the suspected word segment is located, and adding an attributive first modifier to the suspected word segment in the sentence when the part of speech is a noun;
when the part of speech of the suspected word segment is a verb, adding an adverbial second modifier to the suspected word segment in the sentence to obtain a supplementary sentence;
performing grammar detection on the supplementary sentence, adjusting the suspected word segment when the detection result is incorrect, determining the part of speech of the suspected word segment according to the adjusted word segmentation, and returning to the step of searching for the sentence in which the suspected word segment is located until the detection result is correct;
sorting phrases of diagnosis schemes where the word segmentation in the dictionary of different sources is located according to grammar rules to obtain a first diagnosis scheme and a second diagnosis scheme;
expanding the diagnostic sentences in which the approximate words in the first diagnostic scheme and the second diagnostic scheme are positioned, and adding modifier;
determining the same keywords from the expanded first diagnosis scheme and the expanded second diagnosis scheme, and respectively creating a first vector matrix and a second vector matrix for the first approximate word, the second approximate word and the modifier based on the keywords;
marking the approximate words as the same in the dictionary according to the degree of approximation between the first vector matrix and the second vector matrix.
2. The method according to claim 1, wherein the method further comprises:
adding a third modifier, the third modifier being used to further modify the first modifier;
adding a fourth modifier, the fourth modifier being used to further modify the second modifier.
3. The method according to claim 1, wherein the method further comprises:
obtaining word segmentation results using at least two word segmentation models, and determining the word segments on which the results differ as suspected word segments.
4. The method of claim 1, wherein the correcting the suspected word segment comprises:
splitting the suspected word according to the minimum element.
5. The method of claim 4, wherein the correcting the suspected word segment further comprises:
when the word segments obtained by splitting the suspected word segment according to the minimum element are still incorrect, recombining the minimum elements of the suspected word segment with the preceding and following characters.
6. The method of claim 1, wherein determining the same keyword from the augmented first diagnostic solution and the second diagnostic solution comprises:
selecting a document fragment describing approximate content;
extracting qualitative words from the document fragment as keywords.
7. A correction device for a hospital diagnostic dictionary, comprising:
the receiving module is used for receiving word segmentation results of the diagnosis sentences;
the determining module is used for determining the part of speech of the suspected word according to the word segmentation result;
the searching module is used for searching for the sentence in which the suspected word segment is located, and adding an attributive first modifier to the suspected word segment in the sentence when the part of speech is a noun;
the adding module is used for adding an adverbial second modifier to the suspected word segment in the sentence when the part of speech of the suspected word segment is a verb, so as to obtain a supplementary sentence;
the adjusting module is used for performing grammar detection on the supplementary sentence, adjusting the suspected word segment when the detection result is incorrect, determining the part of speech of the suspected word segment according to the adjusted word segmentation, and returning to the step of searching for the sentence in which the suspected word segment is located until the detection result is correct;
the ordering module is used for ordering the short sentences of the diagnosis schemes where the word segmentation is positioned in the dictionary with different sources according to grammar rules;
the filling module is used for sequencing short sentences of the diagnosis schemes where the word segmentation is located in the dictionaries with different sources according to grammar rules to obtain a first diagnosis scheme and a second diagnosis scheme;
the expansion module is used for expanding the diagnosis sentences in which the approximate words in the first diagnosis scheme and the second diagnosis scheme are positioned and adding modifier;
the creation module is used for determining the same keywords from the expanded first diagnosis scheme and the expanded second diagnosis scheme, and creating a first vector matrix and a second vector matrix for the first approximate word, the second approximate word and the modifier respectively based on the keywords;
and the marking module is used for marking the same in the dictionary according to the approximation degree of the first vector matrix and the second vector matrix.
8. A server, the server comprising:
one or more processors;
storage means for storing one or more programs,
when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the method of correction of a hospital diagnostic dictionary as claimed in any one of claims 1-6.
9. A storage medium containing computer executable instructions which, when executed by a computer processor, are for performing a method of correction of a hospital diagnostic dictionary according to any of claims 1-6.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311264209.4A CN117034911B (en) | 2023-09-28 | 2023-09-28 | Correction method and device for hospital diagnosis dictionary, server and storage medium |
Publications (2)
Publication Number | Publication Date |
---|---|
CN117034911A CN117034911A (en) | 2023-11-10 |
CN117034911B (en) | 2023-12-22 |
Family
ID=88637665
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202311264209.4A Active CN117034911B (en) | 2023-09-28 | 2023-09-28 | Correction method and device for hospital diagnosis dictionary, server and storage medium |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN117034911B (en) |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPH0554064A (en) * | 1991-08-29 | 1993-03-05 | Nec Corp | Participial modification relation analyzer |
CN108959250A (en) * | 2018-06-27 | 2018-12-07 | 众安信息技术服务有限公司 | A kind of error correction method and its system based on language model and word feature |
CN109344406A (en) * | 2018-09-30 | 2019-02-15 | 阿里巴巴集团控股有限公司 | Part-of-speech tagging method, apparatus and electronic equipment |
CN109697286A (en) * | 2018-12-18 | 2019-04-30 | 众安信息技术服务有限公司 | A kind of diagnostic standardization method and device based on term vector |
CN112016304A (en) * | 2020-09-03 | 2020-12-01 | 平安科技(深圳)有限公司 | Text error correction method and device, electronic equipment and storage medium |
CN113673238A (en) * | 2021-10-25 | 2021-11-19 | 杭州费尔斯通科技有限公司 | Word segmentation correction method and system based on hypernym, electronic device and storage medium |
CN115310442A (en) * | 2022-08-03 | 2022-11-08 | 湖南中医药大学 | Traditional Chinese medicine ancient book word segmentation method and device, computer equipment and storage medium |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US10553308B2 (en) * | 2017-12-28 | 2020-02-04 | International Business Machines Corporation | Identifying medically relevant phrases from a patient's electronic medical records |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||