CN112818085B - Value range data matching method and device, storage medium and electronic equipment - Google Patents


Info

Publication number
CN112818085B
Authority
CN
China
Prior art keywords
disease
disease name
matched
name
candidate
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110120997.4A
Other languages
Chinese (zh)
Other versions
CN112818085A (en
Inventor
冯仓龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp filed Critical Neusoft Corp
Priority to CN202110120997.4A priority Critical patent/CN112818085B/en
Publication of CN112818085A publication Critical patent/CN112818085A/en
Application granted granted Critical
Publication of CN112818085B publication Critical patent/CN112818085B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • G06F16/353Clustering; Classification into predefined classes
    • GPHYSICS
    • G16INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16HHEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H70/00ICT specially adapted for the handling or processing of medical references

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Epidemiology (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Public Health (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The disclosure relates to a value range data matching method, a device, a storage medium and electronic equipment, so as to perform more accurate value range matching on disease names in medical data. The value range data matching method comprises the following steps: acquiring a disease name to be matched from medical data; determining a disease category to which the disease name to be matched belongs, and determining a first candidate disease name corresponding to the disease name to be matched according to a standard disease name included in the disease category; inputting the disease name to be matched into a semantic similarity model to obtain a second candidate disease name corresponding to the disease name to be matched, wherein the semantic similarity model is obtained by training according to part-of-speech features and syntactic features of the sample disease name; and determining a value range matching result corresponding to the disease name to be matched according to the first candidate disease name and the second candidate disease name.

Description

Value range data matching method and device, storage medium and electronic equipment
Technical Field
The disclosure relates to the technical field of data processing, and in particular relates to a value range data matching method, a device, a storage medium and electronic equipment.
Background
In the medical data field, value range data refers to a medical data set with a mapping relation, and comprises small value range data and large value range data. Small value range data refers to value range data with few categories and simple data organization, such as medical insurance category and patient gender. Large value range data refers to value range data with many categories and complex data organization, such as operation names and disease names.
Because different medical information systems use different data structures, and users of those systems express the same data in different ways, medical data must be standardized before it can be analyzed and managed in a unified manner. In the related art, value range data is standardized mainly through value range matching approaches such as fuzzy query and word segmentation comparison.
However, for value range data with few characters, the value range matching approaches in the related art struggle to achieve a good data standardization effect. For example, disease names mostly consist of short text or noun phrases, use highly specialized terminology, lack context information, and offer little information that can be broken down. Standardizing disease names with the value range matching approaches in the related art therefore rarely achieves a good result, which affects subsequent unified analysis and management.
Disclosure of Invention
The disclosure aims to provide a value range data matching method, a device, a storage medium and electronic equipment, so as to perform more accurate value range matching on disease names in medical data.
To achieve the above object, in a first aspect, the present disclosure provides a value range data matching method, the method including:
acquiring a disease name to be matched from medical data;
determining a disease category to which the disease name to be matched belongs, and determining a first candidate disease name corresponding to the disease name to be matched according to a standard disease name included in the disease category;
inputting the disease name to be matched into a semantic similarity model to obtain a second candidate disease name corresponding to the disease name to be matched, wherein the semantic similarity model is obtained by training according to part-of-speech features and syntactic features of the sample disease name;
and determining a value range matching result corresponding to the disease name to be matched according to the first candidate disease name and the second candidate disease name.
Optionally, the method further comprises:
classifying each disease name in the international disease classification table based on the disease onset site and pathology to obtain a disease classification table;
the determining the disease category to which the disease name to be matched belongs comprises:
searching in the disease classification table based on the disease name to be matched to determine the disease category to which the disease name to be matched belongs.
Optionally, the classifying each disease name in the international disease classification table based on the disease onset site and pathology to obtain a disease classification table includes:
classifying each disease name in the international disease classification table based on the pathology of the disease to obtain a first disease classification table, and classifying each disease name in the international disease classification table based on the disease onset site to obtain a second disease classification table;
the determining the disease category to which the disease name to be matched belongs comprises:
searching in the first disease classification table based on the disease name to be matched;
if the disease category to which the disease name to be matched belongs is not found in the first disease classification table, searching in the second disease classification table based on the disease name to be matched, and determining the disease category to which the disease name to be matched belongs according to the disease category found in the second disease classification table.
Optionally, the method further comprises:
if the disease category to which the disease name to be matched belongs is not found in the second disease classification table, determining the disease category to which the disease name to be matched belongs in a preset disease classification table, wherein the number of disease names included in each disease category in the preset disease classification table is greater than the number of disease names included in each disease category in the second disease classification table.
Optionally, the determining the disease category to which the disease name to be matched belongs includes:
determining at least one disease name corresponding to the disease name to be matched;
and determining the disease category to which the disease name to be matched belongs according to at least one disease name corresponding to the disease name to be matched.
Optionally, the determining, according to the first candidate disease name and the second candidate disease name, a value range matching result corresponding to the disease name to be matched includes:
acquiring diagnosis department information and/or patient sex information from the medical data corresponding to the disease name to be matched, and screening the first candidate disease names according to the diagnosis department information and/or the patient sex information to obtain target candidate disease names;
and determining a value range matching result corresponding to the disease name to be matched according to the target candidate disease name and the second candidate disease name.
Optionally, the determining, according to the first candidate disease name and the second candidate disease name, a value range matching result corresponding to the disease name to be matched includes:
if the second candidate disease names include a disease name that is the same as a first candidate disease name and whose semantic similarity to the disease name to be matched exceeds a preset semantic similarity, determining that disease name as the value range matching result corresponding to the disease name to be matched;
and if the second candidate disease names include no disease name that is the same as a first candidate disease name and whose semantic similarity to the disease name to be matched exceeds the preset semantic similarity, performing fuzzy matching between the second candidate disease names and the first candidate disease names to determine the value range matching result corresponding to the disease name to be matched.
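The decision rule in the last optional step can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the function name `decide_match`, the default threshold of 0.8, and the use of Python's `difflib.SequenceMatcher` as the fuzzy matcher are all assumptions, since the text does not specify how fuzzy matching is performed.

```python
from difflib import SequenceMatcher


def decide_match(first_candidates, second_candidates, min_similarity=0.8):
    """Pick a value range matching result from the two candidate sets.

    first_candidates: list of standard disease names from the disease category.
    second_candidates: dict mapping a candidate name to its semantic
        similarity with the disease name to be matched (model output).
    min_similarity: hypothetical preset semantic similarity.
    """
    first = set(first_candidates)
    # Case 1: a name appears in both sets and is similar enough.
    shared = {
        name: score
        for name, score in second_candidates.items()
        if name in first and score > min_similarity
    }
    if shared:
        return max(shared, key=shared.get)

    # Case 2: no such name, so fall back to fuzzy matching between the sets.
    best_name, best_score = None, 0.0
    for std_name in first_candidates:
        for cand_name in second_candidates:
            score = SequenceMatcher(None, std_name, cand_name).ratio()
            if score > best_score:
                best_name, best_score = std_name, score
    return best_name
```

In this sketch the fallback returns the first candidate whose spelling is closest to any second candidate; other tie-breaking or scoring choices would fit the text equally well.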
In a second aspect, the present disclosure also provides a value range data matching device, the device comprising:
an acquisition module, configured to acquire a disease name to be matched from medical data;
a first determining module, configured to determine a disease category to which the disease name to be matched belongs, and determine a first candidate disease name corresponding to the disease name to be matched according to a standard disease name included in the disease category;
a second determining module, configured to input the disease name to be matched into a semantic similarity model to obtain a second candidate disease name corresponding to the disease name to be matched, wherein the semantic similarity model is obtained by training according to part-of-speech features and syntactic features of the sample disease name;
and a third determining module, configured to determine a value range matching result corresponding to the disease name to be matched according to the first candidate disease name and the second candidate disease name.
In a third aspect, the present disclosure also provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the steps of the method of any of the first aspects.
In a fourth aspect, the present disclosure also provides an electronic device, including:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to implement the steps of the method of any of the first aspects.
Through the above technical solution, on the one hand, the disease category to which the disease name to be matched belongs can be determined first during value range matching, which reduces candidate errors. On the other hand, matching can be performed through a semantic similarity model that is trained on part-of-speech features and syntactic features of sample disease names, so that the part-of-speech information and syntactic information of the disease name to be matched can be fully used during value range matching. This further improves value range matching accuracy and enables a more accurate data standardization operation.
Additional features and advantages of the present disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings are included to provide a further understanding of the disclosure and constitute a part of this specification. Together with the detailed description below, they serve to explain the disclosure but do not limit it. In the drawings:
FIG. 1 is an implementation scenario diagram illustrating a value range data matching method according to an exemplary embodiment of the present disclosure;
FIG. 2 is a flow chart illustrating a value range data matching method according to an exemplary embodiment of the present disclosure;
FIG. 3 is a flow chart illustrating a value range data matching method according to another exemplary embodiment of the present disclosure;
FIG. 4 is a block diagram of a value range data matching device, according to an exemplary embodiment of the present disclosure;
FIG. 5 is a block diagram of an electronic device, according to an exemplary embodiment of the present disclosure.
Detailed Description
Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the disclosure, are not intended to limit the disclosure.
As noted in the Background, disease names, as value range data in medical data standardization, mostly consist of short text or noun phrases, use highly specialized terminology, lack context information, and offer little information that can be broken down. Standardizing disease names with the value range matching approaches in the related art, such as fuzzy matching and word segmentation comparison, therefore rarely achieves a good data standardization effect, which affects subsequent unified analysis and management.
In view of this, the present disclosure provides a value range data matching method, device, storage medium and electronic apparatus, so as to perform more accurate value range matching on disease names in medical data, thereby implementing more accurate data standardization operation.
First, possible implementation scenarios of the present disclosure are explained. For example, referring to FIG. 1, the implementation scenario may include a plurality of medical information systems (illustrated in FIG. 1 as medical information systems 1 through N) provided in different medical institutions, and a medical information unified management platform that may communicate with the plurality of medical information systems. A medical information system can execute the value range data matching method provided by the disclosure to standardize the disease names in the medical data that it stores. Alternatively, the medical information unified management platform can execute the value range data matching method provided by the disclosure to standardize the disease names in the medical data reported by each medical information system.
Fig. 2 is a flow chart illustrating a value range data matching method according to an exemplary embodiment of the present disclosure. Referring to fig. 2, the value range data matching method includes:
Step 201, acquiring the disease name to be matched from the medical data.
Step 202, determining a disease category to which the disease name to be matched belongs, and determining a first candidate disease name corresponding to the disease name to be matched according to the standard disease name included in the disease category.
Step 203, inputting the disease name to be matched into the semantic similarity model to obtain a second candidate disease name corresponding to the disease name to be matched. The semantic similarity model is trained according to part-of-speech features and syntactic features of the sample disease name.
Step 204, determining a value range matching result corresponding to the disease name to be matched according to the first candidate disease name and the second candidate disease name.
Through the above method, on the one hand, the disease category to which the disease name to be matched belongs can be determined first during value range matching, which reduces candidate errors. On the other hand, matching can be performed through a semantic similarity model that is trained on part-of-speech features and syntactic features of sample disease names, so that the part-of-speech information and syntactic information of the disease name to be matched can be fully used during value range matching. This further improves value range matching accuracy and enables a more accurate data standardization operation.
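The flow of steps 201 to 204 can be sketched as a small pipeline. This is only an outline under stated assumptions: the function `match_value_range`, the record key `"disease_name"`, and the four callables passed in are hypothetical placeholders for the category lookup, the candidate retrieval, the semantic similarity model and the combination rule, none of which are defined by this sketch.

```python
from typing import Callable, Dict, List, Optional


def match_value_range(
    record: Dict[str, str],
    category_of: Callable[[str], Optional[str]],
    standard_names_in: Callable[[str], List[str]],
    similar_names: Callable[[str], List[str]],
    combine: Callable[[List[str], List[str]], Optional[str]],
) -> Optional[str]:
    # Step 201: acquire the disease name to be matched from the medical data.
    name = record["disease_name"]
    # Step 202: determine the disease category, then the first candidates.
    category = category_of(name)
    first = standard_names_in(category) if category is not None else []
    # Step 203: obtain the second candidates from the similarity model.
    second = similar_names(name)
    # Step 204: combine both candidate sets into one matching result.
    return combine(first, second)
```

Passing the four stages in as callables keeps the sketch independent of any particular classification table or model.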
In order to enable those skilled in the art to better understand the value range data matching method provided in the present disclosure, the foregoing steps are illustrated in detail below.
The medical data in step 201 may be medical data stored in a medical information system inside each medical institution. That is, medical data may be acquired from the medical information system, and the disease name to be matched may be extracted from the acquired medical data. Alternatively, the medical data may be the medical data reported by each medical information system to the medical information unified management platform. That is, medical data may be acquired from the medical information unified management platform, and the disease name to be matched may be extracted from the acquired medical data.
After the disease name to be matched is acquired, the disease category to which it belongs may be determined. In one possible approach, the disease names in the international disease classification table may be classified based on the disease onset site and pathology to obtain the disease classification table. Accordingly, determining the disease category to which the disease name to be matched belongs may be: searching in the disease classification table based on the disease name to be matched to determine the disease category to which the disease name to be matched belongs. In this way, broad categories can be matched first in the subsequent matching process to reduce candidate errors. Compared with searching and comparing entries in the international disease classification table one by one, this also improves search efficiency and thus value range matching efficiency.
The international disease classification table may be, for example, the ICD-10 international disease classification table, or may be another international disease classification table; the embodiments of the present disclosure are not limited in this regard. Taking the ICD-10 international disease classification table as an example, each standard disease name corresponds to a unique 6-character code. For example, the 6-character code corresponding to tuberculosis is A16.202. The 6-character codes of diseases with the same onset site or the same pathology share at least one code value. For example, the first 2 code values in the 6-character codes of intestinal infectious diseases such as bacillary dysentery and other bacterial intestinal infections are all A0. Classifying the standard disease names in the international disease classification table based on the disease onset site and pathology may therefore be: classifying standard disease names that have the same code values at preset code positions in the international disease classification table into one category. Each disease category in the resulting disease classification table includes at least one standard disease name, so that a first candidate disease name can subsequently be determined from the standard disease names included in the corresponding disease category.
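Grouping standard disease names by the code values at preset code positions can be sketched as below. The function name `group_by_code_prefix` is hypothetical, the preset positions are assumed to be a leading prefix, and the sample rows other than A16.202 use placeholder codes chosen only to start with A0; they are not authoritative ICD-10 entries.

```python
from collections import defaultdict


def group_by_code_prefix(icd_table, prefix_len):
    """Group standard disease names whose codes share the same leading values.

    icd_table: dict mapping a code such as "A16.202" to a standard name.
    prefix_len: number of leading code values that define a category.
    """
    groups = defaultdict(list)
    for code, name in icd_table.items():
        groups[code[:prefix_len]].append(name)
    return dict(groups)


# Illustrative rows only; the A0 codes are placeholders.
sample_table = {
    "A03.900": "bacillary dysentery",
    "A04.900": "other bacterial intestinal infections",
    "A16.202": "tuberculosis",
}
categories = group_by_code_prefix(sample_table, 2)
```

Here `categories["A0"]` holds the two intestinal infection names and `categories["A1"]` holds tuberculosis, so each category contains at least one standard disease name.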
For example, each disease name in the international disease classification table can be classified based on the disease onset site and pathology to obtain one disease classification table. Subsequently, the disease name to be matched can be used as an index to search the disease classification table, and the disease category found is the disease category to which the disease name to be matched belongs. Alternatively, to reduce the number of first candidate disease names and further improve the efficiency of subsequent value range matching, each disease name in the international disease classification table may be classified more finely based on the disease onset site and pathology to obtain at least two disease classification tables.
In a possible manner, each disease name in the international disease classification table may be classified based on the pathology of the disease to obtain a first disease classification table, and each disease name in the international disease classification table may be classified based on the disease onset site to obtain a second disease classification table. Accordingly, determining the disease category to which the disease name to be matched belongs may be: searching in a first disease classification table based on the to-be-matched disease name, searching in a second disease classification table based on the to-be-matched disease name if the disease category to which the to-be-matched disease name belongs is not found in the first disease classification table, and determining the disease category to which the to-be-matched disease name belongs according to the disease category found in the second disease classification table.
Taking the ICD-10 international disease classification table as an example, in the 6-character code corresponding to each standard disease name, the first 3 characters can represent the disease onset site and the 4th character can represent the pathology. Thus, classifying each disease name in the international disease classification table based on the pathology of the disease to obtain a first disease classification table may be: classifying disease names with the same first 4 code values in the international disease classification table into one category. Similarly, classifying each disease name in the international disease classification table based on the disease onset site to obtain a second disease classification table may be: classifying disease names with the same first 3 code values in the international disease classification table into one category.
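The two tables can be sketched as two groupings of the same code list. This is an illustration under assumptions: `build_tables` is a hypothetical helper, the dot in a code such as A16.202 is assumed not to count as one of the 6 code values, and the sample names are placeholders.

```python
def build_tables(icd_table):
    """Return (first_table, second_table) keyed by code prefix.

    first_table groups names by the first 4 code values (finer grouping).
    second_table groups names by the first 3 code values (coarser grouping).
    """
    first_table, second_table = {}, {}
    for code, name in icd_table.items():
        values = code.replace(".", "")  # "A16.202" -> "A16202"
        first_table.setdefault(values[:4], []).append(name)
        second_table.setdefault(values[:3], []).append(name)
    return first_table, second_table


# Placeholder names; only the code layout matters for the illustration.
first_table, second_table = build_tables({
    "A16.202": "name a",
    "A16.203": "name b",
    "A16.500": "name c",
    "A17.000": "name d",
})
```

Every category in the first table is contained in one category of the second table, which is why the second table covers a wider search range.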
Thus, the number of disease names included in each disease category in the second disease classification table is greater than the number of disease names included in each disease category in the first disease classification table. That is, the search range of the second disease classification table is larger than that of the first disease classification table. Therefore, when determining the disease category to which the disease name to be matched belongs, the disease name to be matched can first be searched for in the narrower first disease classification table. If it is found, the disease category found can be used as the disease category to which the disease name to be matched belongs. If it is not found, the range can be expanded: the disease name to be matched is searched for in the broader second disease classification table, and the disease category found in the second disease classification table is determined as the disease category to which the disease name to be matched belongs.
In a possible manner, if the disease category to which the name of the disease to be matched belongs is not found in the second disease classification table, the search range can be further enlarged. For example, the disease category to which the disease name to be matched belongs may be determined in a preset disease classification table, where each disease category includes more disease names than each disease category in the second disease classification table.
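The widening search across the first, second and preset disease classification tables can be sketched as a simple cascade. The text does not specify how a single table is searched, so each table is modelled here as an opaque lookup function; `lookup_with_fallback` and the toy dictionaries are hypothetical.

```python
from typing import Callable, List, Optional, Tuple

Lookup = Callable[[str], Optional[str]]


def lookup_with_fallback(
    name: str, lookups: List[Tuple[str, Lookup]]
) -> Optional[Tuple[str, str]]:
    """Try each table in order, from the narrowest to the broadest.

    Returns (table_label, category) for the first table that yields a
    category, or None if no table does.
    """
    for label, lookup in lookups:
        category = lookup(name)
        if category is not None:
            return label, category
    return None


# Toy tables modelled as plain dictionaries (assumed, for illustration).
first = {"name a": "A162"}
second = {"name a": "A16", "name b": "A16"}
preset = {"name a": "infection", "name b": "infection", "name c": "infection"}
tables = [("first", first.get), ("second", second.get), ("preset", preset.get)]
```

With these toy tables, "name a" resolves in the first table, "name b" only in the second, and "name c" only in the preset table.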
For example, the preset disease classification table may be obtained by dividing the disease names in the international disease classification table in a custom manner according to the actual situation. For example, referring to Table 1 and taking the ICD-10 international disease classification table as an example, the preset disease classification table may include 20 disease categories such as skin disease, musculoskeletal, reproductive, digestive system, infection, tumor, blood, endocrine, mental system and behavioral disorders. Each disease category corresponds to a different range of first-3 code values, and that range is larger than the range of first-3 code values corresponding to each disease category in the second disease classification table. Therefore, when the corresponding disease category is not found in the second disease classification table, it can be searched for over a larger range in the preset disease classification table, so that broad categories are matched first in the matching process and candidate errors are reduced.
TABLE 1
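Assigning a code to a broad category by the range of its first 3 code values can be sketched as follows. The contents of Table 1 are not reproduced here, so the ranges below are an assumption: they follow the general ICD-10 chapter layout and borrow category labels from the text, and they should not be read as the actual Table 1. The names `PRESET_RANGES` and `coarse_category` are hypothetical.

```python
# (low, high, label) over the first 3 code values; assumed ranges that
# follow the ICD-10 chapter layout, not the contents of Table 1.
PRESET_RANGES = [
    ("A00", "B99", "infection"),
    ("C00", "D48", "tumor"),
    ("D50", "D89", "blood"),
    ("E00", "E90", "endocrine"),
    ("K00", "K93", "digestive system"),
    ("L00", "L99", "skin disease"),
    ("M00", "M99", "musculoskeletal"),
]


def coarse_category(code):
    """Return the broad category whose range contains the code, else None."""
    key = code[:3]
    for low, high, label in PRESET_RANGES:
        # Plain string comparison works for one letter followed by two digits.
        if low <= key <= high:
            return label
    return None
```

Under these assumed ranges, `coarse_category("A16.202")` falls in the infection range, while a code outside every listed range yields `None`.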
According to any of the above methods, the disease category to which the disease name to be matched belongs may be determined through a disease classification table (such as the first disease classification table, the second disease classification table or the preset disease classification table) obtained by dividing the international disease classification table. Each disease name in the international disease classification table is a uniform international name, so the name of each disease category in the disease classification table may also be an international name. In practice, however, the disease names in medical data may be popular names rather than international names. For example, the disease with the international name "Alzheimer's disease" has the popular name "senile dementia". With the above methods, the corresponding disease category cannot be determined from a popular disease name. For example, the corresponding disease category cannot be determined from "senile dementia".
To solve this problem, the embodiments of the present disclosure may determine at least one disease name corresponding to the disease name to be matched, and then determine the disease category to which the disease name to be matched belongs according to that at least one disease name. The correspondence can be obtained as follows: disease names in a plurality of sample medical data are analyzed to determine the at least one disease name corresponding to the same disease, and a correspondence between the disease and the at least one disease name is then established. In subsequent application, the disease name to be matched can be looked up in the pre-established correspondence to determine the at least one disease name corresponding to it.
Of course, in a possible manner, a disease name recognition model may also be trained on sample disease names and the diseases corresponding to those sample disease names, and the input disease name to be matched may then be intelligently identified by the trained disease name recognition model.
After the at least one disease name corresponding to the disease name to be matched is determined, the disease category to which the disease name to be matched belongs can be determined according to it. For example, if the disease name to be matched is "senile dementia", the corresponding disease can be determined to be "Alzheimer's disease". The disease category to which the disease name to be matched belongs can then be determined according to "Alzheimer's disease". This avoids the situation in which the disease category to which the disease name to be matched belongs cannot be determined, and ensures that value range data matching proceeds normally.
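The pre-established correspondence can be sketched as an index from each observed name to the disease or diseases it refers to. The helper names `build_alias_index` and `resolve_category`, and the category label "G30", are assumptions made for illustration.

```python
def build_alias_index(disease_to_names):
    """Invert {disease: [names seen in sample data]} into {name: [diseases]}."""
    index = {}
    for disease, names in disease_to_names.items():
        for name in [disease, *names]:
            diseases = index.setdefault(name, [])
            if disease not in diseases:
                diseases.append(disease)
    return index


def resolve_category(name, alias_index, category_of):
    """Map a possibly popular name to a category via its known diseases.

    category_of: lookup from an international disease name to a category,
    returning None when the name is unknown.
    """
    for disease in alias_index.get(name, [name]):
        category = category_of(disease)
        if category is not None:
            return category
    return None


alias_index = build_alias_index({"Alzheimer's disease": ["senile dementia"]})
category_table = {"Alzheimer's disease": "G30"}  # illustrative category label
```

With this index, the popular name "senile dementia" resolves to the category of "Alzheimer's disease", while a name that appears nowhere in the correspondence falls through to `None`.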
After determining the disease category to which the disease name to be matched belongs by any of the above methods, the first candidate disease name corresponding to the disease name to be matched can be determined according to the standard disease name included in the disease category. It should be understood that the first candidate disease name includes at least one standard disease name. Therefore, in order to determine an accurate value range matching result, the embodiment of the disclosure may further input the disease name to be matched into the semantic similarity model, so that the result output by the semantic similarity model is used as a second candidate disease name corresponding to the disease name to be matched, and a target disease name is determined as the value range matching result of the disease name to be matched by combining the second candidate disease name and the first candidate disease name.
For example, the semantic similarity model may be used to calculate the similarity between the input disease name to be matched and the sample disease name, and then output at least one sample disease name with the similarity exceeding a preset threshold, i.e. the second candidate disease name comprises at least one sample disease name. The preset threshold may be set according to an actual situation, which is not limited in the embodiments of the present disclosure.
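Selecting the second candidate disease names by a preset threshold can be sketched as below. The similarity function here is a word-overlap (Jaccard) score used purely as a stand-in for the trained semantic similarity model; it, the function names and the threshold of 0.3 are assumptions.

```python
def jaccard(a, b):
    """Word-overlap similarity; a stand-in for the trained model's score."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a or not tokens_b:
        return 0.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def second_candidates(name, sample_names, similarity=jaccard, threshold=0.3):
    """Return (sample_name, score) pairs whose score exceeds the threshold,
    ordered from most to least similar."""
    scored = [(sample, similarity(name, sample)) for sample in sample_names]
    kept = [(sample, score) for sample, score in scored if score > threshold]
    return sorted(kept, key=lambda pair: pair[1], reverse=True)


candidates = second_candidates(
    "acute bronchitis",
    ["chronic bronchitis", "gastric ulcer", "acute bronchitis"],
)
```

Because the similarity function is a parameter, a real model's scoring call could be substituted without changing the thresholding logic.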
By way of example, the semantic similarity model may be a semantic similarity model based on BERT (Bidirectional Encoder Representations from Transformers). It should be understood that the BERT model in the related art relies only on the attention mechanism, does not consider parts of speech, and uses only semantic information for model training, so that misjudgments may occur more often. In addition, the BERT model in the related art assigns the same weight to each word and cannot highlight keywords, so that only one subject can be extracted for a whole sentence. In the embodiments of the present disclosure, in order to mitigate these problems of the BERT model in the related art, the semantic similarity model may be trained according to the part-of-speech features and the syntactic features of the sample disease names.
For example, the sample disease names may be obtained from a large volume of medical data whose text is well-formed and whose semantic expression is accurate. The part-of-speech feature may be used to characterize the part of speech of each character in the disease name, and the syntactic feature may be used to characterize the ordering, overall meaning, etc. of the characters in the disease name. For example, for the disease name "senile dementia", it may be determined that the part-of-speech feature of the characters for "senile" is a noun feature characterizing age, the part-of-speech features of the two characters that make up "dementia" are adjective features characterizing brain diseases, and the syntactic feature of "senile dementia" is a feature characterizing brain diseases of elderly people. It should be understood that this is merely illustrative; in practical applications, different part-of-speech features and syntactic features may be extracted for the disease name "senile dementia", and the embodiments of the present disclosure are not limited thereto.
In this way, the trained semantic similarity model can extract the part-of-speech features and syntactic features of the disease name to be matched and use them for semantic similarity analysis, so that a more accurate second candidate disease name is obtained. In addition, because the semantic similarity model is trained on a large number of sample disease names, its accuracy can be improved to a certain extent, which likewise yields a more accurate second candidate disease name.
After the first candidate disease name and the second candidate disease name are obtained, a target disease name can be determined as a value range matching result corresponding to the disease name to be matched according to the first candidate disease name and the second candidate disease name.
In a possible manner, determining the value range matching result corresponding to the disease name to be matched according to the first candidate disease name and the second candidate disease name may be as follows: if the second candidate disease name contains a disease name that is the same as one in the first candidate disease name and whose semantic similarity to the disease name to be matched exceeds the preset semantic similarity, that disease name is determined as the value range matching result corresponding to the disease name to be matched. If the second candidate disease name contains no disease name that is the same as one in the first candidate disease name and whose semantic similarity to the disease name to be matched exceeds the preset semantic similarity, fuzzy matching is performed on the second candidate disease name and the first candidate disease name to determine the value range matching result corresponding to the disease name to be matched. The preset semantic similarity may be set according to actual situations, which is not limited in the embodiments of the present disclosure.
It will be appreciated that the first candidate disease name is determined by the disease category and may be understood to be determined by rule matching with a higher accuracy than the second candidate disease name output by the semantic similarity model. The second candidate disease name is obtained through semantic similarity analysis, and compared with the first candidate disease name, the problem of lack of context semantics in the disease name matching process can be solved, and the scene applicability is higher. Accordingly, embodiments of the present disclosure determine a value range matching result for a disease name to be matched in combination with a first candidate disease name and a second candidate disease name.
For example, if a disease name in the second candidate disease name is the same as a disease name in the first candidate disease name, that is, the same disease name is determined both through rule matching and through semantic similarity analysis, that disease name can be determined as the value range matching result corresponding to the disease name to be matched. Conversely, if no disease name in the second candidate disease name is the same as a disease name in the first candidate disease name, for example one disease name is determined through rule matching while a different disease name is determined through semantic similarity analysis, fuzzy matching may be performed on the first candidate disease name and the second candidate disease name to determine the target disease name. The fuzzy matching method is similar to that in the related art and is not repeated here.
By the method, the value range matching result of the disease name to be matched can be determined by adopting different matching modes according to the matching condition of the second candidate disease name in the first candidate disease name, so that a more accurate value range matching result is obtained, and more accurate data standardization operation is realized.
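The two-branch decision described above can be sketched as follows. This is an illustration under stated assumptions: the disclosure does not specify the fuzzy matching algorithm, so `difflib.get_close_matches` from the Python standard library is swapped in as a stand-in, and the function name, the `cutoff` value, and the tie-breaking by highest similarity are hypothetical choices.

```python
from difflib import get_close_matches
from typing import Optional


def match_value_range(
    first: list[str],
    second: dict[str, float],
    min_similarity: float,
) -> Optional[str]:
    """Combine rule-based candidates (`first`) with model candidates (`second`).

    `second` maps each second candidate disease name to its semantic
    similarity with the disease name to be matched.
    """
    first_set = set(first)
    # Branch 1: a name proposed by both sources with high enough similarity.
    agreed = {n: s for n, s in second.items() if n in first_set and s > min_similarity}
    if agreed:
        return max(agreed, key=agreed.get)
    # Branch 2: fuzzy-match second candidates against first candidates,
    # trying the most similar second candidate first (assumed ordering).
    for name in sorted(second, key=second.get, reverse=True):
        close = get_close_matches(name, first, n=1, cutoff=0.6)
        if close:
            return close[0]
    return None
```

Returning `None` when neither branch succeeds leaves room for the manual review step; how such cases are handled is not fixed by this sketch.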
It should be understood that, in the case where the second candidate disease name does not contain a disease name that is the same as one in the first candidate disease name, the problem may lie in the output of the semantic similarity model. The disease name to be matched in this case may therefore also be recorded, so that such problems can be collected manually and handled by, for example, verifying the accuracy of the words related to the sample disease names and expanding the disease aliases. Correspondingly, the semantic similarity model can be retrained with the sample disease names whose accuracy has been verified, and the correspondence between disease names can be updated according to the result of the disease alias expansion. In this way, after each value range matching, the accuracy of subsequent value range matching can be improved through these processing modes.
In practice, the first candidate disease name may include a large number of standard disease names. For example, if no corresponding disease category is found in the first disease classification table or the second disease classification table, the disease category found in the preset disease classification table is used as the disease category to which the disease name to be matched belongs. As described above, in order to avoid the situation in which the disease category to which the disease name to be matched belongs cannot be found, each disease category in the preset disease classification table includes a larger number of disease names. Thus, the first candidate disease name determined in this manner includes a larger number of standard disease names.
In this case, in order to narrow the candidate range and thus improve the value range matching efficiency, diagnosis department information and/or patient sex information may be obtained from the medical data corresponding to the disease name to be matched, the first candidate disease name may be screened according to the diagnosis department information and/or the patient sex information to obtain the target candidate disease name, and the value range matching result corresponding to the disease name to be matched may then be determined according to the target candidate disease name and the second candidate disease name. The diagnosis department information indicates a department such as cardiology, hematology, or orthopedics, and the patient sex information is female or male.
For example, referring to table 1, the disease category to which the disease name to be matched belongs is blood and hematopoietic diseases and certain diseases involving immune mechanisms, i.e., the first candidate disease name is all the disease names included in that disease category. If the diagnosis department information in the medical data corresponding to the disease name to be matched is cardiology, the first candidate disease name can be screened to obtain cardiovascular-related diseases. In this way, the candidate range can be narrowed and the value range matching efficiency can be improved.
Or referring to table 1, the disease category to which the disease name to be matched belongs is a reproductive system disease, i.e., the first candidate disease name is all disease names included in the disease category. If the patient sex information in the medical data corresponding to the disease name to be matched is female, screening can be carried out on the first candidate disease name so as to obtain female related reproductive system diseases. Therefore, the candidate range can be reduced, and the value range matching efficiency can be improved.
Of course, the first candidate disease name may be screened by combining the diagnosis department information and the patient sex information at the same time, and may be set according to actual situations, which is not limited in the embodiment of the present disclosure.
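The screening step can be sketched as a simple filter. The record layout below (`StandardDisease` with `departments` and `sexes` fields) is a hypothetical data model introduced only for illustration; the disclosure does not specify how department or sex applicability is stored.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class StandardDisease:
    name: str
    departments: frozenset  # hypothetical: departments that diagnose this disease
    sexes: frozenset        # hypothetical: sexes the disease applies to


def screen(
    candidates: list[StandardDisease],
    department: Optional[str] = None,
    sex: Optional[str] = None,
) -> list[StandardDisease]:
    """Keep first candidates compatible with the available department and/or sex."""
    kept = []
    for candidate in candidates:
        if department is not None and department not in candidate.departments:
            continue
        if sex is not None and sex not in candidate.sexes:
            continue
        kept.append(candidate)
    return kept


# Illustrative entries for a reproductive system disease category.
candidates = [
    StandardDisease("ovarian cyst", frozenset({"gynecology"}), frozenset({"female"})),
    StandardDisease("prostatitis", frozenset({"urology"}), frozenset({"male"})),
]
female_only = screen(candidates, sex="female")
```

Passing `None` for a criterion skips it, which mirrors the "and/or" wording: either piece of information may be used alone or both may be combined.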
The value range data matching method provided by the present disclosure is described below by way of another exemplary embodiment. Referring to fig. 3, the value range data matching method includes:
Step 301, obtain the name of the disease to be matched from the medical data.
Step 302, determining whether the corresponding disease name is found in the international disease classification table, and if so, executing step 303. If not, step 304 is performed.
It should be understood that the value range data matching method provided by the present disclosure may be combined with a value range matching method in the related art, and if a corresponding disease name is found in the international disease classification table, the disease name is used as a value range matching result of the disease name to be matched. If the corresponding disease name is not found in the international disease classification table, continuing to execute the subsequent steps. The method for finding the corresponding disease name in the international disease classification table may be a fuzzy matching method, a word segmentation comparison method and the like in the related technology.
Step 303, taking the disease name as the value range matching result of the disease name to be matched.
Step 304, determining whether the disease name to be matched is found in the first disease classification table, if found, executing step 303, and if not found, executing step 305.
Step 305, determining whether the disease name to be matched is found in the second disease classification table, and if so, executing step 306. If not, step 307 is performed.
Step 306, determining the disease category to which the disease name to be matched belongs according to the disease category found in the second disease classification table.
Step 307, determining the disease category to which the disease name to be matched belongs in the preset disease classification table.
It should be understood that the manner of determining the first disease classification table, the second disease classification table and the preset disease classification table is described above, and will not be described herein.
Step 308, determining a first candidate disease name corresponding to the disease name to be matched according to the standard disease name included in the disease category.
Step 309, screening the first candidate disease name according to the diagnosis department information and/or the patient sex information to obtain the target candidate disease name.
Step 310, inputting the disease name to be matched into the semantic similarity model to obtain a second candidate disease name corresponding to the disease name to be matched. The related contents such as the training mode of the semantic similarity model are described above, and are not described herein.
Step 311, if the second candidate disease name contains a disease name that is the same as one in the target candidate disease name and whose semantic similarity to the disease name to be matched exceeds the preset semantic similarity, determining that disease name as the value range matching result corresponding to the disease name to be matched.
Step 312, if there is no disease name that is the same as the first candidate disease name and has a semantic similarity with the disease name to be matched that exceeds the preset semantic similarity, fuzzy matching is performed on the second candidate disease name and the first candidate disease name, so as to determine a value range matching result corresponding to the disease name to be matched.
The specific embodiments of the above steps are illustrated in detail above, and will not be repeated here. It should be further understood that for the purposes of simplicity of explanation of the above method embodiments, all of them are depicted as a series of acts in combination, but it should be appreciated by those skilled in the art that the present disclosure is not limited by the order of acts described above. Further, it should also be appreciated by those skilled in the art that the embodiments described above are preferred embodiments and that the steps involved are not necessarily required by the present disclosure.
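The fallback across the classification tables, in which the first disease classification table is tried, then the second, then the preset table, can be sketched as below. It assumes, purely for illustration, that each table is a mapping from disease name to disease category and that lookup is by exact key; the disclosure does not fix the lookup method, and the function name is hypothetical.

```python
from typing import Mapping, Optional, Sequence


def find_category(name: str, tables: Sequence[Mapping[str, str]]) -> Optional[str]:
    """Try each classification table in order and return the first category found."""
    for table in tables:
        category = table.get(name)
        if category is not None:
            return category
    return None


# Hypothetical tables: pathology-based, site-based, and a coarser preset table.
first_table = {"disease A": "category by pathology"}
second_table = {"disease B": "category by site"}
preset_table = {"disease C": "coarse category"}
tables = [first_table, second_table, preset_table]
```

Ordering the tables from finest to coarsest means the coarse preset table is consulted only when the finer tables give no result, which is why it tends to produce a larger first candidate set.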
With this method, the advantages of both the disease coding structure information and the big data model can be combined. On the premise that the disease coding structure information is used to guarantee the quality of the candidate matching items and thereby reduce errors, the model's own deficiency in context semantics is mitigated by adding context information such as syntactic features and part-of-speech features to the semantic similarity model. This strengthens the ability to match value range data that contains few characters, such as disease names, and better realizes the data standardization of disease names.
Based on the same inventive concept, the embodiment of the disclosure also provides a value range data matching device, which can be part or all of the electronic equipment in a mode of software, hardware or a combination of the two. Referring to fig. 4, the value range data matching apparatus 400 includes:
An obtaining module 401, configured to obtain a disease name to be matched from medical data;
A first determining module 402, configured to determine a disease category to which the disease name to be matched belongs, and determine a first candidate disease name corresponding to the disease name to be matched according to a standard disease name included in the disease category;
A second determining module 403, configured to input the disease name to be matched into a semantic similarity model to obtain a second candidate disease name corresponding to the disease name to be matched, where the semantic similarity model is obtained by training according to part-of-speech features and syntactic features of a sample disease name;
A third determining module 404, configured to determine a value range matching result corresponding to the disease name to be matched according to the first candidate disease name and the second candidate disease name.
Optionally, the apparatus 400 further includes:
The classification module is used for classifying each disease name in the international disease classification table based on the disease incidence part and pathology of the disease so as to obtain a disease classification table;
The first determining module 402 is configured to:
And searching in the disease classification table based on the to-be-matched disease name to determine the disease category to which the to-be-matched disease name belongs.
Optionally, the classification module is configured to:
Classifying each disease name in the international disease classification table based on the pathology of the disease to obtain a first disease classification table, and classifying each disease name in the international disease classification table based on the disease incidence part of the disease to obtain a second disease classification table;
The first determining module 402 is configured to:
Searching in the first disease classification table based on the disease name to be matched;
when the disease category to which the disease name to be matched belongs is not found in the first disease classification table, searching is carried out in the second disease classification table based on the disease name to be matched, and the disease category to which the disease name to be matched belongs is determined according to the disease category found in the second disease classification table.
Optionally, the apparatus 400 further includes:
and a fourth determining module, configured to determine, in a preset disease classification table, a disease class to which the disease name to be matched belongs when the disease class to which the disease name to be matched belongs is not found in the second disease classification table, where the number of disease names included in each disease class in the preset disease classification table is greater than the number of disease names included in each disease class in the second disease classification table.
Optionally, the first determining module 402 is configured to:
determining at least one disease name corresponding to the disease name to be matched;
and determining the disease category to which the disease name to be matched belongs according to at least one disease name corresponding to the disease name to be matched.
Optionally, the third determining module 404 is configured to:
Acquiring diagnosis department information and/or patient sex information from the medical data corresponding to the disease names to be matched, and screening the first candidate disease names according to the diagnosis department information and/or the patient sex information to obtain target candidate disease names;
And determining a value range matching result corresponding to the disease name to be matched according to the target candidate disease name and the second candidate disease name.
Optionally, the third determining module 404 is configured to:
when a disease name which is the same as the first candidate disease name and has the semantic similarity exceeding the preset semantic similarity with the disease name to be matched exists in the second candidate disease name, determining the disease name as a value range matching result corresponding to the disease name to be matched;
And when the disease names which are the same as the first candidate disease names and have the semantic similarity exceeding the preset semantic similarity with the disease names to be matched do not exist in the second candidate disease names, carrying out fuzzy matching on the second candidate disease names and the first candidate disease names so as to determine the value domain matching result corresponding to the disease names to be matched.
The specific manner in which the various modules perform the operations in the apparatus of the above embodiments have been described in detail in connection with the embodiments of the method, and will not be described in detail herein.
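As a rough structural illustration of how the four modules could be composed in software, the sketch below wires them together as interchangeable callables. The class name, field names, and signatures are assumptions for this example and do not describe the actual apparatus 400.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class ValueRangeMatcher:
    # Each field stands in for one module; all signatures are hypothetical.
    obtain: Callable[[dict], str]                            # obtaining module 401
    first_candidates: Callable[[str], list]                  # first determining module 402
    second_candidates: Callable[[str], list]                 # second determining module 403
    decide: Callable[[list, list], Optional[str]]            # third determining module 404

    def match(self, record: dict) -> Optional[str]:
        """Run the modules in sequence for one medical data record."""
        name = self.obtain(record)
        first = self.first_candidates(name)
        second = self.second_candidates(name)
        return self.decide(first, second)


# Toy wiring with stub callables, only to show the data flow.
matcher = ValueRangeMatcher(
    obtain=lambda record: record["diagnosis"],
    first_candidates=lambda name: ["Alzheimer's disease"],
    second_candidates=lambda name: ["Alzheimer's disease"],
    decide=lambda first, second: next((n for n in second if n in first), None),
)
result = matcher.match({"diagnosis": "senile dementia"})
```

Keeping each module behind a callable makes it possible to replace, for example, the second determining module with a different model without touching the rest of the flow.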
Based on the same inventive concept, an embodiment of the present disclosure provides an electronic device including:
a memory having a computer program stored thereon;
and the processor is used for executing the computer program in the memory to realize the steps of any value range data matching method.
In a possible manner, a block diagram of the electronic device is shown in fig. 5. Referring to fig. 5, the electronic device 500 may include: a processor 501, a memory 502. The electronic device 500 may also include one or more of a multimedia component 503, an input/output (I/O) interface 504, and a communication component 505.
The processor 501 is configured to control the overall operation of the electronic device 500 so as to perform all or part of the steps of the value range data matching method described above. The memory 502 is used to store various types of data to support operation of the electronic device 500; such data may include, for example, instructions for any application or method operating on the electronic device 500, as well as application-related data such as contact data, messages sent and received, pictures, audio, video, and so forth. The memory 502 may be implemented by any type of volatile or non-volatile memory device or a combination thereof, such as static random access memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. The multimedia component 503 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signals may be further stored in the memory 502 or transmitted through the communication component 505. The audio component further includes at least one speaker for outputting audio signals. The I/O interface 504 provides an interface between the processor 501 and other interface modules, which may be a keyboard, a mouse, buttons, etc. These buttons may be virtual buttons or physical buttons. The communication component 505 is used for wired or wireless communication between the electronic device 500 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, 4G, NB-IoT, eMTC, 5G, or the like, or a combination of one or more of them, which is not limited herein. Accordingly, the communication component 505 may include a Wi-Fi module, a Bluetooth module, an NFC module, etc.
In an exemplary embodiment, the electronic device 500 may be implemented by one or more application-specific integrated circuits (ASIC), digital signal processors (DSP), digital signal processing devices (DSPD), programmable logic devices (PLD), field programmable gate arrays (FPGA), controllers, microcontrollers, microprocessors, or other electronic components for performing the value range data matching method described above.
In another exemplary embodiment, a computer readable storage medium is also provided comprising program instructions which, when executed by a processor, implement the steps of the value range data matching method described above. For example, the computer readable storage medium may be the memory 502 described above including program instructions executable by the processor 501 of the electronic device 500 to perform the value range data matching method described above.
In another exemplary embodiment, a computer program product is also provided, comprising a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-mentioned value range data matching method when executed by the programmable apparatus.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings, but the present disclosure is not limited to the specific details of the embodiments described above, and various simple modifications may be made to the technical solutions of the present disclosure within the scope of the technical concept of the present disclosure, and all the simple modifications belong to the protection scope of the present disclosure.
In addition, the specific features described in the foregoing embodiments may be combined in any suitable manner, and in order to avoid unnecessary repetition, the present disclosure does not further describe various possible combinations.
Moreover, any combination between the various embodiments of the present disclosure is possible as long as it does not depart from the spirit of the present disclosure, which should also be construed as the disclosure of the present disclosure.

Claims (10)

1. A method for matching value range data, the method comprising:
acquiring a disease name to be matched from medical data;
Determining a disease category to which the disease name to be matched belongs from an international disease classification table, and determining a first candidate disease name corresponding to the disease name to be matched according to a standard disease name included in the disease category, wherein the first candidate disease name includes at least one standard disease name;
Inputting the disease names to be matched into a semantic similarity model to obtain second candidate disease names corresponding to the disease names to be matched, wherein the semantic similarity model is obtained by training according to part-of-speech features and syntactic features of sample disease names, and the second candidate disease names comprise at least one sample disease name;
And determining a value range matching result corresponding to the disease name to be matched according to the first candidate disease name and the second candidate disease name.
2. The method according to claim 1, wherein the method further comprises:
Classifying each disease name in the international disease classification table based on the disease incidence part and pathology to obtain a disease classification table;
the determining the disease category to which the disease name to be matched belongs from the international disease classification table comprises the following steps:
And searching in the disease classification table based on the to-be-matched disease name so as to determine the disease category to which the to-be-matched disease name belongs from the international disease classification table.
3. The method of claim 2, wherein classifying each standard disease name in the international disease classification table based on the disease site and pathology to obtain a disease classification table, comprising:
Classifying each disease name in the international disease classification table based on the pathology of the disease to obtain a first disease classification table, and classifying each disease name in the international disease classification table based on the disease incidence part of the disease to obtain a second disease classification table;
the determining the disease category to which the disease name to be matched belongs from the international disease classification table comprises the following steps:
Searching in the first disease classification table based on the disease name to be matched;
If the disease category to which the disease name to be matched belongs is not found in the first disease classification table, searching in the second disease classification table based on the disease name to be matched, and determining the disease category to which the disease name to be matched belongs according to the disease category found in the second disease classification table.
4. A method according to claim 3, characterized in that the method further comprises:
If the disease category to which the disease name to be matched belongs is not found in the second disease classification table, determining the disease category to which the disease name to be matched belongs in a preset disease classification table, wherein the number of the disease names included in each disease category in the preset disease classification table is greater than the number of the disease names included in each disease category in the second disease classification table.
5. The method according to any one of claims 1-4, wherein said determining, from an international disease classification table, a disease category to which the disease name to be matched belongs, comprises:
determining at least one disease name corresponding to the disease name to be matched;
and determining the disease category to which the disease name to be matched belongs from an international disease classification table according to at least one disease name corresponding to the disease name to be matched.
6. The method according to any one of claims 1-4, wherein determining a value range matching result corresponding to the disease name to be matched according to the first candidate disease name and the second candidate disease name comprises:
Acquiring diagnosis department information and/or patient sex information from the medical data corresponding to the disease names to be matched, and screening the first candidate disease names according to the diagnosis department information and/or the patient sex information to obtain target candidate disease names;
And determining a value range matching result corresponding to the disease name to be matched according to the target candidate disease name and the second candidate disease name.
7. The method according to any one of claims 1-4, wherein determining a value range matching result corresponding to the disease name to be matched according to the first candidate disease name and the second candidate disease name comprises:
If the second candidate disease name has the same disease name as the first candidate disease name and the semantic similarity between the second candidate disease name and the disease name to be matched exceeds the preset semantic similarity, determining the disease name as a value range matching result corresponding to the disease name to be matched;
And if the second candidate disease name does not have the disease name which is the same as the first candidate disease name and has the semantic similarity exceeding the preset semantic similarity with the disease name to be matched, carrying out fuzzy matching on the second candidate disease name and the first candidate disease name so as to determine a value domain matching result corresponding to the disease name to be matched.
8. A value range data matching device, the device comprising:
an acquisition module, configured to acquire a disease name to be matched from medical data;
a first determining module, configured to determine, from an international disease classification table, a disease category to which the disease name to be matched belongs, and determine, according to the standard disease names included in the disease category, a first candidate disease name corresponding to the disease name to be matched, wherein the first candidate disease name includes at least one standard disease name;
a second determining module, configured to input the disease name to be matched into a semantic similarity model to obtain a second candidate disease name corresponding to the disease name to be matched, wherein the semantic similarity model is trained on part-of-speech features and syntactic features of sample disease names, and the second candidate disease name includes at least one sample disease name; and
a third determining module, configured to determine a value range matching result corresponding to the disease name to be matched according to the first candidate disease name and the second candidate disease name.
9. A computer-readable storage medium on which a computer program is stored, characterized in that the program, when executed by a processor, implements the steps of the method according to any one of claims 1-7.
10. An electronic device, comprising:
a memory having a computer program stored thereon; and
a processor for executing the computer program in the memory to implement the steps of the method according to any one of claims 1-7.
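For illustration only, the decision rule recited in claim 7 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the function name `match_value_range`, the threshold default of 0.8, the representation of the second candidates as a name-to-similarity mapping, the tie-break that picks the highest-scoring shared name, and the use of `difflib.SequenceMatcher` as the fuzzy matcher are all hypothetical choices that the claims do not specify.

```python
from difflib import SequenceMatcher


def match_value_range(first_candidates, second_candidates, threshold=0.8):
    """Pick a value range matching result from two candidate sets.

    first_candidates: list of standard disease names taken from the
        disease category (hypothetical representation).
    second_candidates: dict mapping each candidate returned by the
        semantic similarity model to its similarity score with the
        disease name to be matched (hypothetical representation).
    threshold: the preset semantic similarity (assumed value).
    """
    first = set(first_candidates)

    # Branch 1: a name appears in both candidate sets and its semantic
    # similarity exceeds the preset threshold.
    shared = [
        (name, score)
        for name, score in second_candidates.items()
        if name in first and score > threshold
    ]
    if shared:
        # Assumption: when several names qualify, take the highest score.
        return max(shared, key=lambda pair: pair[1])[0]

    # Branch 2: no such name exists, so fall back to fuzzy matching
    # between the two candidate sets. Assumption: the result returned is
    # the standard (first-candidate) name of the best-scoring pair.
    best_name, best_ratio = None, 0.0
    for standard in first_candidates:
        for candidate in second_candidates:
            ratio = SequenceMatcher(None, standard, candidate).ratio()
            if ratio > best_ratio:
                best_name, best_ratio = standard, ratio
    return best_name
```

Calling `match_value_range(["type 2 diabetes mellitus"], {"type 2 diabetes mellitus": 0.93})` takes the first branch, while a second candidate set that shares no name with the first set falls through to the fuzzy comparison. With empty inputs the function returns `None`.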

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110120997.4A CN112818085B (en) 2021-01-28 2021-01-28 Value range data matching method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110120997.4A CN112818085B (en) 2021-01-28 2021-01-28 Value range data matching method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN112818085A (en) 2021-05-18
CN112818085B (en) 2024-06-18

Family

ID=75859954

Family Applications (1)

Application Number Status Publication Number Priority Date Filing Date Title
CN202110120997.4A Active CN112818085B (en) 2021-01-28 2021-01-28 Value range data matching method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN112818085B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111695336A (en) * 2020-04-26 2020-09-22 平安科技(深圳)有限公司 Disease name code matching method and device, computer equipment and storage medium
CN112183104A (en) * 2020-08-26 2021-01-05 望海康信(北京)科技股份公司 Code recommendation method, system and corresponding equipment and storage medium

Family Cites Families (10)

Publication number Priority date Publication date Assignee Title
JP6245835B2 (en) * 2012-04-19 2017-12-13 東芝メディカルシステムズ株式会社 Medical information search support device and medical information search support system
CN105069124B (en) * 2015-08-13 2018-06-15 易保互联医疗信息科技(北京)有限公司 A kind of International Classification of Diseases coding method of automation and system
CN109408631B (en) * 2018-09-03 2023-06-20 深圳平安医疗健康科技服务有限公司 Medicine data processing method, device, computer equipment and storage medium
US11783130B2 (en) * 2019-05-06 2023-10-10 John Snow Labs Inc. Using unsupervised machine learning for automatic entity resolution of natural language records
CN110931090A (en) * 2019-11-26 2020-03-27 太平金融科技服务(上海)有限公司 Disease data processing method and device, computer equipment and storage medium
CN111128388B (en) * 2019-12-03 2024-02-27 东软集团股份有限公司 Value range data matching method and device and related products
CN111274806B (en) * 2020-01-20 2020-11-06 医惠科技有限公司 Method and device for recognizing word segmentation and part of speech and method and device for analyzing electronic medical record
CN111506673A (en) * 2020-03-27 2020-08-07 泰康保险集团股份有限公司 Medical record classification code determination method and device
CN112149414B (en) * 2020-09-23 2023-06-23 腾讯科技(深圳)有限公司 Text similarity determination method, device, equipment and storage medium
CN112199954B (en) * 2020-10-10 2023-11-10 平安科技(深圳)有限公司 Disease entity matching method and device based on voice semantics and computer equipment

Also Published As

Publication number Publication date
CN112818085A (en) 2021-05-18

Similar Documents

Publication Publication Date Title
US10176804B2 (en) Analyzing textual data
CN108287858B (en) Semantic extraction method and device for natural language
US9223779B2 (en) Text segmentation with multiple granularity levels
CN108027823B (en) Information processing device, information processing method, and computer-readable storage medium
KR101339103B1 (en) Document classifying system and method using semantic feature
WO2018223796A1 (en) Speech recognition method, storage medium, and speech recognition device
US20120290561A1 (en) Information processing apparatus, information processing method, program, and information processing system
WO2018153130A1 (en) Translation method and apparatus
CN113593709B (en) Disease coding method, system, readable storage medium and device
US20220121824A1 (en) Method for determining text similarity, method for obtaining semantic answer text, and question answering method
CN105161104A (en) Voice processing method and device
CN114556328A (en) Data processing method and device, electronic equipment and storage medium
CN109299227B (en) Information query method and device based on voice recognition
US20210183526A1 (en) Unsupervised taxonomy extraction from medical clinical trials
US11893813B2 (en) Electronic device and control method therefor
CN114708976A (en) Method, device, device and storage medium for auxiliary diagnosis technology
WO2021159812A1 (en) Cancer staging information processing method and apparatus, and storage medium
CN111126084B (en) Data processing method, device, electronic equipment and storage medium
JP2008225963A (en) Machine translation device, replacement dictionary creating device, machine translation method, replacement dictionary creating method, and program
CN112115697A (en) Method, device, server and storage medium for determining target text
CN112818085B (en) Value range data matching method and device, storage medium and electronic equipment
CN116842168B (en) Cross-domain problem processing method and device, electronic equipment and storage medium
JP6107003B2 (en) Dictionary updating apparatus, speech recognition system, dictionary updating method, speech recognition method, and computer program
US11328719B2 (en) Electronic device and method for controlling the electronic device
CN112749545B (en) Medical data processing method, device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant