WO2021114632A1

WO2021114632A1 - Disease name standardization method, apparatus, device, and storage medium

Info

Publication number: WO2021114632A1
Application number: PCT/CN2020/099487
Authority: WO
Inventors: 姚海申; 蒋雪涵; 徐卓扬; 孙行智; 胡岗
Original assignee: 平安科技（深圳）有限公司 (Ping An Technology (Shenzhen) Co., Ltd.)
Priority date: 2020-05-13
Filing date: 2020-06-30
Publication date: 2021-06-17
Also published as: CN111696635A

Abstract

Provided are a disease name standardization method, apparatus, device, and storage medium. The method comprises: obtaining a target dictionary, a current diagnosis text, and a preset ICD standard disease name set comprising a plurality of preset ICD standard disease names (101); performing, on the basis of the target dictionary, word segmentation on the current diagnosis text to obtain the name of the disease to be standardized contained in the current diagnosis text (102); building a target dictionary tree on the basis of the preset ICD standard disease name set (103); matching, on the basis of the target dictionary tree, the name of the disease to be standardized with the plurality of preset ICD standard disease names in the preset ICD standard disease name set to obtain a plurality of first degrees of matching (104); and, if a target first degree of matching among the plurality of first degrees of matching meets a preset condition, obtaining the target preset ICD standard disease name corresponding to the target first degree of matching and determining it as the conversion result of the disease name to be standardized (105). The method helps improve conversion efficiency and accuracy. In addition, the present application also relates to blockchain technology, and the data can be stored in a blockchain node.

Description

Disease name standardization method, device, equipment and storage medium

This application claims the priority of a Chinese patent application filed with the Chinese Patent Office with an application number of 2020104013701 on May 13, 2020, with the title of "Disease Name Standardization Method and Device", the entire content of which is incorporated into this application by reference.

Technical field

This application relates to the field of artificial intelligence, in particular to disease name standardization, and specifically to a disease name standardization method, device, equipment and storage medium.

Background art

In recent years, with the rapid development of smart healthcare, big-data-based smart medical technology has placed ever higher demands on data quality. The diagnosed disease name is an important feature and plays an important role in medical research.

The inventor realized that different doctors in a hospital have different writing habits, so the same disease is often recorded under non-uniform names. Therefore, how to quickly and effectively extract the diagnosed disease name from a medical record has become a problem to be solved.

Summary of the invention

The embodiments of the present application provide a disease name standardization method, device, equipment, and storage medium, which help improve the efficiency of disease name standardization.

A first aspect of the embodiments of the present application provides a disease name standardization method, applied to an electronic device, the method including:

Acquiring a target dictionary, a current diagnosis text, and a preset ICD (International Classification of Diseases) standard disease name set, where the preset ICD standard disease name set includes multiple preset ICD standard disease names;

Performing, based on the target dictionary, a word segmentation operation on the current diagnosis text to obtain the name of the disease to be standardized contained in the current diagnosis text;

Constructing a target dictionary tree based on the preset ICD standard disease name set;

Based on the target dictionary tree, matching the name of the disease to be standardized with the multiple preset ICD standard disease names in the preset ICD standard disease name set to obtain multiple first matching degrees;

When a target first matching degree among the plurality of first matching degrees meets a preset condition, acquiring the target preset ICD standard disease name corresponding to the target first matching degree, and determining the target preset ICD standard disease name as the conversion result of the name of the disease to be standardized.

The second aspect of the embodiments of the present application provides a disease name standardization device, which is applied to electronic equipment, and the device includes: an acquisition unit, a word segmentation unit, a construction unit, a matching unit, and a determination unit, wherein:

The acquiring unit is configured to acquire a target dictionary, a current diagnosis text, and a preset ICD standard disease name set, where the preset ICD standard disease name set includes a plurality of preset ICD standard disease names;

The word segmentation unit is configured to perform a word segmentation operation on the current diagnosis text based on the target dictionary to obtain the name of the disease to be standardized contained in the current diagnosis text;

The construction unit is configured to construct a target dictionary tree based on the preset ICD standard disease name set;

The matching unit is configured to match, based on the target dictionary tree, the name of the disease to be standardized with the plurality of preset ICD standard disease names in the preset ICD standard disease name set to obtain a plurality of first matching degrees;

The determining unit is configured to, when a target first matching degree among the plurality of first matching degrees satisfies a preset condition, obtain the target preset ICD standard disease name corresponding to the target first matching degree, and determine the target preset ICD standard disease name as the conversion result of the disease name to be standardized.

A third aspect of the embodiments of the present application provides an electronic device, including a processor, a memory, a communication interface, and one or more programs. The one or more programs are stored in the memory and configured to be executed by the processor; the memory is used to store a computer program including program instructions, and the processor is configured to invoke the program instructions to execute the method of the first aspect of the embodiments of the present application.

A fourth aspect of the embodiments of the present application provides a computer-readable storage medium, including a data storage area and a program storage area. The data storage area stores data created through the use of blockchain nodes, and the program storage area stores a computer program including program instructions which, when executed by a processor, perform some or all of the steps described in the first aspect of the embodiments of the present application.

Through the embodiments of this application, applied to an electronic device, the method includes: obtaining the target dictionary, the current diagnosis text, and the preset ICD standard disease name set, which includes multiple preset ICD standard disease names; performing, based on the target dictionary, a word segmentation operation on the current diagnosis text to obtain the name of the disease to be standardized in the current diagnosis text; building a target dictionary tree based on the preset ICD standard disease name set; matching, based on the target dictionary tree, the name of the disease to be standardized with the multiple preset ICD standard disease names in the set to obtain multiple first matching degrees; and, when a target first matching degree among them meets the preset condition, obtaining the target preset ICD standard disease name corresponding to the target first matching degree and determining it as the conversion result of the disease name to be standardized. In this way, segmenting the current diagnosis text with the target dictionary reduces problems in the text such as colloquial expressions, typos, omissions and abbreviations. In addition, matching the name to be standardized against the multiple preset ICD standard disease names through the target dictionary tree built from the preset ICD standard disease name set yields the conversion result, which helps improve conversion efficiency and accuracy.

Description of the drawings

FIG. 1A is a schematic structural diagram of a disease name standardization method according to an embodiment of this application;

FIG. 1B is a schematic flowchart of a disease name standardization method according to an embodiment of this application;

FIG. 1C is a schematic structural diagram of a method for extracting names of diseases to be standardized according to an embodiment of this application;

FIG. 1D is a schematic structural diagram of a target dictionary tree provided in an embodiment of this application;

FIG. 2 is a schematic flowchart of a method for standardizing disease names according to an embodiment of this application;

FIG. 3 is a schematic flowchart of a method for standardizing disease names according to an embodiment of this application;

FIG. 4 is a schematic structural diagram of an electronic device according to an embodiment of this application;

FIG. 5 is a schematic structural diagram of a disease name standardization device provided in an embodiment of this application.

Detailed description

The technical solutions in the embodiments of the present application will be clearly and completely described below in conjunction with the accompanying drawings in the embodiments of the present application. Obviously, the described embodiments are only a part of the embodiments of the present application, rather than all the embodiments. Based on the embodiments in this application, all other embodiments obtained by those of ordinary skill in the art without creative work shall fall within the protection scope of this application.

The terms "first", "second", etc. in the specification, the claims and the above drawings of this application are used to distinguish different objects, rather than to describe a specific sequence. In addition, the terms "including" and "having" and any variations of them are intended to cover non-exclusive inclusion. For example, a process, method, system, product, or device that includes a series of steps or units is not limited to the listed steps or units, but optionally includes unlisted steps or units, or optionally also includes other steps or units inherent to these processes, methods, products or devices.

The reference to "embodiments" in this application means that a specific feature, structure, or characteristic described in conjunction with the embodiments may be included in at least one embodiment of the present application. The appearance of the phrase in various places in the specification does not necessarily refer to the same embodiment, nor to an independent or alternative embodiment mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described in this application can be combined with other embodiments.

To better understand the embodiments of the present application, the electronic devices to which the embodiments apply are first introduced below.

Electronic devices may include various handheld devices with wireless communication functions, vehicle-mounted devices, wearable devices (such as smart watches, smart bracelets and pedometers), computing devices or other processing devices connected to wireless modems, and various forms of user equipment (UE), mobile stations (MS), terminal devices, and so on. For ease of description, the devices mentioned above are collectively referred to as electronic devices.

Please refer to FIG. 1A, which is a schematic structural diagram of a disease name standardization method provided by an embodiment of the present application. Based on this structure, the target dictionary, the current diagnosis text and the preset ICD standard disease name set can be obtained, where the preset ICD standard disease name set includes multiple preset ICD standard disease names. Then, based on the target dictionary, the current diagnosis text can be segmented to obtain the names of the diseases to be standardized in it; a target dictionary tree is built based on the preset ICD standard disease name set; and, based on the target dictionary tree, the name of the disease to be standardized is matched with the multiple preset ICD standard disease names in the set to obtain multiple first matching degrees. Finally, when a target first matching degree among them meets the preset condition, the target preset ICD standard disease name corresponding to it is obtained and determined as the conversion result of the disease name to be standardized.

It can be seen that, with the disease name standardization method provided by the embodiments of this application, segmenting the current diagnosis text with the target dictionary reduces colloquial expressions, typos, omissions, abbreviations and similar problems in the text. In addition, matching the names to be standardized against the multiple preset ICD standard disease names through the target dictionary tree built from the preset ICD standard disease name set yields the conversion result, which helps improve conversion efficiency and accuracy.

Please refer to FIG. 1B, which is a schematic flowchart of a disease name standardization method provided by an embodiment of the present application. The method is applied to an electronic device and includes the following steps:

101. Obtain a target dictionary, a current diagnosis text, and a preset ICD standard disease name set, where the preset ICD standard disease name set includes a plurality of preset ICD standard disease names.

The embodiments of the present application can be applied to an electronic device, which may include the disease name standardization system shown in FIG. 1A. The target dictionary can be obtained by data processing of multiple historical diagnosis cases of multiple patients stored in a historical disease case database, and may include multiple historical disease names. The preset ICD standard disease name set can be set by the user or by system default and may include multiple preset ICD standard disease names. The current diagnosis text may be the diagnosis text of any new disease case, or any diagnosis text whose disease names need to be standardized. The current diagnosis text may include at least one of the following: prescription information, diagnosis information, disease description information, discharge summary information, hospital information, department information, patient information, etc., which is not limited here.

In a possible example, before step 101 of obtaining the target dictionary, the method may further include the following steps:

A1. Extract historical diagnosis text information from the historical disease case database;

A2. Perform data cleaning on the historical diagnosis text information to obtain a set of historical disease names;

A3. Perform data processing on the historical disease name set and the preset ICD standard disease name set to obtain the target dictionary.

The historical disease case database can store multiple historical diagnosis cases of multiple patients. A historical diagnosis case can include at least one of the following: admission diagnosis information, discharge diagnosis information, etc., which is not limited here. Both the admission diagnosis information and the discharge diagnosis information can include at least one of the following: prescription information, diagnosis information, disease description information, discharge summary information, hospital information, department information, patient information, etc., which is not limited here. The prescription information may include at least one of the following: disease name, disease symptoms, drug name, drug dosage, etc., which is not limited here.

In a specific implementation, multiple historical diagnosis cases can be retrieved from the historical disease case database and historical diagnosis text information extracted from them. The historical diagnosis text information can then be cleaned according to preset rules to obtain the historical disease name set; the preset rules can be set by the user or by system default and are not limited here. For example, unnecessary fields (such as non-disease-name fields) can be removed from the historical diagnosis text information, and, based on empirical knowledge, missing fields can be supplemented or data corrected. The resulting historical disease name set can include the multiple disease names corresponding to multiple historical diagnosis cases. Further, the historical disease name set and the preset ICD standard disease name set can be processed together to obtain an expanded target dictionary that includes multiple disease names. Cleaning the data with preset rules helps alleviate the inaccuracy and incompleteness of rule-based extraction, and the extracted disease names do not need manual correction, which helps save labor costs.
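As a minimal sketch of step A1 and the rule-based field elimination of step A2, assume each historical case is a Python dict and that the diagnosis fields are named `admission_diagnosis` and `discharge_diagnosis` (hypothetical names; the source does not specify a record schema). The delimiter set and the list-numbering pattern are likewise illustrative assumptions rather than rules taken from the source.

```python
import re

# Hypothetical field names: the source only says a historical case may contain
# admission and discharge diagnosis information alongside other fields.
DIAGNOSIS_FIELDS = ("admission_diagnosis", "discharge_diagnosis")


def extract_diagnosis_text(cases):
    """A1: keep only the diagnosis fields of each case, dropping e.g. drug fields."""
    texts = []
    for case in cases:
        for field in DIAGNOSIS_FIELDS:
            value = case.get(field)
            if value:
                texts.append(value)
    return texts


def clean_diagnosis_text(texts, delimiters=r"[;,；，、\n]+"):
    """A2 (rule-based part): split on assumed delimiters, strip list numbering
    such as "1." and surrounding whitespace, and drop empty or repeated names."""
    names, seen = [], set()
    for text in texts:
        for piece in re.split(delimiters, text):
            piece = re.sub(r"^\s*\d+[.、)]\s*", "", piece).strip()
            if piece and piece not in seen:
                seen.add(piece)
                names.append(piece)
    return names
```

For example, a case `{"admission_diagnosis": "1. hypertension; 2. type 2 diabetes", "drug_name": "metformin"}` contributes "hypertension" and "type 2 diabetes", while the drug field is ignored.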

In a possible example, the foregoing step A2, performing data cleaning on the historical diagnosis text information to obtain a set of historical disease names, may include the following steps:

A21. Obtain multiple preset regular expressions for multiple preset disease names, where each preset disease name corresponds to a preset regular expression;

A22. Match the historical diagnosis text information with each of the multiple preset regular expressions to obtain multiple second matching degrees, where each preset regular expression corresponds to one second matching degree;

A23. Determine at least one preset disease name corresponding to at least one second matching degree, among the multiple second matching degrees, that exceeds a first preset threshold, and use the at least one preset disease name as the historical disease name set.

The first preset threshold can be set by the user or by system default, which is not limited here. The electronic device can store multiple preset disease names in advance and preset a regular expression for each of them. A preset regular expression can be composed of ordinary characters and metacharacters and can reflect the logical relationship between the characters of the corresponding preset disease name. Because the historical diagnosis text information may contain many colloquial or repeated names as well as abbreviations and typos, different preset regular expressions can be set in advance according to the word-formation characteristics of medical terms, for example according to the delimiters that actually accompany the corresponding disease names, such as "(%s\s*\d+)|(%s\s*(\s*\d+)". Cleaning the historical diagnosis text information with the preset regular expressions removes meaningless characters and repeated names from the data, yielding a disease name set that contains complete disease names.

In a specific implementation, the historical diagnosis text information can be matched with each preset regular expression separately, logically filtering the text, to obtain multiple second matching degrees, one per preset regular expression. Then at least one preset disease name whose second matching degree is greater than the first preset threshold can be selected from the multiple second matching degrees, and the selected preset disease names used as the disease name set. In this way, a complete and reliable disease name set can be obtained.
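The following sketch of steps A21 to A23 fills in two details the source leaves open, both as assumptions: each preset regular expression is built from a template in which the escaped disease name replaces `%s` (loosely modelled on the `%s\s*\d+` example above), and the second matching degree is the largest fraction of a single diagnosis entry covered by one match of the pattern. The template, the metric and the default `first_threshold=0.5` are hypothetical choices for illustration.

```python
import re


def build_patterns(disease_names):
    """A21: one preset regular expression per preset disease name (assumed template:
    the escaped name, optionally followed by whitespace and a number)."""
    return {name: re.compile(r"%s(?:\s*\d+)?" % re.escape(name)) for name in disease_names}


def second_matching_degree(pattern, texts):
    """A22 (assumed metric): best coverage of any diagnosis entry by one match."""
    best = 0.0
    for text in texts:
        stripped = text.strip()
        if not stripped:
            continue
        for match in pattern.finditer(stripped):
            best = max(best, len(match.group(0)) / len(stripped))
    return best


def historical_disease_names(texts, disease_names, first_threshold=0.5):
    """A23: keep every preset name whose degree exceeds the first preset threshold."""
    patterns = build_patterns(disease_names)
    return [name for name, pattern in patterns.items()
            if second_matching_degree(pattern, texts) > first_threshold]
```

With this metric, a name that makes up nearly a whole diagnosis entry scores close to 1, while a name that only appears inside a long free-text sentence scores low, which is one way to favour complete disease names.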

In a possible example, the above step A3, performing data processing on the historical disease name set and the preset ICD standard disease name set to obtain a target dictionary, may include the following steps:

A31. Merge the historical disease name set with the preset ICD standard disease name set to obtain a first dictionary, where the first dictionary includes multiple first disease names;

A32. Deduplicate the multiple first disease names to obtain the target dictionary.

The preset ICD standard disease name set can be set by the user or by system default and may include multiple preset ICD standard disease names. The representation of a preset ICD standard disease name may be determined from certain characteristics of multiple diseases; for example, diseases can be classified according to certain rules and their names represented by codes. To expand the disease name dictionary and bring it closer to real-world data, the disease name set and the preset ICD standard disease name set can be processed together to obtain an expanded target dictionary, which still includes the multiple preset ICD standard disease names. This also helps improve the accuracy of word segmentation of new diagnosis texts (the current diagnosis text).

In a specific implementation, the disease name set and the preset ICD standard disease name set can be merged to obtain the first dictionary; duplicate first disease names in the first dictionary are then removed, finally yielding the target dictionary.
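Steps A31 and A32 amount to an order-preserving union of the two name sets. A minimal sketch, assuming both sets are iterables of strings and that trimming surrounding whitespace is the only normalization applied before deduplication (an assumption; the source only says identical names are removed):

```python
def build_target_dictionary(historical_names, icd_standard_names):
    """Merge the two name sets (A31) and remove duplicates (A32), keeping first-seen order."""
    # A31: the first dictionary is the concatenation of both sets
    first_dictionary = list(historical_names) + list(icd_standard_names)
    # A32: trim whitespace (assumed normalization), drop empty names,
    # and deduplicate via dict keys, which preserve insertion order
    normalized = (name.strip() for name in first_dictionary)
    return list(dict.fromkeys(name for name in normalized if name))
```

Using `dict.fromkeys` rather than `set` keeps the output deterministic, which makes the resulting dictionary easier to diff between runs.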

FIG. 1C is a schematic structural diagram of a method for extracting the names of diseases to be standardized. As shown in the figure, historical diagnosis text information can be extracted from the historical disease case database and cleaned to obtain the historical disease name set. The historical disease name set and the preset ICD standard disease name set are processed to obtain the target dictionary; the current diagnosis text is then obtained and segmented based on the target dictionary to obtain the names of the diseases to be standardized contained in it. In this way, segmenting the current diagnosis text with the target dictionary reduces colloquial expressions, typos, omissions, abbreviations and similar problems in the text, and the extracted disease names need no manual correction, which helps save labor costs.

102. Based on the target dictionary, perform a word segmentation operation on the current diagnosis text to obtain the name of the disease to be standardized contained in the current diagnosis text.

Because the current diagnosis text may contain many colloquial or repeated names as well as abbreviations, a word segmentation operation can be performed on it based on the target dictionary, which is obtained by processing the historical diagnosis information in the historical disease case database, to obtain the name of the disease to be standardized. The current diagnosis text can be any new diagnosis text. In this way, disease names can be extracted from the current diagnosis text more quickly based on the target dictionary, effectively addressing the inaccuracy and incompleteness of rule-based extraction, so that the extracted disease names need no manual correction, which helps improve efficiency.

In a specific implementation, performing the word segmentation operation on the current diagnosis text based on the target dictionary to obtain the name of the disease to be standardized may include the following: based on the target dictionary, the words in the current diagnosis text are counted and the frequency of each word is calculated. That is, for any sentence to be segmented in the current diagnosis text, all possible segmentation results are enumerated, and the segmentation result with the highest probability is taken as the name of the disease to be standardized.
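Choosing the highest-probability segmentation among all possible ones is commonly done with a unigram model solved by dynamic programming over every dictionary word that starts at each position, the same idea used by segmenters such as jieba. The sketch below assumes a hypothetical word-frequency table `word_freq` derived from the target dictionary, and treats characters covered by no dictionary word as single-character tokens with a small assumed probability `unknown_prob`; the source does not specify these details.

```python
import math


def max_probability_segmentation(text, word_freq, unknown_prob=1e-8):
    """Split `text` into the word sequence with the highest unigram probability.

    word_freq: hypothetical {word: count} table derived from the target dictionary.
    Characters covered by no dictionary word become single-character tokens
    with the assumed probability `unknown_prob`.
    """
    total = sum(word_freq.values())
    max_len = max((len(w) for w in word_freq), default=1)
    n = len(text)
    # best[i] = (best log-probability of text[i:], end index of its first word)
    best = [(0.0, n)] * (n + 1)
    for i in range(n - 1, -1, -1):
        # fallback: the single character at i treated as an unknown word
        candidates = [(math.log(unknown_prob) + best[i + 1][0], i + 1)]
        for j in range(i + 1, min(n, i + max_len) + 1):
            word = text[i:j]
            if word in word_freq:
                candidates.append((math.log(word_freq[word] / total) + best[j][0], j))
        best[i] = max(candidates)
    # follow the stored end indices from the start to recover the words
    words, i = [], 0
    while i < n:
        j = best[i][1]
        words.append(text[i:j])
        i = j
    return words


def extract_disease_names(text, word_freq):
    """Keep the segments that are dictionary words, i.e. the names to be standardized."""
    return [w for w in max_probability_segmentation(text, word_freq) if w in word_freq]
```

For example, with `word_freq = {"高血压": 5, "糖尿病": 5, "2型糖尿病": 3, "型": 1}`, the text "高血压伴2型糖尿病" (hypertension with type 2 diabetes) is segmented as `["高血压", "伴", "2型糖尿病"]`, and the extracted disease names are "高血压" and "2型糖尿病".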

103. Construct a target dictionary tree based on the preset ICD standard disease name set.

The preset ICD standard disease name set may include multiple preset ICD standard disease names. When a new diagnosis text appears, to facilitate matching with the text information in the target dictionary or searching the preset ICD standard disease name set, the electronic device can construct a target dictionary tree from the preset ICD standard disease names in the set. The target dictionary tree can be understood as a dictionary tree (trie) built from one or more strings; here it stores the strings of the preset ICD standard disease name set.

In a possible example, the above step 103, building a target dictionary tree based on the preset ICD standard disease name set, may include the following steps:

31. Based on the preset ICD standard disease name set, determine the first sequence corresponding to each of the multiple preset ICD standard disease names in the set to obtain multiple first sequences, where each first sequence includes at least one character;

32. Obtain a preset dictionary tree, where the preset dictionary tree includes multiple nodes;

33. Traverse the multiple first sequences, and match each of the first sequences with multiple nodes corresponding to the preset dictionary tree to obtain multiple third matching degrees;

34. Calculate the average value of the plurality of third matching degrees;

35. If the average value is greater than the second preset threshold, do not update the preset dictionary tree, and use the preset dictionary tree as the target dictionary tree;

36. If the average value is less than or equal to the second preset threshold, update the preset dictionary tree to obtain the target dictionary tree.

The second preset threshold can be set by the user or by system default, which is not limited here. The preset dictionary tree can likewise be set by the user or by system default and is not limited here. It can be understood as an initial dictionary tree that does not yet store the strings of the preset ICD standard disease name set. The preset dictionary tree can include multiple nodes, each of which can correspond to a character, and it can be generated based on International Classification of Diseases (ICD) codes. For example, the preset dictionary tree can have a two-layer structure: the first layer is the disease category, such as A00 (cholera), and the second layer is the disease names included in that category, such as A00.0 (typical cholera).

In a specific implementation, the first sequence corresponding to each of the multiple preset ICD standard disease names can be determined from the preset ICD standard disease name set, yielding multiple first sequences. Each first sequence includes at least one character, and a character can be either a text character or a special character. Then, based on the preset dictionary tree, the multiple first sequences can be traversed and matched one by one with the nodes of the preset dictionary tree. If the matching succeeds, the preset dictionary tree is not updated; if it fails, the preset dictionary tree is updated to obtain the target dictionary tree. In this way, the preset dictionary tree is expanded step by step, converting the preset ICD standard disease name set into the target dictionary tree, which helps improve the efficiency of subsequent disease name standardization. It should be noted that the target dictionary tree can also be constructed from the preset ICD standard disease name set in advance, so that when the electronic device obtains the current diagnosis text, it can directly perform the subsequent disease name standardization steps based on the target dictionary tree.

Further, traversing the multiple first sequences yields multiple third matching degrees, whose average value can be calculated. If the average value is greater than the second preset threshold, the matching is considered successful and the preset dictionary tree is not updated. Conversely, if the average value is less than or equal to the second preset threshold, the matching is considered to have failed, and the preset dictionary tree can be updated to obtain the target dictionary tree.

In addition, the preset dictionary tree can also be updated gradually during the traversal. A third preset threshold, set by the user or by system default, can be preset in the electronic device. If a third matching degree is greater than the third preset threshold, the corresponding first sequence can be considered to match the nodes of the preset dictionary tree successfully, and the preset dictionary tree is not updated. If a third matching degree is less than or equal to the third preset threshold, the corresponding first sequence can be considered to fail to match the nodes of the preset dictionary tree, and the preset dictionary tree can be updated based on that first sequence. By traversing the first sequences step by step and applying this procedure repeatedly, the preset dictionary tree can be gradually updated to obtain the target dictionary tree.
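A two-layer tree of this kind can be sketched by grouping codes under the category that precedes the dot. The sketch assumes the input arrives as (ICD code, disease name) pairs and that a code without a dot is itself a category; both are assumptions about the data format, not details from the source.

```python
def build_icd_two_layer_tree(entries):
    """Group (ICD code, disease name) pairs into a two-layer tree.

    First layer: category code (e.g. "A00") with its name.
    Second layer: codes inside the category (e.g. "A00.0") mapped to their names.
    """
    tree = {}
    for code, name in entries:
        category = code.split(".", 1)[0]
        node = tree.setdefault(category, {"name": None, "children": {}})
        if code == category:
            node["name"] = name              # first layer: the disease category
        else:
            node["children"][code] = name    # second layer: a disease in that category
    return tree
```

Because `setdefault` creates the category node on first sight, the entries can arrive in any order, for example a subcategory before its category.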
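Steps 31 to 36 can be sketched with a character trie. The source does not define the third matching degree, so the sketch assumes it is the fraction of a first sequence covered by the longest prefix already stored in the tree; `second_threshold=0.9` is likewise an illustrative default. As in the text, the update decision is made once for the whole batch from the average degree, so a high average can leave individual names unstored.

```python
class TrieNode:
    """One node of the dictionary tree; `children` maps a character to the next node."""

    def __init__(self):
        self.children = {}
        self.is_end = False  # True when a complete ICD standard disease name ends here


class DictionaryTree:
    def __init__(self):
        self.root = TrieNode()

    def insert(self, sequence):
        node = self.root
        for ch in sequence:
            node = node.children.setdefault(ch, TrieNode())
        node.is_end = True

    def contains(self, sequence):
        node = self.root
        for ch in sequence:
            node = node.children.get(ch)
            if node is None:
                return False
        return node.is_end

    def match_degree(self, sequence):
        """Assumed third matching degree: share of `sequence` covered by the
        longest prefix already stored in the tree."""
        if not sequence:
            return 1.0
        node, matched = self.root, 0
        for ch in sequence:
            if ch not in node.children:
                break
            node = node.children[ch]
            matched += 1
        return matched / len(sequence)


def build_target_tree(icd_names, preset_tree=None, second_threshold=0.9):
    """Keep the preset tree if the average degree is high enough, else update it."""
    tree = DictionaryTree() if preset_tree is None else preset_tree   # step 32
    sequences = [list(name) for name in icd_names]                    # step 31
    degrees = [tree.match_degree(seq) for seq in sequences]           # step 33
    mean = sum(degrees) / len(degrees) if degrees else 1.0            # step 34
    if mean <= second_threshold:                                      # step 36: update
        for seq in sequences:
            tree.insert(seq)
    return tree                                                       # step 35 or 36
```

Calling `build_target_tree(["cholera", "typhoid fever"])` on an empty preset tree gives an average degree of 0, so both names are inserted; calling it again with names already stored leaves the tree unchanged.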
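The per-sequence variant differs only in where the decision is made. A minimal sketch, using nested dicts (character to child node) for the tree and the same assumed definition of the third matching degree; `third_threshold=0.9` is a hypothetical default. Following the text, a sequence whose degree is above the threshold is skipped, so near-matches are not inserted.

```python
def incremental_update(tree, sequences, third_threshold=0.9):
    """Insert each first sequence only when its own matching degree does not
    exceed the third preset threshold.

    tree: nested dicts (character -> child node), updated in place.
    Returns the sequences that triggered an update.
    """
    updated = []
    for seq in sequences:
        # walk the longest prefix of seq already stored in the tree
        node, matched = tree, 0
        for ch in seq:
            if ch not in node:
                break
            node = node[ch]
            matched += 1
        degree = matched / len(seq) if seq else 1.0   # assumed third matching degree
        if degree <= third_threshold:
            # matching failed: update the tree with this first sequence
            node = tree
            for ch in seq:
                node = node.setdefault(ch, {})
            updated.append(seq)
    return updated
```

For example, after "cholera" is inserted, a second "cholera" scores 1.0 and is skipped, while "cholesterol" shares only the prefix "chole" (5 of 11 characters) and is inserted.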

In a possible example, the above step 36, updating the preset dictionary tree to obtain the target dictionary tree, may include the following steps:

361. Based on the preset dictionary tree, determine the initial mappings corresponding to the multiple nodes in the preset dictionary tree.

362. If character i of first sequence i does not exist in the initial mapping, add a new mapping i, save character i in mapping i, and update the initial mapping so that it includes mapping i, where first sequence i is any one of the multiple first sequences and character i is any character in that first sequence;

363. Based on mapping i, update the preset dictionary tree to the target dictionary tree.
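Steps 361 to 363 amount to inserting characters into a trie. In the hypothetical sketch below, each node's `children` dict plays the role of its mapping: when character i is missing, a new child (mapping i) is created to store it. The `TrieNode` class and `add_first_sequence` function are illustrative names, not taken from the application.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class TrieNode:
    # The node's mapping: character -> child node (the "initial mapping" of step 361).
    children: dict[str, TrieNode] = field(default_factory=dict)
    # Marks the end of a complete disease name (the solid circle in Figure 1D).
    is_end: bool = False


def add_first_sequence(root: TrieNode, sequence: str) -> int:
    """Steps 362-363 as a sketch: walk the sequence, adding a new mapping for
    every missing character; return how many mappings were added."""
    node, added = root, 0
    for ch in sequence:
        if ch not in node.children:         # character i not in the current mapping
            node.children[ch] = TrieNode()  # new mapping i that stores character i
            added += 1
        node = node.children[ch]            # continue from the (possibly new) node
    node.is_end = True
    return added


root = TrieNode()
print(add_first_sequence(root, "abc"))  # 3
print(add_first_sequence(root, "abd"))  # 1
```

The return value makes it easy to see that a sequence sharing a prefix with an earlier one only extends the tree where the mappings diverge.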

Because the decision in the steps above is based on the mean of the third matching degrees of all the first sequences, it only indicates that the preset dictionary tree needs to be updated, not which of its nodes need updating. Therefore, the update is performed by checking each first sequence individually and updating the preset dictionary tree step by step, which helps improve the accuracy of the constructed target dictionary tree.

In a specific implementation, the electronic device may determine the initial mapping from the preset dictionary tree. The initial mapping may be an initial path of the preset dictionary tree; it indicates the mapping relationship between every two connected nodes in the tree, and it also reflects the mapping between each node and the information that node stores. If character i of first sequence i does not exist in the initial mapping of the nodes, a new mapping i is added on top of the initial mapping to save character i, and the preset dictionary tree becomes a first dictionary tree. At the same time, based on mapping i, the initial mapping is updated to a first mapping that includes mapping i. Here, character i is any character in first sequence i, and first sequence i is any one of the multiple first sequences. The same steps are repeated for the other characters of the first sequence to update the first mapping progressively, and then for all the first sequences; once every character of every first sequence has been traversed, the target dictionary tree is obtained.

Figure 1D below is a schematic diagram of the structure of a target dictionary tree, which can be constructed from the preset ICD standard disease name set. As shown in the figure, the target dictionary tree can be built from preset ICD standard disease names such as "Amoebic enteritis", "Amoebiasis", "Amoebic dysentery", "Addison's disease", and "Alzheimer's disease". A solid circle marks the end node of a disease name path.
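The shared-prefix structure in Figure 1D can be reproduced with a small character-level tree built from the same five names. In this sketch the end-of-name marker `END` stands in for the solid circle, and `build_tree` and `names_under` are hypothetical helpers. The original names are Chinese, so the English translations used here only approximate the actual branching.

```python
from __future__ import annotations

END = "\0"  # hypothetical end-of-name marker (the solid circle in Figure 1D)


def build_tree(names: list[str]) -> dict:
    """Character-level dictionary tree built from standard disease names."""
    root: dict = {}
    for name in names:
        node = root
        for ch in name:
            node = node.setdefault(ch, {})
        node[END] = True
    return root


def names_under(root: dict, prefix: str) -> list[str]:
    """List every complete name stored below the node reached by `prefix`."""
    node = root
    for ch in prefix:
        if ch not in node:
            return []
        node = node[ch]
    found: list[str] = []

    def walk(current: dict, suffix: str) -> None:
        for key, child in current.items():
            if key == END:
                found.append(prefix + suffix)
            else:
                walk(child, suffix + key)

    walk(node, "")
    return sorted(found)


tree = build_tree([
    "Amoebic enteritis", "Amoebiasis", "Amoebic dysentery",
    "Addison's disease", "Alzheimer's disease",
])
print(names_under(tree, "Amoebic"))  # ['Amoebic dysentery', 'Amoebic enteritis']
```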

104. Based on the target dictionary tree, match the name of the disease to be standardized with the multiple preset ICD standard disease names in the preset ICD standard disease name set to obtain multiple first matching degrees.

After the name of the disease to be standardized is obtained, it can be matched and looked up in the target dictionary tree to find the corresponding ICD standard disease name. That is, the name of the disease to be standardized is matched against the multiple preset ICD standard disease names in the preset ICD standard disease name set to obtain multiple first matching degrees, and the conversion result of the name is then obtained from these first matching degrees.

In a possible example, matching the name of the disease to be standardized with the multiple preset ICD standard disease names in the preset ICD standard disease name set in step 104 to obtain multiple first matching degrees may include the following steps:

41. Determine a target mapping table corresponding to the target dictionary tree, where the target mapping table contains multiple preset paths and each disease name in the target dictionary tree corresponds to one preset path;

42. Based on the target dictionary tree, start from character j and search downward in sequence from the head node of the target dictionary tree to obtain the path j to be matched and the preset path j in the target mapping table that corresponds to it, where character j is the first character of the name of the disease to be standardized;

43. Match the path j to be matched with the preset path j to obtain the first matching degree j of the character j.

The electronic device may determine the target mapping table corresponding to the target dictionary tree, which may contain a preset path for each disease name in the tree. Then, starting from the first character j of the name of the disease to be standardized, it searches downward from the head node of the target dictionary tree to obtain the path j to be matched and the corresponding preset path j in the mapping table, and matches the two to obtain a first matching degree. This is repeated until every character of the name of the disease to be standardized has been looped through the paths of the target dictionary tree, yielding multiple first matching degrees.
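Steps 41 to 43 can be approximated as follows. This is a sketch under assumptions: the target mapping table maps each standard name to its preset path (here simply its character tuple), and the first matching degree is taken as the length of the shared prefix between the walked path and the preset path divided by the longer of the query and the preset path, so that only a complete, exact match scores 1.0. The application does not specify this formula, the function names are hypothetical, and toy strings replace real disease names.

```python
from __future__ import annotations


def build_tree(names: list[str]) -> dict:
    """Nested-dict dictionary tree keyed by character."""
    root: dict = {}
    for name in names:
        node = root
        for ch in name:
            node = node.setdefault(ch, {})
    return root


def first_matching_degrees(tree: dict, mapping_table: dict[str, tuple[str, ...]],
                           query: str) -> dict[str, float]:
    # Step 42: walk down from the head node, starting with the first character j.
    node, path_to_match = tree, []
    for ch in query:
        if ch not in node:
            break
        node = node[ch]
        path_to_match.append(ch)
    # Step 43: compare the walked path with each preset path in the mapping table.
    degrees = {}
    for name, preset_path in mapping_table.items():
        common = 0
        for walked, preset in zip(path_to_match, preset_path):
            if walked != preset:
                break
            common += 1
        # Hypothetical degree: shared prefix over the longer of query / preset path.
        degrees[name] = common / max(len(query), len(preset_path), 1)
    return degrees


names = ["abc", "abd", "xy"]
tree = build_tree(names)
table = {name: tuple(name) for name in names}  # step 41: one preset path per name
print(first_matching_degrees(tree, table, "abc"))
```

In a real tree the preset paths would be read from the target mapping table rather than recomputed with `tuple(name)`.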

105. When a target first matching degree that meets a preset condition exists among the multiple first matching degrees, obtain the target preset ICD standard disease name corresponding to the target first matching degree, and determine the target preset ICD standard disease name as the conversion result of the name of the disease to be standardized.

The preset condition can be set by the user or take a system default value, and is not limited here. For example, the condition may be that the first matching degree equals 100%: when it does, the matching is considered successful, and the preset ICD standard disease name corresponding to the target first matching degree that meets the condition is determined as the conversion result of the name of the disease to be standardized; otherwise, the matching fails.
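With the 100% example condition above, selecting the conversion result reduces to a threshold check on the best first matching degree. The helper below is hypothetical; `required` models the preset condition and defaults to 1.0.

```python
from __future__ import annotations


def select_conversion_result(degrees: dict[str, float], required: float = 1.0) -> str | None:
    """Return the standard name with the highest first matching degree if it meets
    the preset condition (default: a 100% match); otherwise None (matching failed)."""
    best = max(degrees, key=degrees.get, default=None)
    if best is not None and degrees[best] >= required:
        return best
    return None


print(select_conversion_result({"Amoebiasis": 1.0, "Amoebic enteritis": 0.4}))  # Amoebiasis
print(select_conversion_result({"Amoebiasis": 0.9}))  # None
```

Returning `None` leaves room for the fallback matching described next.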

Optionally, when the name of the disease to be standardized is matched against the multiple preset ICD standard disease names using the target dictionary tree and its characters cannot be completely matched in the tree (that is, the preset condition is not met), the electronic device may match the name against the preset ICD standard disease names using both the path similarity in the target dictionary tree and semantic similarity, to obtain the conversion result. In this way, the name can still be converted even when its characters are not completely matched in the target dictionary tree, which helps improve matching accuracy.

In a specific implementation, the target dictionary tree may include a root node. The second sequence corresponding to the name of the disease to be standardized is determined; starting from the root node of the target dictionary tree, the multiple second paths obtained by matching the second sequence in the target dictionary tree are collected; the semantic similarity of each second path is calculated, giving multiple semantic similarities; the second path with the maximum semantic similarity is selected as the target path; and the characters corresponding to the target path in the target dictionary tree are determined as the conversion result of the name of the disease to be standardized.

The semantic similarity corresponding to each of the multiple second paths may be calculated with the following formula:

X = a × X₁ + b × X₂

where wi = (a1, a2, ..., an) and wj = (b1, b2, ..., bn) are word vectors and n is the dimension of the word vectors. X₁ represents the path length of each second path, which can also be understood as the depth reached by each traversal, and X₂ is the term computed from the word vectors wi and wj. a and b are real numbers whose values can be adjusted to change the weights of X₁ and X₂ in the semantic similarity calculation.
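The weighted score can be computed as below. The text introduces the word vectors wi and wj without giving an explicit formula for X₂, so this sketch assumes X₂ is their cosine similarity; the default weights a = b = 0.5, the candidate vectors, and all function names are likewise assumptions. Each candidate second path carries its length X₁ and a word vector for its characters, and the path with the largest X is taken as the target path.

```python
from __future__ import annotations

import math


def cosine_similarity(wi: list[float], wj: list[float]) -> float:
    """Assumed form of X2: cosine of two n-dimensional word vectors."""
    if len(wi) != len(wj):
        raise ValueError("word vectors must have the same dimension n")
    dot = sum(x * y for x, y in zip(wi, wj))
    norm_i = math.sqrt(sum(x * x for x in wi))
    norm_j = math.sqrt(sum(y * y for y in wj))
    if norm_i == 0 or norm_j == 0:
        return 0.0
    return dot / (norm_i * norm_j)


def combined_score(path_length: int, wi: list[float], wj: list[float],
                   a: float = 0.5, b: float = 0.5) -> float:
    """X = a * X1 + b * X2, with X1 = path length (traversal depth)."""
    return a * path_length + b * cosine_similarity(wi, wj)


def pick_target_path(candidates: dict[str, tuple[int, list[float]]],
                     query_vector: list[float],
                     a: float = 0.5, b: float = 0.5) -> str:
    """Return the second path with the largest combined score X."""
    def score(path: str) -> float:
        length, vector = candidates[path]
        return combined_score(length, vector, query_vector, a, b)
    return max(candidates, key=score)


candidates = {
    "amoebic enteritis": (7, [1.0, 0.0]),
    "amoebic dysentery": (7, [0.6, 0.8]),
}
print(pick_target_path(candidates, [0.0, 1.0]))  # amoebic dysentery
```

Because X₁ is an unnormalized length while a cosine lies in [-1, 1], the weights a and b largely decide which term dominates.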

In addition, traversing from the root node of the target dictionary tree to obtain the multiple second paths matched by the second sequence may include the following steps: starting from the root node, select a path m (m is a positive integer) as the current subtree, where path m is any path that starts from the root node; search each layer of the current subtree for the first character of the name of the disease to be standardized; if it is found, search the next layer for the second character, and continue in the same way for the third and subsequent characters; if it is not found in the current subtree, select a path other than path m and repeat the steps above. In this way, multiple second paths are obtained. A second path may or may not contain all the characters of the name of the disease to be standardized, so combining path similarity with semantic similarity in this way helps improve matching accuracy and allows the name to be converted quickly into a standardized disease name.
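The subtree search described above can be sketched as a depth-first traversal. In this hypothetical version, the first character of the name may be found at any layer of any subtree; from each hit, the following characters are matched layer by layer, and the result records the root-to-node path together with the number of query characters matched. The nested-dict tree and the helper names are assumptions.

```python
from __future__ import annotations


def build_tree(names: list[str]) -> dict:
    """Nested-dict dictionary tree keyed by character."""
    root: dict = {}
    for name in names:
        node = root
        for ch in name:
            node = node.setdefault(ch, {})
    return root


def second_paths(tree: dict, query: str) -> list[tuple[str, int]]:
    """Collect (path from the root, number of query characters matched) for every
    place where the query's first character occurs, at any layer of any subtree."""
    results: list[tuple[str, int]] = []
    if not query:
        return results

    def follow(node: dict, path: str, k: int) -> tuple[str, int]:
        # Match the next query characters layer by layer below the hit.
        while k < len(query) and query[k] in node:
            path += query[k]
            node = node[query[k]]
            k += 1
        return path, k

    def dfs(node: dict, path: str) -> None:
        for ch, child in node.items():  # each child leads into a candidate subtree
            if ch == query[0]:
                results.append(follow(child, path + ch, 1))
            dfs(child, path + ch)       # keep searching deeper layers as well

    dfs(tree, "")
    return results


tree = build_tree(["chronic gastritis", "acute gastritis"])
print(second_paths(tree, "gastr"))  # [('chronic gastr', 5), ('acute gastr', 5)]
```

The matched count per path is a natural input for the path-length term X₁ of the scoring formula.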

It can be seen that the disease name standardization method described in the embodiments of this application is applied to an electronic device and can be used in the field of smart healthcare, thereby promoting the construction of smart cities. The method includes: obtaining the target dictionary, the current diagnosis text, and the preset ICD standard disease name set, which includes multiple preset ICD standard disease names; performing a word cutting operation on the current diagnosis text based on the target dictionary to obtain the name of the disease to be standardized contained in it; constructing a target dictionary tree based on the preset ICD standard disease name set; matching, based on the target dictionary tree, the name of the disease to be standardized with the multiple preset ICD standard disease names in the set to obtain multiple first matching degrees; and, when a target first matching degree among them meets the preset condition, obtaining the corresponding target preset ICD standard disease name and determining it as the conversion result of the name of the disease to be standardized. In this way, word cutting the current diagnosis text with the target dictionary reduces the impact of colloquial wording, typos, and omissions in the text, and matching the name to be standardized against the preset ICD standard disease names via the target dictionary tree built from the set yields the conversion result, which helps improve conversion efficiency and accuracy.

Consistent with the above, refer to FIG. 2, which is an exemplary flowchart of a disease name standardization method disclosed in an embodiment of this application and applied to an electronic device. The method may include the following steps:

201. Obtain a preset ICD standard disease name set, where the preset ICD standard disease name set includes multiple preset ICD standard disease names.

202. Extract historical diagnosis text information from the historical disease case database.

203. Perform data cleaning on the historical diagnosis text information to obtain a set of historical disease names.

204. Perform data processing on the historical disease name set and the preset ICD standard disease name set to obtain the target dictionary.

205. Acquire the current diagnosis text.

206. Based on the target dictionary, perform a word cutting operation on the current diagnosis text to obtain the name of the disease to be standardized contained in the current diagnosis text.

207. Construct a target dictionary tree based on the preset ICD standard disease name set.

208. Based on the target dictionary tree, match the name of the disease to be standardized with the multiple preset ICD standard disease names in the preset ICD standard disease name set to obtain multiple first matching degrees.

209. When a target first matching degree that meets a preset condition exists among the multiple first matching degrees, obtain the target preset ICD standard disease name corresponding to the target first matching degree, and determine the target preset ICD standard disease name as the conversion result of the name of the disease to be standardized.

For steps 201 to 209 of the disease name standardization method above, refer to the corresponding steps of the method described in FIG. 1B.
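The application does not say which segmentation algorithm the word cutting in step 206 uses. One common dictionary-based choice is forward maximum matching, sketched below as an assumption; tokens found in the target dictionary are kept as names of diseases to be standardized. The function names and the sample dictionary are hypothetical.

```python
from __future__ import annotations


def forward_max_match(text: str, dictionary: set[str]) -> list[str]:
    """Greedy forward maximum matching: at each position take the longest
    dictionary entry that starts there, otherwise emit a single character."""
    max_len = max((len(word) for word in dictionary), default=1)
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def names_to_standardize(text: str, dictionary: set[str]) -> list[str]:
    """Keep only the tokens that are entries of the target dictionary."""
    return [tok for tok in forward_max_match(text, dictionary) if tok in dictionary]


# "患者诊断为阿米巴痢疾伴高血压": "patient diagnosed with amoebic dysentery and hypertension"
target_dictionary = {"阿米巴痢疾", "阿米巴", "高血压"}
print(names_to_standardize("患者诊断为阿米巴痢疾伴高血压", target_dictionary))
# ['阿米巴痢疾', '高血压']
```

Longest-match-first is what keeps the full name 阿米巴痢疾 from being cut down to the shorter entry 阿米巴.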

It can be seen that the disease name standardization method described in this embodiment obtains a preset ICD standard disease name set that includes multiple preset ICD standard disease names; extracts historical diagnosis text information from the historical disease case database; performs data cleaning on it to obtain a historical disease name set; performs data processing on the historical disease name set and the preset ICD standard disease name set to obtain the target dictionary; obtains the current diagnosis text and performs a word cutting operation on it based on the target dictionary to obtain the name of the disease to be standardized; constructs a target dictionary tree based on the preset ICD standard disease name set; matches, based on the target dictionary tree, the name of the disease to be standardized with the multiple preset ICD standard disease names in the set to obtain multiple first matching degrees; and, when a target first matching degree among them meets the preset condition, obtains the corresponding target preset ICD standard disease name and determines it as the conversion result of the name of the disease to be standardized. In this way, the historical diagnosis text in the historical disease case database is processed through a series of steps into an expanded target dictionary that is better suited to practical use, and word cutting the current diagnosis text with this dictionary reduces problems such as colloquial wording, typos, omissions, and abbreviations. In addition, matching the name to be standardized against the preset ICD standard disease names via the target dictionary tree constructed from the set yields the conversion result, which helps improve conversion efficiency and accuracy.

Consistent with the above, refer to FIG. 3, which is an exemplary flowchart of a disease name standardization method disclosed in an embodiment of this application and applied to an electronic device. The method may include the following steps:

301. Obtain a target dictionary, a current diagnosis text, and a preset ICD standard disease name set, where the preset ICD standard disease name set includes multiple preset ICD standard disease names.

302. Based on the target dictionary, perform a word cutting operation on the current diagnosis text to obtain the name of the disease to be standardized contained in the current diagnosis text.

303. Based on the preset ICD standard disease name set, determine the first sequence corresponding to each of the multiple preset ICD standard disease names in the set to obtain multiple first sequences, where each first sequence includes at least one character.

304. Obtain a preset dictionary tree, where the preset dictionary tree includes multiple nodes.

305. Traverse the multiple first sequences, and match each of the first sequences with multiple nodes corresponding to the preset dictionary tree to obtain multiple third matching degrees.

306. Calculate the average value of the multiple third matching degrees.

307. If the average value is greater than the second preset threshold, the preset dictionary tree is not updated, and the preset dictionary tree is used as the target dictionary tree.

308. If the average value is less than or equal to the second preset threshold, update the preset dictionary tree to obtain the target dictionary tree.

309. Based on the target dictionary tree, match the name of the disease to be standardized with the multiple preset ICD standard disease names in the preset ICD standard disease name set to obtain multiple first matching degrees.

310. When a target first matching degree that meets a preset condition exists among the multiple first matching degrees, obtain the target preset ICD standard disease name corresponding to the target first matching degree, and determine the target preset ICD standard disease name as the conversion result of the name of the disease to be standardized.

For the disease name standardization method described in the above steps 301 to 310, refer to the corresponding steps of the disease name standardization method described in FIG. 1B.

It can be seen that, in the disease name standardization method described in this embodiment, the electronic device obtains the target dictionary, the current diagnosis text, and the preset ICD standard disease name set, which includes multiple preset ICD standard disease names; performs a word cutting operation on the current diagnosis text based on the target dictionary to obtain the name of the disease to be standardized; determines, from the preset ICD standard disease name set, the first sequence corresponding to each preset ICD standard disease name, obtaining multiple first sequences that each include at least one character; obtains a preset dictionary tree that includes multiple nodes; traverses the first sequences and matches each against the nodes of the preset dictionary tree to obtain multiple third matching degrees; and calculates their mean. If the mean is greater than the second preset threshold, the preset dictionary tree is not updated and is used as the target dictionary tree; if the mean is less than or equal to the second preset threshold, the preset dictionary tree is updated to obtain the target dictionary tree. Based on the target dictionary tree, the name of the disease to be standardized is matched with the multiple preset ICD standard disease names in the set to obtain multiple first matching degrees, and when a target first matching degree among them meets the preset condition, the corresponding target preset ICD standard disease name is determined as the conversion result of the name of the disease to be standardized. In this way, word cutting the current diagnosis text with the target dictionary reduces problems such as colloquial wording, typos, omissions, and abbreviations, and the target dictionary tree, obtained by processing the preset ICD standard disease name set defined by the international standard, is used to process the name of the disease to be standardized and obtain its conversion result, which helps improve conversion accuracy.

Consistent with the above, refer to FIG. 4, which is a schematic structural diagram of an electronic device provided by an embodiment of this application. As shown in FIG. 4, the electronic device includes a processor, a communication interface, a memory, and one or more programs. The processor, the communication interface, and the memory are connected to one another; the memory stores a computer program that includes program instructions; the processor is configured to call the program instructions; and the one or more programs include instructions for performing the following steps:

Acquire a target dictionary, a current diagnosis text, and a preset ICD standard disease name set, where the preset ICD standard disease name set includes multiple preset ICD standard disease names; perform a word segmentation operation on the current diagnosis text based on the target dictionary to obtain the name of the disease to be standardized contained in the current diagnosis text; construct a target dictionary tree based on the preset ICD standard disease name set; based on the target dictionary tree, match the name of the disease to be standardized with the multiple preset ICD standard disease names in the preset ICD standard disease name set to obtain multiple first matching degrees; and, when a target first matching degree that meets the preset condition exists among the multiple first matching degrees, obtain the target preset ICD standard disease name corresponding to the target first matching degree and determine it as the conversion result of the name of the disease to be standardized.

It can be seen that the electronic device described in this embodiment can obtain the target dictionary, the current diagnosis text, and the preset ICD standard disease name set, which includes multiple preset ICD standard disease names; perform a word cutting operation on the current diagnosis text based on the target dictionary to obtain the name of the disease to be standardized; construct a target dictionary tree based on the preset ICD standard disease name set; match, based on the target dictionary tree, the name of the disease to be standardized with the multiple preset ICD standard disease names in the set to obtain multiple first matching degrees; and, when a target first matching degree among them meets the preset condition, obtain the corresponding target preset ICD standard disease name and determine it as the conversion result of the name of the disease to be standardized. In this way, word cutting the current diagnosis text with the target dictionary reduces problems such as colloquial wording, typos, omissions, and abbreviations, and matching the name to be standardized against the preset ICD standard disease names via the target dictionary tree constructed from the set yields the conversion result, which helps improve conversion efficiency and accuracy.

In a possible example, before the target dictionary is acquired, the processor is further configured to: extract historical diagnosis text information from the historical disease case database; perform data cleaning on the historical diagnosis text information to obtain a historical disease name set; and perform data processing on the historical disease name set and the preset ICD standard disease name set to obtain the target dictionary.

In a possible example, for performing data cleaning on the historical diagnosis text information to obtain the historical disease name set, the processor is specifically configured to: obtain multiple preset regular expressions for multiple preset disease names, where each preset disease name corresponds to one preset regular expression; match the historical diagnosis text information against each of the multiple preset regular expressions to obtain multiple second matching degrees, one for each preset regular expression; and determine the at least one preset disease name corresponding to the at least one second matching degree that exceeds the first preset threshold, and use the at least one preset disease name as the historical disease name set.

In a possible example, for performing data processing on the historical disease name set and the preset ICD standard disease name set to obtain the target dictionary, the processor is specifically configured to: merge the historical disease name set with the preset ICD standard disease name set to obtain a first dictionary that includes multiple first disease names; and deduplicate the multiple first disease names to obtain the target dictionary.
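The data cleaning and data processing steps above can be sketched together. The regular expressions, the sample records, and the definition of the second matching degree (here, the share of historical records a pattern matches) are all assumptions made for illustration, as are the function names; the merge-then-deduplicate step follows the text directly.

```python
from __future__ import annotations

import re


def clean_history(records: list[str], patterns: dict[str, str],
                  first_threshold: float) -> list[str]:
    """Keep each preset disease name whose regex 'second matching degree'
    (hypothetically: the share of records it matches) exceeds the threshold."""
    kept = []
    for name, pattern in patterns.items():
        regex = re.compile(pattern, re.IGNORECASE)
        hits = sum(1 for record in records if regex.search(record))
        degree = hits / len(records) if records else 0.0
        if degree > first_threshold:
            kept.append(name)
    return kept


def build_target_dictionary(history_names: list[str], icd_names: list[str]) -> list[str]:
    """Merge both name sets into a first dictionary, then deduplicate (order kept)."""
    first_dictionary = list(history_names) + list(icd_names)
    return list(dict.fromkeys(first_dictionary))


records = ["Amoebic dysentery, mild", "type 2 diabetes", "amoebic dysentry?"]
patterns = {
    "Amoebic dysentery": r"amoebic\s+dysent(e)?ry",  # tolerates a common typo
    "Hypertension": r"hypertension",
}
history = clean_history(records, patterns, first_threshold=0.5)
print(history)  # ['Amoebic dysentery']
print(build_target_dictionary(history, ["Amoebic dysentery", "Amoebiasis"]))
# ['Amoebic dysentery', 'Amoebiasis']
```

`dict.fromkeys` keeps the first occurrence of each name, so the merged dictionary stays in a stable order.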

In a possible example, for constructing the target dictionary tree based on the preset ICD standard disease name set, the processor is specifically configured to: determine, based on the preset ICD standard disease name set, the first sequence corresponding to each of the multiple preset ICD standard disease names in the set, obtaining multiple first sequences, where each first sequence includes at least one character; obtain a preset dictionary tree that includes multiple nodes; traverse the multiple first sequences and match each first sequence with the multiple nodes of the preset dictionary tree to obtain multiple third matching degrees; calculate the mean of the multiple third matching degrees; if the mean is greater than a second preset threshold, not update the preset dictionary tree and use it as the target dictionary tree; and if the mean is less than or equal to the second preset threshold, update the preset dictionary tree to obtain the target dictionary tree.

In a possible example, for updating the preset dictionary tree to obtain the target dictionary tree, the processor is specifically configured to: determine, based on the preset dictionary tree, the initial mappings corresponding to the multiple nodes in the preset dictionary tree; if character i of first sequence i does not exist in the initial mapping, add a new mapping i and store character i in mapping i, where first sequence i is any one of the multiple first sequences, character i is any character in that first sequence, and i is a positive integer; and update the preset dictionary tree to the target dictionary tree based on mapping i.

In a possible example, for matching the name of the disease to be standardized with the multiple preset ICD standard disease names in the preset ICD standard disease name set to obtain multiple first matching degrees, the processor is specifically configured to: determine a target mapping table corresponding to the target dictionary tree, where the target mapping table contains multiple preset paths and each disease name in the target dictionary tree corresponds to one preset path; based on the target dictionary tree, start from character j and search downward in sequence from the head node of the target dictionary tree to obtain the path j to be matched and the preset path j in the target mapping table that corresponds to it, where character j is the first character of the name of the disease to be standardized; and match the path j to be matched with the preset path j to obtain the first matching degree j of character j.

The foregoing mainly describes the solution of the embodiments of this application from the perspective of the method-side execution process. It can be understood that, to implement the functions above, the electronic device includes hardware structures and/or software modules corresponding to each function. Those skilled in the art should readily appreciate that, in combination with the units and algorithm steps of the examples described in the embodiments provided herein, this application can be implemented in hardware or in a combination of hardware and computer software. Whether a function is executed by hardware or by hardware driven by computer software depends on the specific application and the design constraints of the technical solution. Skilled persons may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of this application.

Consistent with the above, refer to FIG. 5, which is a schematic structural diagram of a disease name standardization apparatus disclosed in an embodiment of this application and applied to an electronic device. The apparatus includes an acquisition unit 501, a word segmentation unit 502, a construction unit 503, a matching unit 504, and a determining unit 505, where:

The acquisition unit 501 is configured to acquire a target dictionary, a current diagnosis text, and a preset ICD standard disease name set, where the preset ICD standard disease name set includes multiple preset ICD standard disease names;

The word segmentation unit 502 is configured to perform a word segmentation operation on the current diagnosis text based on the target dictionary to obtain the name of the disease to be standardized contained in the current diagnosis text;

The construction unit 503 is configured to construct a target dictionary tree based on the preset ICD standard disease name set;

The matching unit 504 is configured to match, based on the target dictionary tree, the name of the disease to be standardized with the multiple preset ICD standard disease names in the preset ICD standard disease name set to obtain multiple first matching degrees;

The determining unit 505 is configured to, when a target first matching degree meeting a preset condition exists among the plurality of first matching degrees, obtain the target preset ICD standard disease name corresponding to the target first matching degree, and determine the target preset ICD standard disease name as the conversion result of the disease name to be standardized.
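The source does not define the "preset condition" that the determining unit 505 applies. The sketch below assumes one plausible reading: take the highest first matching degree and accept it only if it reaches a threshold. The function name `pick_standard_name` and the default threshold of 0.8 are hypothetical.

```python
from typing import Dict, Optional


def pick_standard_name(first_degrees: Dict[str, float],
                       threshold: float = 0.8) -> Optional[str]:
    """Return the preset ICD standard name with the highest first matching
    degree if that degree reaches the (hypothetical) threshold, else None."""
    if not first_degrees:
        return None
    best_name, best_degree = max(first_degrees.items(), key=lambda kv: kv[1])
    return best_name if best_degree >= threshold else None


degrees = {"2型糖尿病": 0.92, "2型糖尿病性肾病": 0.71}
print(pick_standard_name(degrees))        # 2型糖尿病
print(pick_standard_name(degrees, 0.95))  # None
```

The function returns None when no degree qualifies. This mirrors the conditional wording above: a conversion result is produced only when a qualifying target first matching degree exists.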

In a possible example, in terms of constructing a target dictionary tree based on the preset ICD standard disease name set, the construction unit 503 is specifically configured to: determine, based on the preset ICD standard disease name set, the first sequence corresponding to each of the plurality of preset ICD standard disease names in the preset ICD standard disease name set, to obtain a plurality of first sequences, wherein each first sequence includes at least one character; obtain a preset dictionary tree, the preset dictionary tree including a plurality of nodes; traverse the plurality of first sequences and match each first sequence against the corresponding nodes of the preset dictionary tree to obtain a plurality of third matching degrees; calculate the average of the plurality of third matching degrees; if the average is greater than a second preset threshold, leave the preset dictionary tree unchanged and use it as the target dictionary tree; and if the average is less than or equal to the second preset threshold, update the preset dictionary tree to obtain the target dictionary tree.
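The steps above leave the third matching degree undefined. The sketch below rests on two assumptions:

- The third matching degree is the share of a first sequence's characters that can be followed downward from the root of the preset dictionary tree.
- "Updating" means inserting every first sequence.

The nested-dict representation, the `<end>` marker, and the threshold of 0.9 are illustrative choices, not taken from the source.

```python
import copy
from statistics import mean
from typing import Dict, List

END = "<end>"  # hypothetical end-of-name key; never equal to a single character


def third_matching_degree(trie: Dict, sequence: str) -> float:
    """Assumed definition: fraction of the sequence's characters found along one root path."""
    node, matched = trie, 0
    for ch in sequence:
        if ch not in node:
            break
        node = node[ch]
        matched += 1
    return matched / len(sequence) if sequence else 1.0


def insert(trie: Dict, sequence: str) -> None:
    """Add the sequence to the trie, creating child nodes as needed."""
    node = trie
    for ch in sequence:
        node = node.setdefault(ch, {})
    node[END] = sequence


def build_target_trie(preset_trie: Dict, icd_names: List[str],
                      second_threshold: float = 0.9) -> Dict:
    first_sequences = [name for name in icd_names if name]  # each name as a character sequence
    if not first_sequences:
        return preset_trie
    average = mean(third_matching_degree(preset_trie, seq) for seq in first_sequences)
    if average > second_threshold:
        return preset_trie  # coverage is high enough: use the preset tree as-is
    target = copy.deepcopy(preset_trie)  # update a copy so the preset tree is untouched
    for seq in first_sequences:
        insert(target, seq)
    return target


names = ["高血压", "2型糖尿病"]
target = build_target_trie({}, names)
print(target["高"]["血"]["压"][END])                # 高血压
print(build_target_trie(target, names) is target)  # True
```

Skipping the rebuild when the average coverage is already high avoids redundant work each time the same ICD name set is loaded. The averaging step is what makes that decision a single comparison.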

In a possible example, in terms of updating the preset dictionary tree to obtain the target dictionary tree, the construction unit 503 is further specifically configured to: determine, based on the preset dictionary tree, the initial mapping corresponding to the plurality of nodes in the preset dictionary tree; if a character i of a first sequence i does not exist in the initial mapping, add a new mapping i and store the character i in the mapping i, where the first sequence i is any one of the plurality of first sequences, the character i is any character in the first sequence i, and i is a positive integer; and update, based on the mapping i, the preset dictionary tree to the target dictionary tree.
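Read as an ordinary trie insertion, the update step can be sketched as follows. Each node holds a character-to-child mapping (taken here as the "initial mapping"). A new mapping is created only when a character of the current first sequence is missing. The `TrieNode` class and the `update_trie` function are hypothetical. The count of new mappings is returned only to make the effect visible.

```python
from typing import Dict, List, Optional


class TrieNode:
    def __init__(self) -> None:
        self.children: Dict[str, "TrieNode"] = {}  # this node's character mapping
        self.name: Optional[str] = None            # set when a standard name ends here


def update_trie(root: TrieNode, first_sequences: List[str]) -> int:
    """Add a new mapping for every character missing from the existing mappings.

    Returns how many new mappings (child nodes) were created.
    """
    added = 0
    for sequence in first_sequences:
        node = root
        for ch in sequence:
            if ch not in node.children:           # character absent from the mapping
                node.children[ch] = TrieNode()    # new mapping storing this character
                added += 1
            node = node.children[ch]
        node.name = sequence                      # mark the end of a standard name
    return added


root = TrieNode()
print(update_trie(root, ["高血压", "高血脂"]))  # 4
print(update_trie(root, ["高血压"]))            # 0
```

Because shared prefixes such as 高血 reuse existing mappings, adding 高血脂 after 高血压 creates only one new node.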

In a possible example, in terms of matching the name of the disease to be standardized with the plurality of preset ICD standard disease names in the preset ICD standard disease name set to obtain a plurality of first matching degrees, the matching unit 504 is specifically configured to: determine a target mapping table corresponding to the target dictionary tree, wherein the target mapping table contains a plurality of preset paths and each disease name in the target dictionary tree corresponds to one preset path; based on the target dictionary tree, starting from a character j and searching downward in order from the head node of the target dictionary tree, obtain a path j to be matched and the preset path j in the target mapping table corresponding to the path to be matched, wherein the character j is the first character of the name of the disease to be standardized; and match the path j to be matched with the preset path j to obtain the first matching degree j of the character j.
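The source does not say how the path j to be matched is scored against the preset path j. The sketch below assumes a simple prefix-overlap score, computed in three steps:

1. Walk the query downward from the head node.
2. Keep only the preset paths that begin with character j.
3. Divide the shared prefix length by the longer of the query length and the preset path length.

The mapping-table layout (standard name to character path) and all function names are hypothetical.

```python
from typing import Dict, List, Tuple

END = "<end>"  # hypothetical end-of-name key in a nested-dict trie


def insert(trie: Dict, name: str) -> None:
    node = trie
    for ch in name:
        node = node.setdefault(ch, {})
    node[END] = name


def mapping_table(trie: Dict) -> Dict[str, Tuple[str, ...]]:
    """Map every disease name stored in the trie to its preset path (root-down characters)."""
    table: Dict[str, Tuple[str, ...]] = {}
    stack: List[Tuple[Dict, Tuple[str, ...]]] = [(trie, ())]
    while stack:
        node, path = stack.pop()
        for key, child in node.items():
            if key == END:
                table[child] = path
            else:
                stack.append((child, path + (key,)))
    return table


def path_to_match(trie: Dict, query: str) -> Tuple[str, ...]:
    """Follow the query downward from the head node for as long as the trie allows."""
    node, path = trie, []
    for ch in query:
        if ch not in node:
            break
        node = node[ch]
        path.append(ch)
    return tuple(path)


def first_matching_degrees(trie: Dict, query: str) -> Dict[str, float]:
    if not query:
        return {}
    table = mapping_table(trie)
    to_match = path_to_match(trie, query)  # path j, starting from character j = query[0]
    degrees: Dict[str, float] = {}
    for name, preset_path in table.items():
        if not preset_path or preset_path[0] != query[0]:
            continue  # only preset paths under the branch for character j
        shared = 0
        for a, b in zip(to_match, preset_path):
            if a != b:
                break
            shared += 1
        degrees[name] = shared / max(len(query), len(preset_path))
    return degrees


trie: Dict = {}
for icd_name in ["2型糖尿病", "2型糖尿病性肾病", "高血压"]:
    insert(trie, icd_name)
degrees = first_matching_degrees(trie, "2型糖尿病")
print(degrees["2型糖尿病"], degrees["2型糖尿病性肾病"])  # 1.0 0.625
```

Dividing by the longer length penalizes both a query that is cut short and a standard name that carries extra qualifiers. This gives the exact name a higher score than its longer relatives.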

An embodiment of the present application further provides a computer-readable storage medium, wherein the computer-readable storage medium stores a computer program for electronic data exchange, and when the program instructions are executed by a processor, the processor performs the following steps: acquiring a target dictionary, a current diagnosis text, and a preset ICD standard disease name set, where the preset ICD standard disease name set includes a plurality of preset ICD standard disease names; performing, based on the target dictionary, a word segmentation operation on the current diagnosis text to obtain the name of the disease to be standardized contained in the current diagnosis text; constructing a target dictionary tree based on the preset ICD standard disease name set; matching, based on the target dictionary tree, the name of the disease to be standardized with the plurality of preset ICD standard disease names in the preset ICD standard disease name set to obtain a plurality of first matching degrees; and, when a target first matching degree meeting a preset condition exists among the plurality of first matching degrees, obtaining the target preset ICD standard disease name corresponding to the target first matching degree and determining the target preset ICD standard disease name as the conversion result of the disease name to be standardized.

In a possible example, the processor is further configured to: extract historical diagnosis text information from a historical disease case database; perform data cleaning on the historical diagnosis text information to obtain a historical disease name set; and perform data processing on the historical disease name set and the preset ICD standard disease name set to obtain the target dictionary.

In a possible example, the processor is further configured to: obtain a plurality of preset regular expressions for a plurality of preset disease names, where each preset disease name corresponds to one preset regular expression; match the historical diagnosis text information against each of the plurality of preset regular expressions to obtain a plurality of second matching degrees, where each preset regular expression corresponds to one second matching degree; and determine at least one preset disease name corresponding to at least one second matching degree, among the plurality of second matching degrees, that exceeds a first preset threshold, and use the at least one preset disease name as the historical disease name set.
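One hedged reading of this cleaning step treats the second matching degree of a preset regular expression as the share of historical diagnosis records it matches. Names whose share exceeds the first preset threshold then form the historical disease name set. The records, the patterns, the threshold of 0.2, and the function name `clean_history` are all illustrative.

```python
import re
from typing import Dict, List, Set


def clean_history(records: List[str], patterns: Dict[str, str],
                  first_threshold: float = 0.2) -> Set[str]:
    """Keep each preset disease name whose regex matches more than
    `first_threshold` of the historical diagnosis records (assumed degree)."""
    names: Set[str] = set()
    if not records:
        return names
    for name, pattern in patterns.items():
        regex = re.compile(pattern)
        hits = sum(1 for record in records if regex.search(record))
        second_degree = hits / len(records)  # hypothetical second matching degree
        if second_degree > first_threshold:
            names.add(name)
    return names


records = ["高血压3级", "2型糖尿病伴周围神经病变", "原发性高血压", "急性上呼吸道感染"]
patterns = {"高血压": r"高血压", "2型糖尿病": r"(?:2|II|Ⅱ)型糖尿病", "肺炎": r"肺炎"}
print(sorted(clean_history(records, patterns)))  # ['2型糖尿病', '高血压']
```

A regular expression per name lets spelling variants such as 2型 and Ⅱ型 count toward the same preset disease name.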

In a possible example, the processor is further configured to: merge the historical disease name set and the preset ICD standard disease name set to obtain a first dictionary, the first dictionary including a plurality of first disease names; and deduplicate the plurality of first disease names to obtain the target dictionary.
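Merging and deduplication can be as simple as an order-preserving union. The sketch assumes that names are compared after trimming surrounding whitespace. Any further normalization, such as full-width versus half-width characters or synonyms, would be a separate design decision that the source does not describe. `build_target_dictionary` is a hypothetical name.

```python
from typing import Iterable, List, Set


def build_target_dictionary(history_names: Iterable[str],
                            icd_names: Iterable[str]) -> List[str]:
    """Merge both name sets into a first dictionary, then drop duplicates,
    keeping the first occurrence of each (whitespace-trimmed) name."""
    first_dictionary = list(history_names) + list(icd_names)  # merged
    seen: Set[str] = set()
    target: List[str] = []
    for name in first_dictionary:
        key = name.strip()  # assumed normalization: trim surrounding whitespace only
        if key and key not in seen:
            seen.add(key)
            target.append(key)
    return target


print(build_target_dictionary(["高血压", "2型糖尿病 "], ["高血压", "肺炎"]))
# ['高血压', '2型糖尿病', '肺炎']
```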

In a possible example, the processor is further configured to: determine, based on the preset ICD standard disease name set, the first sequence corresponding to each preset ICD standard disease name in the preset ICD standard disease name set, to obtain a plurality of first sequences, wherein each first sequence includes at least one character; obtain a preset dictionary tree, the preset dictionary tree including a plurality of nodes; traverse the plurality of first sequences and match each first sequence against the corresponding nodes of the preset dictionary tree to obtain a plurality of third matching degrees; calculate the average of the plurality of third matching degrees; if the average is greater than a second preset threshold, leave the preset dictionary tree unchanged and use it as the target dictionary tree; and if the average is less than or equal to the second preset threshold, update the preset dictionary tree to obtain the target dictionary tree.

In a possible example, the processor is further configured to: determine, based on the preset dictionary tree, the initial mapping corresponding to the plurality of nodes in the preset dictionary tree; if a character i of a first sequence i does not exist in the initial mapping, add a new mapping i and store the character i in the mapping i, where the first sequence i is any one of the plurality of first sequences, the character i is any character in the first sequence i, and i is a positive integer; and update, based on the mapping i, the preset dictionary tree to the target dictionary tree.

In a possible example, the processor is further configured to: determine a target mapping table corresponding to the target dictionary tree, wherein the target mapping table contains a plurality of preset paths and each disease name in the target dictionary tree corresponds to one preset path; based on the target dictionary tree, starting from a character j and searching downward in order from the head node of the target dictionary tree, obtain a path j to be matched and the preset path j in the target mapping table corresponding to the path to be matched, wherein the character j is the first character of the name of the disease to be standardized; and match the path j to be matched with the preset path j to obtain the first matching degree j of the character j.

Further, the computer-readable storage medium may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application program required by at least one function, and the like, and the data storage area may store data created through the use of blockchain nodes, and the like.

In addition, the computer-readable storage medium may be non-volatile or volatile.

The blockchain referred to in this application is a new application mode of computer technologies such as distributed data storage, point-to-point transmission, consensus mechanisms, and encryption algorithms. A blockchain is essentially a decentralized database: a chain of data blocks linked to one another by cryptographic methods. Each data block contains a batch of network transaction information, which is used to verify the validity of that information (anti-counterfeiting) and to generate the next block. A blockchain may include an underlying blockchain platform, a platform product service layer, and an application service layer.
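The following minimal sketch makes "data blocks linked by cryptographic methods" concrete. Each block is linked to its predecessor through a SHA-256 hash, so altering an earlier block breaks verification. It deliberately omits consensus, networking, and signatures. The block layout and the function names are hypothetical, not the platform the application relies on.

```python
import hashlib
import json
from typing import Any, Dict, List

GENESIS_PREV = "0" * 64  # hypothetical predecessor hash for the first block


def block_hash(block: Dict[str, Any]) -> str:
    """SHA-256 over a canonical JSON encoding of the block."""
    payload = json.dumps(block, sort_keys=True, ensure_ascii=False).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def append_block(chain: List[Dict[str, Any]], records: List[str]) -> None:
    """Append a block that stores the hash of the block before it."""
    prev = block_hash(chain[-1]) if chain else GENESIS_PREV
    chain.append({"index": len(chain), "prev_hash": prev, "records": records})


def verify_chain(chain: List[Dict[str, Any]]) -> bool:
    """Check that every block still points at the current hash of its predecessor."""
    if chain and chain[0]["prev_hash"] != GENESIS_PREV:
        return False
    return all(chain[i]["prev_hash"] == block_hash(chain[i - 1])
               for i in range(1, len(chain)))


chain: List[Dict[str, Any]] = []
append_block(chain, ["高血压"])
append_block(chain, ["2型糖尿病"])
print(verify_chain(chain))  # True
chain[0]["records"][0] = "tampered"
print(verify_chain(chain))  # False
```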

The embodiments of the present application have been described in detail above, and specific examples have been used herein to explain the principles and implementations of the present application. The descriptions of the above embodiments are only intended to help understand the method and core idea of the present application. Meanwhile, those of ordinary skill in the art may, based on the idea of the present application, make changes to the specific implementations and the scope of application. In summary, the content of this specification should not be construed as limiting the present application.

Claims

A method for standardizing disease names, applied to an electronic device, the method comprising:

Acquiring a target dictionary, a current diagnosis text, and a preset ICD standard disease name set, where the preset ICD standard disease name set includes multiple preset ICD standard disease names;

Based on the target dictionary, performing a word segmentation operation on the current diagnosis text to obtain the name of the disease to be standardized contained in the current diagnosis text;

Constructing a target dictionary tree based on the preset ICD standard disease name set;

Based on the target dictionary tree, matching the name of the disease to be standardized with the multiple preset ICD standard disease names in the preset ICD standard disease name set to obtain multiple first matching degrees;

When a target first matching degree meeting a preset condition exists among the plurality of first matching degrees, acquiring the target preset ICD standard disease name corresponding to the target first matching degree, and determining the target preset ICD standard disease name as the conversion result of the name of the disease to be standardized.
The method according to claim 1, wherein, before the acquiring the target dictionary, the method further comprises:

Extract historical diagnosis text information from the historical disease case database;

Perform data cleaning on the historical diagnosis text information to obtain a set of historical disease names;

Data processing is performed on the historical disease name set and the preset ICD standard disease name set to obtain the target dictionary.
The method according to claim 2, wherein said performing data cleaning on said historical diagnosis text information to obtain a set of historical disease names comprises:

Acquiring multiple preset regular expressions for multiple preset disease names, where each preset disease name corresponds to a preset regular expression;

Matching the historical diagnosis text information with each of the plurality of preset regular expressions to obtain a plurality of second matching degrees, where each preset regular expression corresponds to one second matching degree;

Determining at least one preset disease name corresponding to at least one second matching degree, among the plurality of second matching degrees, that exceeds a first preset threshold, and using the at least one preset disease name as the historical disease name set.
The method according to claim 2 or 3, wherein the data processing of the historical disease name set and the preset ICD standard disease name set to obtain a target dictionary comprises:

Merging the historical disease name set and the preset ICD standard disease name set to obtain a first dictionary, the first dictionary including a plurality of first disease names;

The multiple first disease names are deduplicated to obtain the target dictionary.
The method according to claim 1, wherein said constructing a target dictionary tree based on said preset ICD standard disease name set comprises:

Determining, based on the preset ICD standard disease name set, the first sequence corresponding to each of the plurality of preset ICD standard disease names in the preset ICD standard disease name set, to obtain a plurality of first sequences, wherein each first sequence includes at least one character;

Acquiring a preset dictionary tree, the preset dictionary tree including a plurality of nodes;

Traversing the multiple first sequences, matching each of the first sequences with multiple nodes corresponding to the preset dictionary tree, to obtain multiple third matching degrees;

Calculating the mean value of the plurality of third matching degrees;

If the average value is greater than the second preset threshold, do not update the preset dictionary tree, and use the preset dictionary tree as the target dictionary tree;

If the average value is less than or equal to the second preset threshold, updating the preset dictionary tree to obtain the target dictionary tree.
The method according to claim 5, wherein said updating said preset dictionary tree to obtain a target dictionary tree comprises:

Determine the initial mapping corresponding to the multiple nodes in the preset dictionary tree based on the preset dictionary tree;

If a character i of a first sequence i does not exist in the initial mapping, adding a new mapping i and storing the character i in the mapping i, where the first sequence i is any one of the plurality of first sequences, the character i is any character in the first sequence i, and i is a positive integer;

Based on the mapping i, update the preset dictionary tree to the target dictionary tree.
The method according to any one of claims 1 to 6, wherein the matching of the name of the disease to be standardized with the plurality of preset ICD standard disease names in the preset ICD standard disease name set to obtain a plurality of first matching degrees comprises:

Determining a target mapping table corresponding to the target dictionary tree, wherein the target mapping table contains a plurality of preset paths and each disease name in the target dictionary tree corresponds to one preset path;

Based on the target dictionary tree, starting from a character j and searching downward in order from the head node of the target dictionary tree, obtaining a path j to be matched and the preset path j in the target mapping table corresponding to the path to be matched, wherein the character j is the first character of the name of the disease to be standardized;

The path j to be matched is matched with the preset path j to obtain the first matching degree j of the character j.
A disease name standardization device, which is applied to electronic equipment, the device includes: an acquisition unit, a word segmentation unit, a construction unit, a matching unit, and a determination unit, wherein,

The acquiring unit is configured to acquire a target dictionary, a current diagnosis text, and a preset ICD standard disease name set, where the preset ICD standard disease name set includes a plurality of preset ICD standard disease names;

The word segmentation unit is configured to perform a word segmentation operation on the current diagnosis text based on the target dictionary to obtain the name of the disease to be standardized contained in the current diagnosis text;

The construction unit is configured to construct a target dictionary tree based on the preset ICD standard disease name set;

The matching unit is configured to match, based on the target dictionary tree, the name of the disease to be standardized with the plurality of preset ICD standard disease names in the preset ICD standard disease name set to obtain a plurality of first matching degrees;

The determining unit is configured to, when a target first matching degree satisfying a preset condition exists among the plurality of first matching degrees, obtain the target preset ICD standard disease name corresponding to the target first matching degree, and determine the target preset ICD standard disease name as the conversion result of the disease name to be standardized.
An electronic device, comprising a processor, a memory, a communication interface, and one or more programs, wherein the one or more programs are stored in the memory and configured to be executed by the processor, the memory is configured to store a computer program comprising program instructions, and the processor is configured to call the program instructions to perform the following steps:

Acquire a target dictionary, a current diagnosis text, and a preset ICD standard disease name set, where the preset ICD standard disease name set includes a plurality of preset ICD standard disease names; perform, based on the target dictionary, a word segmentation operation on the current diagnosis text to obtain the name of the disease to be standardized contained in the current diagnosis text; construct a target dictionary tree based on the preset ICD standard disease name set; match, based on the target dictionary tree, the name of the disease to be standardized with the plurality of preset ICD standard disease names in the preset ICD standard disease name set to obtain a plurality of first matching degrees; and, when a target first matching degree meeting the preset condition exists among the plurality of first matching degrees, obtain the target preset ICD standard disease name corresponding to the target first matching degree and determine the target preset ICD standard disease name as the conversion result of the disease name to be standardized.
The electronic device according to claim 9, wherein, before the acquisition of the target dictionary, the processor is specifically further configured to:

Extract historical diagnosis text information from the historical disease case database; perform data cleaning on the historical diagnosis text information to obtain a historical disease name set; and perform data processing on the historical disease name set and the preset ICD standard disease name set to obtain the target dictionary.
The electronic device according to claim 10, wherein, in the aspect of performing data cleaning on the historical diagnosis text information to obtain a set of historical disease names, the processor is specifically configured to:

Acquire a plurality of preset regular expressions for a plurality of preset disease names, where each preset disease name corresponds to one preset regular expression; match the historical diagnosis text information against each of the plurality of preset regular expressions to obtain a plurality of second matching degrees, where each preset regular expression corresponds to one second matching degree; and determine at least one preset disease name corresponding to at least one second matching degree, among the plurality of second matching degrees, that exceeds a first preset threshold, and use the at least one preset disease name as the historical disease name set.
The electronic device according to claim 10 or 11, wherein, in terms of performing data processing on the historical disease name set and the preset ICD standard disease name set to obtain the target dictionary, the processor is specifically configured to:

Merge the historical disease name set with the preset ICD standard disease name set to obtain a first dictionary, the first dictionary including a plurality of first disease names; and deduplicate the plurality of first disease names to obtain the target dictionary.
The electronic device according to claim 9, wherein, in terms of constructing a target dictionary tree based on the preset ICD standard disease name set, the processor is specifically configured to:

Based on the preset ICD standard disease name set, determine the first sequence corresponding to each of the plurality of preset ICD standard disease names in the preset ICD standard disease name set, to obtain a plurality of first sequences, wherein each first sequence includes at least one character; obtain a preset dictionary tree, the preset dictionary tree including a plurality of nodes; traverse the plurality of first sequences and match each first sequence against the corresponding nodes of the preset dictionary tree to obtain a plurality of third matching degrees; calculate the average of the plurality of third matching degrees; if the average is greater than a second preset threshold, leave the preset dictionary tree unchanged and use it as the target dictionary tree; and if the average is less than or equal to the second preset threshold, update the preset dictionary tree to obtain the target dictionary tree.
The electronic device according to claim 13, wherein, in the aspect of updating the preset dictionary tree to obtain a target dictionary tree, the processor is specifically configured to:

Based on the preset dictionary tree, determine the initial mapping corresponding to the plurality of nodes in the preset dictionary tree; if a character i of a first sequence i does not exist in the initial mapping, add a new mapping i and store the character i in the mapping i, where the first sequence i is any one of the plurality of first sequences, the character i is any character in the first sequence i, and i is a positive integer; and update, based on the mapping i, the preset dictionary tree to the target dictionary tree.
The electronic device according to any one of claims 9 to 14, wherein, in terms of matching the name of the disease to be standardized with the plurality of preset ICD standard disease names in the preset ICD standard disease name set to obtain a plurality of first matching degrees, the processor is specifically configured to:

Determine a target mapping table corresponding to the target dictionary tree, wherein the target mapping table contains a plurality of preset paths and each disease name in the target dictionary tree corresponds to one preset path; based on the target dictionary tree, starting from a character j and searching downward in order from the head node of the target dictionary tree, obtain a path j to be matched and the preset path j in the target mapping table corresponding to the path to be matched, wherein the character j is the first character of the name of the disease to be standardized; and match the path j to be matched with the preset path j to obtain the first matching degree j of the character j.
A computer-readable storage medium, comprising a data storage area and a program storage area, wherein the data storage area stores data created through the use of blockchain nodes, the program storage area stores a computer program, the computer program comprises program instructions, and when the program instructions are executed by a processor, the processor performs the following steps:

Acquire a target dictionary, a current diagnosis text, and a preset ICD standard disease name set, where the preset ICD standard disease name set includes a plurality of preset ICD standard disease names; perform, based on the target dictionary, a word segmentation operation on the current diagnosis text to obtain the name of the disease to be standardized contained in the current diagnosis text; construct a target dictionary tree based on the preset ICD standard disease name set; match, based on the target dictionary tree, the name of the disease to be standardized with the plurality of preset ICD standard disease names in the preset ICD standard disease name set to obtain a plurality of first matching degrees; and, when a target first matching degree meeting the preset condition exists among the plurality of first matching degrees, obtain the target preset ICD standard disease name corresponding to the target first matching degree and determine the target preset ICD standard disease name as the conversion result of the disease name to be standardized.
The medium according to claim 16, wherein the processor is further configured to: extract historical diagnosis text information from a historical disease case database; perform data cleaning on the historical diagnosis text information to obtain a historical disease name set; and perform data processing on the historical disease name set and the preset ICD standard disease name set to obtain the target dictionary.
The medium according to claim 17, wherein the processor is further configured to: obtain a plurality of preset regular expressions for a plurality of preset disease names, wherein each preset disease name corresponds to one preset regular expression; match the historical diagnosis text information against each of the plurality of preset regular expressions to obtain a plurality of second matching degrees, where each preset regular expression corresponds to one second matching degree; and determine at least one preset disease name corresponding to at least one second matching degree, among the plurality of second matching degrees, that exceeds the first preset threshold, and use the at least one preset disease name as the historical disease name set.
The medium according to claim 17 or 18, wherein the processor is further specifically configured to: merge the historical disease name set and the preset ICD standard disease name set to obtain a first dictionary, the first dictionary including a plurality of first disease names; and deduplicate the plurality of first disease names to obtain the target dictionary.
The medium according to claim 16, wherein the processor is further configured to: determine, based on the preset ICD standard disease name set, the first sequence corresponding to each of the plurality of preset ICD standard disease names in the preset ICD standard disease name set, to obtain a plurality of first sequences, wherein each first sequence includes at least one character; obtain a preset dictionary tree, the preset dictionary tree including a plurality of nodes; traverse the plurality of first sequences and match each first sequence against the corresponding nodes of the preset dictionary tree to obtain a plurality of third matching degrees; calculate the average of the plurality of third matching degrees; if the average is greater than a second preset threshold, leave the preset dictionary tree unchanged and use it as the target dictionary tree; and if the average is less than or equal to the second preset threshold, update the preset dictionary tree to obtain the target dictionary tree.