CN112908487A

CN112908487A - Automatic identification method and system for clinical guideline update content

Info

Publication number: CN112908487A
Application number: CN202110418664.XA
Authority: CN
Inventors: 吴思竹; 崔佳伟; 修晓蕾; 钱庆
Original assignee: Institute of Medical Information CAMS
Current assignee: Institute of Medical Information CAMS
Priority date: 2021-04-19
Filing date: 2021-04-19
Publication date: 2021-06-04
Anticipated expiration: 2041-04-19
Also published as: CN112908487B

Abstract

The invention provides an automatic identification method and system for updated content of clinical guidelines. The method comprises: parsing and structurally extracting a first clinical guideline and a second clinical guideline respectively, according to a module hierarchical structure tree established in advance from the titles at each level of the clinical guidelines, to obtain at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline; and determining difference information between the first guideline module and the second guideline module, and labeling corresponding labels at the positions corresponding to the difference information in the first clinical guideline and the second clinical guideline respectively. The differences and changes between different clinical guidelines are thus found without manually consulting the two clinical guidelines to be compared, which improves the efficiency and accuracy of determining those differences and changes.

Description

Automatic identification method and system for clinical guideline update content

Technical Field

The invention relates to the technical field of data processing, in particular to an automatic identification method and system for updated contents of a clinical guideline.

Background

With the expansion of clinical research scope (such as tumor clinical research scope) and the innovation of clinical diagnosis and treatment technology, new medical evidence is continuously iterated, and this situation accelerates the update frequency of clinical guidelines.

At present, many new versions of clinical guidelines do not provide an updated description, and even when one is provided, it does not intuitively show the knowledge differences between the new and old versions or the exact positions of those differences. Clinicians must therefore manually consult the new and old clinical guidelines to find the differences and changes between them. This approach requires a great deal of manpower and time, and manual consultation is prone to omissions, so the efficiency and accuracy of determining the differences and changes between different clinical guidelines are low.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method and a system for automatically identifying updated content of a clinical guideline, so as to solve the problems of low efficiency and low accuracy in the current method for determining differences and changes between different clinical guidelines.

In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:

A first aspect of the embodiments of the present invention discloses a method for automatically identifying updated content of a clinical guideline, the method comprising the following steps:

according to a module hierarchical structure tree which is established by utilizing all levels of titles of clinical guidelines in advance, analyzing and structurally extracting a first clinical guideline and a second clinical guideline respectively to obtain at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline, wherein the first guideline module is text content contained in the minimum level title in the first clinical guideline, and the second guideline module is text content contained in the minimum level title in the second clinical guideline;

if the first clinical guideline and the second clinical guideline belong to the same source and when an updated description exists in the first clinical guideline relative to the second clinical guideline, determining first difference information between the first clinical guideline and the second clinical guideline by using the updated description of the first clinical guideline, and labeling corresponding labels at positions corresponding to the first difference information in the first clinical guideline and the second clinical guideline respectively, wherein the labels are added labels, deleted labels or modified labels;

if the first clinical guideline and the second clinical guideline belong to different sources, or if the first clinical guideline and the second clinical guideline belong to the same source and the first clinical guideline does not have the updated description, matching the first guideline module and the second guideline module, respectively using the matched first guideline module and the matched second guideline module as a first to-be-processed guideline module and a second to-be-processed guideline module, and using the first guideline module which is not matched with all the second guideline modules as a third to-be-processed guideline module;

according to the first to-be-processed guideline module and the second to-be-processed guideline module, determining second difference information between the first clinical guideline and the second clinical guideline, and labeling corresponding labels at positions corresponding to the second difference information in the first clinical guideline and the second clinical guideline respectively;

and according to the third to-be-processed guideline module and all the second guideline modules, determining third difference information between the first clinical guideline and the second clinical guideline, and labeling corresponding labels at positions corresponding to the third difference information in the first clinical guideline and the second clinical guideline respectively.

Preferably, the process of matching the first guideline module and the second guideline module includes:

for each first guideline module, determining the title similarity between the title of the first guideline module and the title of each second guideline module by using a preset deep semantic matching model;

for each first guideline module, if all the title similarities are smaller than a title similarity threshold, determining that the first guideline module is not matched with any of the second guideline modules, and if at least one title similarity is greater than or equal to the title similarity threshold, determining that the first guideline module is matched with the second guideline module corresponding to the maximum title similarity.
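The matching rule above can be sketched as follows. This is only an illustration: the text does not specify the deep semantic matching model, so the hypothetical `title_similarity` function below substitutes a character-level ratio from Python's `difflib`, and the threshold value 0.8 is likewise an assumption.

```python
from difflib import SequenceMatcher


def title_similarity(a, b):
    """Stand-in for the preset deep semantic matching model (assumption)."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_modules(first_titles, second_titles, threshold=0.8):
    """Return (matched, unmatched) for the first guideline modules.

    matched maps the index of a first module to the index of the second
    module with the maximum title similarity; unmatched lists the first
    modules whose similarities are all below the threshold.
    """
    matched = {}
    unmatched = []
    for i, title in enumerate(first_titles):
        scores = [title_similarity(title, other) for other in second_titles]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is None or scores[best] < threshold:
            unmatched.append(i)   # becomes a third to-be-processed module
        else:
            matched[i] = best     # first/second to-be-processed pair
    return matched, unmatched
```

In this sketch, the entries of `matched` correspond to pairs of first and second to-be-processed guideline modules, and the entries of `unmatched` correspond to third to-be-processed guideline modules.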

Preferably, the determining second difference information between the first clinical guideline and the second clinical guideline according to the first to-be-processed guideline module and the second to-be-processed guideline module, and labeling corresponding labels at positions corresponding to the second difference information in the first clinical guideline and the second clinical guideline, respectively, includes:

performing sentence splitting processing on the first to-be-processed guideline module and the second to-be-processed guideline module respectively to obtain a plurality of first sentences corresponding to the first to-be-processed guideline module and a plurality of second sentences corresponding to the second to-be-processed guideline module;

for the mth first sentence of the first to-be-processed guideline module, calculating the sentence similarity between the mth first sentence and each of H second sentences of the second to-be-processed guideline module, wherein m is an integer greater than or equal to 1 and less than or equal to x, x is the total number of first sentences contained in the first to-be-processed guideline module, m starts from 1 and is incremented by 1, H is an integer greater than or equal to 1 and less than or equal to y, and y is the total number of second sentences contained in the second to-be-processed guideline module;

if the sentence similarity between the mth first sentence and the nth second sentence is equal to 1, determining that the mth first sentence and the nth second sentence are the same, and not executing labeling processing, wherein n is an integer which is greater than or equal to 1 and less than or equal to y;

if the sentence similarity between the mth first sentence and the nth second sentence is greater than or equal to a sentence similarity threshold and less than 1, marking a modified label at the position corresponding to the mth first sentence in the first clinical guideline and marking a modified label at the position corresponding to the nth second sentence in the second clinical guideline; and, when n is greater than m, determining a third sentence in the second to-be-processed guideline module that has not been labeled and whose sentence similarity to each of the first m first sentences is less than the sentence similarity threshold, and marking a deleted label at the position corresponding to the third sentence in the second clinical guideline;

and if the sentence similarities between the mth first sentence and all H second sentences are smaller than the sentence similarity threshold, marking an added label at the position corresponding to the mth first sentence in the first clinical guideline.
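A minimal sketch of the sentence-level labeling rules above is given below. Several points are assumptions rather than part of the described method: the similarity measure is a `difflib` ratio, the threshold is 0.6, each first sentence is compared with its most similar second sentence, and only second sentences located before the nth sentence are considered as candidates for the deleted label.

```python
from difflib import SequenceMatcher


def sim(a, b):
    """Illustrative sentence similarity in [0, 1] (assumed measure)."""
    return SequenceMatcher(None, a, b).ratio()


def label_sentences(first, second, threshold=0.6):
    """Label sentences of a matched module pair; indices are 0-based."""
    labels_first = {}
    labels_second = {}
    paired = set()  # second sentences already paired with a first sentence
    for m, s1 in enumerate(first):
        scores = [sim(s1, s2) for s2 in second]
        n = max(range(len(scores)), key=scores.__getitem__, default=None)
        if n is None or scores[n] < threshold:
            labels_first[m] = "added"      # no similar second sentence
            continue
        paired.add(n)
        if scores[n] == 1.0:
            continue                       # identical: no label
        labels_first[m] = "modified"
        labels_second[n] = "modified"
        if n > m:
            # Unlabeled earlier second sentences that are dissimilar to
            # every first sentence seen so far are treated as deleted.
            for k in range(n):
                if k in paired or k in labels_second:
                    continue
                if all(sim(f, second[k]) < threshold for f in first[:m + 1]):
                    labels_second[k] = "deleted"
    return labels_first, labels_second
```

For example, with a first module of three sentences in which the second sentence is reworded and the third is new, the function returns a modified label for the reworded pair, an added label for the new sentence, and a deleted label for a second-module sentence that no longer appears.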

Preferably, the determining, according to the third to-be-processed guideline module and all the second guideline modules, third difference information between the first clinical guideline and the second clinical guideline, and labeling corresponding labels at positions corresponding to the third difference information in the first clinical guideline and the second clinical guideline, respectively, includes:

calculating first-sentence similarities between the first P% of the content of the first sentence in the third to-be-processed guideline module and the plurality of second sentences of each second guideline module;

if at least one of the first-sentence similarities is greater than a first-sentence similarity threshold, determining that the third to-be-processed guideline module is matched with the second guideline module corresponding to the maximum first-sentence similarity;

starting from the first P% of the content of the first sentence of the third to-be-processed guideline module, changing the existing labels located after that position in the first clinical guideline to modified labels;

and, starting from the second sentence that corresponds to the maximum first-sentence similarity in the second guideline module matched with the third to-be-processed guideline module, changing the existing labels located after that second sentence in the second clinical guideline to modified labels.

Preferably, after the parsing and the structured extracting of the first clinical guideline and the second clinical guideline respectively are performed, and at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline are obtained, the method further includes:

preprocessing the first guideline module and the second guideline module respectively, and extracting knowledge characteristics from the preprocessed first guideline module and the preprocessed second guideline module respectively.

Preferably, after marking a modified label at the position corresponding to the mth first sentence in the first clinical guideline and marking a modified label at the position corresponding to the nth second sentence in the second clinical guideline, the method further comprises:

comparing the difference between the knowledge characteristics in the mth first sentence and the nth second sentence to obtain knowledge characteristic difference information;

labeling corresponding labels at positions corresponding to the knowledge characteristic difference information in the mth first sentence and the nth second sentence respectively.

Preferably, the method further comprises the following steps:

and respectively displaying the labels of different categories by using different display forms.

Preferably, after parsing and structurally extracting the first clinical guideline and the second clinical guideline respectively and obtaining at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline, the method further includes:

performing normalization processing on the first guideline module and the second guideline module; and

storing the first guideline module and the second guideline module in a database, storing the hierarchical relationship among all the first guideline modules in the database, and storing the hierarchical relationship among all the second guideline modules in the database.

The second aspect of the embodiment of the present invention discloses an automatic identification system for clinical guideline update content, the system comprising:

the analysis unit is used for respectively analyzing and structurally extracting a first clinical guideline and a second clinical guideline according to a module hierarchical structure tree which is established by utilizing each grade title of the clinical guideline in advance, and at least obtaining a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline, wherein the first guideline module is text content contained in the smallest grade title in the first clinical guideline, and the second guideline module is text content contained in the smallest grade title in the second clinical guideline;

a first processing unit, configured to, if the first clinical guideline and the second clinical guideline belong to the same source and there is an updated description of the first clinical guideline with respect to the second clinical guideline, determine first difference information between the first clinical guideline and the second clinical guideline by using the updated description of the first clinical guideline, and label corresponding labels at positions in the first clinical guideline and the second clinical guideline corresponding to the first difference information, where the labels are an added label, a deleted label, or a modified label;

a second processing unit, configured to, if the first clinical guideline and the second clinical guideline belong to different sources, or if the first clinical guideline and the second clinical guideline belong to the same source and the first clinical guideline does not have the updated description, match the first guideline module and the second guideline module, respectively use the matched first guideline module and the matched second guideline module as a first to-be-processed guideline module and a second to-be-processed guideline module, and use the first guideline module that is not matched with all the second guideline modules as a third to-be-processed guideline module;

a third processing unit, configured to determine second difference information between the first clinical guideline and the second clinical guideline according to the first to-be-processed guideline module and the second to-be-processed guideline module, and label corresponding labels at positions in the first clinical guideline and the second clinical guideline, which correspond to the second difference information, respectively;

and a fourth processing unit, configured to determine third difference information between the first clinical guideline and the second clinical guideline according to the third to-be-processed guideline module and all the second guideline modules, and label corresponding labels at positions in the first clinical guideline and the second clinical guideline, which correspond to the third difference information, respectively.

Based on the automatic identification method and system for clinical guideline update content provided by the embodiments of the present invention, the method is as follows: parsing and structurally extracting a first clinical guideline and a second clinical guideline respectively, according to a module hierarchical structure tree established in advance from the titles at each level of the clinical guidelines, to obtain at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline; if the first clinical guideline and the second clinical guideline belong to the same source and the first clinical guideline has an updated description relative to the second clinical guideline, determining first difference information between the first clinical guideline and the second clinical guideline by using the updated description of the first clinical guideline, and labeling corresponding labels at positions corresponding to the first difference information in the first clinical guideline and the second clinical guideline respectively; if the first clinical guideline and the second clinical guideline belong to different sources, or if they belong to the same source and the first clinical guideline does not have an updated description, matching the first guideline module and the second guideline module, using the matched first guideline module and the matched second guideline module as a first to-be-processed guideline module and a second to-be-processed guideline module respectively, and using each first guideline module that is not matched with any second guideline module as a third to-be-processed guideline module; determining second difference information between the first clinical guideline and the second clinical guideline according to the first to-be-processed guideline module and the second to-be-processed guideline module, and labeling corresponding labels at positions corresponding to the second difference information in the first clinical guideline and the second clinical guideline respectively; and determining third difference information between the first clinical guideline and the second clinical guideline according to the third to-be-processed guideline module and all the second guideline modules, and labeling corresponding labels at positions corresponding to the third difference information in the first clinical guideline and the second clinical guideline respectively. In this way, the differences and changes between different clinical guidelines are found without manually consulting the two clinical guidelines to be compared, which improves the efficiency and accuracy of determining those differences and changes.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

FIG. 1 is a flow chart of a method for automatically identifying clinical guideline update content according to an embodiment of the invention;

FIG. 2 is a flow chart of tagging in a first clinical guideline and a second clinical guideline as provided by an embodiment of the invention;

FIG. 3 is another flow chart of a method for automatically identifying clinical guideline updates according to embodiments of the invention;

FIG. 4 is a schematic diagram of a summary of the main points of the clinical guidelines section of EAU renal cell carcinoma provided in accordance with an embodiment of the present invention;

FIG. 5 is a schematic diagram of labeled update labels according to an embodiment of the present invention;

FIG. 6 is a schematic diagram of a label corresponding to labeled knowledge characteristic difference information according to an embodiment of the present invention;

FIG. 7 is another schematic diagram of a label corresponding to labeled knowledge characteristic difference information according to an embodiment of the present invention;

FIG. 8 is a further diagram of a tag corresponding to labeled knowledge characteristic difference information according to an embodiment of the present invention;

FIG. 9 is a schematic diagram illustrating an update timing sequence according to an embodiment of the present invention;

FIG. 10 is a block diagram of an automatic identification system for clinical guideline update content according to an embodiment of the invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

As can be seen from the background art, when comparing the differences and changes between different clinical guidelines, a clinician currently has to consult the guidelines manually. This manual comparison consumes a great deal of manpower and time and is prone to omissions, which results in low efficiency and accuracy when comparing different clinical guidelines.

Therefore, the embodiment of the invention provides an automatic identification method and system for clinical guideline update content, wherein a first clinical guideline and a second clinical guideline are analyzed and structurally extracted respectively according to a module hierarchical structure tree established by using each grade title of the clinical guideline in advance, and at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline are obtained; the difference information between the first guideline module and the second guideline module is determined, corresponding labels are marked at positions corresponding to the difference information in the first clinical guideline and the second clinical guideline respectively, and the difference and the change between different clinical guidelines are found without manually consulting the two clinical guidelines needing to be compared, so that the efficiency and the accuracy of determining the difference and the change between different clinical guidelines are improved.

It should be noted that a clinical guideline has headings at multiple levels, and a guideline module in the embodiments of the present invention (e.g., the first guideline module and the second guideline module referred to below) specifically refers to the text content contained under a lowest-level heading in the clinical guideline. It follows that each guideline module also has a corresponding heading.

It is understood that the clinical guideline (e.g., tumor clinical guideline, etc.) involved in the embodiments of the present invention is a guidance document about a specific healthcare field, such as a medical guideline or a clinical practice guideline.

In embodiments of the present invention, it is desirable to compare the difference between the first clinical guideline and the second clinical guideline, where the first clinical guideline may be referred to as a matching clinical guideline and the second clinical guideline may be referred to as a to-be-matched clinical guideline.

Referring to FIG. 1, a flowchart of an automatic identification method for clinical guideline update content according to an embodiment of the present invention is shown, where the automatic identification method includes:

Step S101: parsing and structurally extracting the first clinical guideline and the second clinical guideline respectively, according to a module hierarchical structure tree established in advance from the titles at each level of the clinical guidelines, to obtain at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline.

It should be noted that the first guideline module is the text content contained in the minimum-level heading in the first clinical guideline, and the second guideline module is the text content contained in the minimum-level heading in the second clinical guideline.

In the specific implementation process of step S101, a module hierarchical structure tree is constructed in advance according to each level title of the clinical guideline, and a mapping rule between the module hierarchical structure tree and each guideline module in the clinical guideline is formulated, so as to implement guideline module mapping based on the module hierarchical structure tree.

Analyzing and structurally extracting an unstructured first clinical guideline by using a module hierarchical structure tree to at least obtain a first guideline module corresponding to the first clinical guideline, and analyzing and structurally extracting an unstructured second clinical guideline by using the module hierarchical structure tree to at least obtain a second guideline module corresponding to the second clinical guideline.
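The extraction described above can be illustrated with a small sketch. It assumes, purely for illustration, that the module hierarchical structure tree is given as a nested dictionary of titles, that each title appears on its own line of the guideline text, and that only text under a lowest-level title forms a guideline module; the function names are hypothetical.

```python
def collect_titles(tree, leaves, all_titles):
    """Walk the title tree, recording every title and the lowest-level ones."""
    for title, children in tree.items():
        all_titles.add(title)
        if children:
            collect_titles(children, leaves, all_titles)
        else:
            leaves.add(title)


def extract_modules(text, tree):
    """Map each lowest-level title to the text content found under it."""
    leaves, all_titles = set(), set()
    collect_titles(tree, leaves, all_titles)
    modules = {}
    current = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped in all_titles:
            # A higher-level title closes the current module without
            # opening a new one.
            current = stripped if stripped in leaves else None
            if current is not None:
                modules.setdefault(current, [])
        elif current is not None and stripped:
            modules[current].append(stripped)
    return {title: " ".join(lines) for title, lines in modules.items()}
```

Real guideline documents (for example PDF files) would need a more robust title detector; the exact-line comparison here only keeps the sketch short.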

It will be appreciated that parsing and structurally extracting the first clinical guideline yields data in a specified format (e.g., CSV format), in which the column data is the module name of the first guideline module and the row data is the module content of the first guideline module. Similarly, parsing and structurally extracting the second clinical guideline also yields data in the specified format, in which the column data is the module name of the second guideline module and the row data is the module content of the second guideline module.
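As an illustration of the specified format, the following sketch writes guideline modules to CSV text with the module names as the column headers and the module contents as a single data row, then reads them back. The function names and the one-row layout are assumptions made for this example.

```python
import csv
import io


def modules_to_csv(modules):
    """Serialize {module name: module content} as a header row plus one row."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(modules.keys())    # column data: module names
    writer.writerow(modules.values())  # row data: module contents
    return buffer.getvalue()


def csv_to_modules(text):
    """Rebuild the {module name: module content} mapping from CSV text."""
    rows = list(csv.reader(io.StringIO(text)))
    header, content = rows[0], rows[1]
    return dict(zip(header, content))
```

Because the `csv` module quotes fields that contain commas, module contents written in ordinary prose survive the round trip unchanged.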

Preferably, after the first guideline module and the second guideline module are extracted, the first guideline module (i.e., the data in the specified format mentioned above) and the second guideline module are stored in a database (e.g., a relational database), and the hierarchical relationship among all the first guideline modules and the hierarchical relationship among all the second guideline modules are also stored in the database. It is understood that the hierarchical relationship among all the first guideline modules specifically refers to the hierarchical relationship among the module names of all the first guideline modules, and the hierarchical relationship among all the second guideline modules specifically refers to the hierarchical relationship among the module names of all the second guideline modules.

That is, the database stores two parts of data: one part is the guideline modules (for example, the aforementioned CSV format data), and the other part is the module names of the guideline modules, where the hierarchical relationship among the module names is the aforementioned module hierarchical structure tree. In the database, the guideline modules and the structure among them are automatically mapped according to the hierarchical relationship of the module names (i.e., the module hierarchical structure tree).
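One possible way to store both parts in a relational database is an adjacency-list table, sketched below with Python's built-in `sqlite3` module. The table name, column names, and the nested-dictionary input are hypothetical; the text does not prescribe a schema.

```python
import sqlite3


def build_db(tree):
    """Store module names, their hierarchy, and module contents in SQLite."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE module ("
        " id INTEGER PRIMARY KEY,"
        " parent_id INTEGER REFERENCES module(id),"
        " name TEXT NOT NULL,"
        " content TEXT)"
    )

    def insert(node, parent_id):
        cur = conn.execute(
            "INSERT INTO module (parent_id, name, content) VALUES (?, ?, ?)",
            (parent_id, node["name"], node.get("content")),
        )
        node_id = cur.lastrowid
        for child in node.get("children", []):
            insert(child, node_id)

    insert(tree, None)
    conn.commit()
    return conn


def leaf_modules(conn):
    """Return (name, content) of rows that have no children, in insert order."""
    return conn.execute(
        "SELECT m.name, m.content FROM module AS m"
        " WHERE NOT EXISTS (SELECT 1 FROM module AS c WHERE c.parent_id = m.id)"
        " ORDER BY m.id"
    ).fetchall()
```

In this layout the `parent_id` column carries the hierarchical relationship among module names, and the rows without children correspond to the guideline modules themselves.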

It should be noted that the module content of the extracted first guideline module and second guideline module may contain irregular content; therefore, the first guideline module and the second guideline module are preferably normalized.

The normalization is performed as follows (here "guideline module" refers to both the first guideline module and the second guideline module). Because many reference citation markers are inserted into the module content (i.e., the text content) of a guideline module, a regular expression matching rule is formulated to remove all reference citation markers from the guideline module. At the same time, noise reduction processing is applied to the guideline module, such as text cleaning, stop word removal, stemming, conversion of English number words into Arabic numerals, and expansion of abbreviations into their full names, so that the module content of the guideline module is converted into normalized text.
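A minimal sketch of such a normalization pass is shown below. The citation pattern (bracketed numbers such as "[12, 13]"), the number-word table, the abbreviation table, and the stop word list are all small illustrative assumptions, and stemming is omitted because it would normally rely on a dedicated NLP library.

```python
import re

# Assumed citation style: bracketed numbers, e.g. "[3]", "[12, 13]", "[5-7]".
CITATION = re.compile(r"\s*\[\d+(?:\s*[-,]\s*\d+)*\]")
NUMBER_WORDS = {"one": "1", "two": "2", "three": "3", "four": "4", "five": "5"}
ABBREVIATIONS = {"RCC": "renal cell carcinoma"}   # illustrative entry only
STOP_WORDS = {"the", "a", "an", "of", "is", "in", "for"}


def normalize(text):
    """Remove citation markers, expand abbreviations, convert number words,
    lower-case the text, and drop stop words."""
    text = CITATION.sub("", text)
    for abbreviation, full_name in ABBREVIATIONS.items():
        text = re.sub(rf"\b{re.escape(abbreviation)}\b", full_name, text)
    tokens = re.findall(r"[A-Za-z0-9]+(?:-[A-Za-z0-9]+)*", text.lower())
    tokens = [NUMBER_WORDS.get(token, token) for token in tokens]
    return " ".join(token for token in tokens if token not in STOP_WORDS)
```

Abbreviations are expanded before lower-casing so that an upper-case abbreviation is not confused with an ordinary lower-case word.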

The normalized first guideline module and second guideline module are then used in the processing of the following steps, which is described in detail below.

Step S102: if the first clinical guideline and the second clinical guideline belong to the same source and when the first clinical guideline has updated description relative to the second clinical guideline, determining first difference information between the first clinical guideline and the second clinical guideline by using the updated description of the first clinical guideline, and labeling corresponding labels at positions corresponding to the first difference information in the first clinical guideline and the second clinical guideline, respectively.

It should be noted that the label is an added label, a deleted label, or a modified label. The added label indicates content that is new in the first clinical guideline compared with the second clinical guideline, that is, content that exists only in the first clinical guideline and not in the second clinical guideline. The deleted label indicates content that has been deleted from the first clinical guideline compared with the second clinical guideline, that is, content that exists only in the second clinical guideline and not in the first clinical guideline. The modified label indicates content that has been modified in the first clinical guideline compared with the second clinical guideline, that is, content that is similar but not identical in the first clinical guideline and the second clinical guideline.

It should be noted that the updated description attached to a clinical guideline indicates which content on which page of the clinical guideline has changed; alternatively, the PDF contains hyperlinks to the updated content, and clicking a hyperlink locates the updated part.

It is understood that the first clinical guideline and the second clinical guideline may belong to different sources or to the same source. When they belong to the same source, the first clinical guideline is an updated version of the second clinical guideline. In this case, if the first clinical guideline has an updated description (i.e., an updated description relative to the second clinical guideline), the updated description is used to determine the difference information between the first clinical guideline and the second clinical guideline; if the first clinical guideline does not have an updated description, the first guideline module and the second guideline module need to be compared to determine the difference information between the first clinical guideline and the second clinical guideline.

Likewise, if the first clinical guideline and the second clinical guideline come from different sources, the first guideline module and the second guideline module must be compared to determine the difference information between the first clinical guideline and the second clinical guideline.

For example, assume that there are two sources of renal tumor guidelines: one is written by NCCN (i.e., NCCN is the source of the clinical guideline), and the renal tumor guideline written by NCCN has an update description attached; the other is written by EAU. If the first clinical guideline is the NCCN 2020 version 2 (V2) guideline and the second clinical guideline is the NCCN 2020 version 1 (V1) guideline, i.e., the two guidelines come from the same source and the first clinical guideline has an update description, the update description is used directly to determine the difference information between the first clinical guideline and the second clinical guideline. If the first clinical guideline is the EAU 2020 version 2 (V2) guideline and the second clinical guideline is the EAU 2020 version 1 (V1) guideline, i.e., the two guidelines come from the same source but the first clinical guideline has no update description, the first guideline module and the second guideline module must be compared to determine the difference information. If the first clinical guideline is the EAU 2020 version 2 (V2) guideline and the second clinical guideline is the NCCN 2020 version 2 (V2) guideline, i.e., the two guidelines come from different sources, the first guideline module and the second guideline module must likewise be compared to determine the difference information.
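The choice between the two ways of determining difference information can be sketched as follows. This is a minimal illustration only; the names `Strategy` and `choose_strategy` are hypothetical and do not appear in the embodiment.

```python
from enum import Enum


class Strategy(Enum):
    USE_UPDATE_DESCRIPTION = "use_update_description"  # handled in step S102
    COMPARE_MODULES = "compare_modules"                # handled in steps S103 to S105


def choose_strategy(first_source: str, second_source: str,
                    first_has_update_description: bool) -> Strategy:
    """Pick how difference information is determined for a guideline pair."""
    same_source = first_source == second_source
    if same_source and first_has_update_description:
        return Strategy.USE_UPDATE_DESCRIPTION
    # Different sources, or same source without an update description.
    return Strategy.COMPARE_MODULES
```

With the renal tumor example above, an NCCN V2 / NCCN V1 pair with an update description selects the update description, while an EAU V2 / EAU V1 pair without one, or an EAU / NCCN pair, selects module comparison.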

In the specific implementation of step S102, if the first clinical guideline and the second clinical guideline come from the same source (in this case, the first clinical guideline is a newer clinical guideline than the second clinical guideline) and the first clinical guideline has an update description relative to the second clinical guideline, the preset guideline update identification and labeling rules are used to automatically locate and identify the updated guideline modules and the updated sentences between the first clinical guideline and the second clinical guideline, so as to determine first difference information between the first clinical guideline and the second clinical guideline, and corresponding labels are marked at the positions corresponding to the first difference information in the first clinical guideline and the second clinical guideline, respectively.

If the first difference information is a modified portion of the first clinical guideline relative to the second clinical guideline (indicating the guideline module in which the modification occurred and the sentence in that guideline module in which the modification occurred), a modify label is marked at the position corresponding to the modified portion in the first clinical guideline and in the second clinical guideline, respectively.

If the first difference information is a newly added portion of the first clinical guideline relative to the second clinical guideline, a new label is marked at the position corresponding to the newly added portion in the first clinical guideline.

If the first difference information is a deleted portion of the first clinical guideline relative to the second clinical guideline, a delete label is marked at the position corresponding to the deleted portion in the second clinical guideline.

It should be noted that the guideline update identification and labeling rules are set as follows: the writing conventions of the update descriptions of clinical guidelines are summarized and generalized, and the guideline update identification and labeling rules are set according to those conventions. For example, in some clinical guidelines, a sentence containing "remove" describes content deleted from the old version of the clinical guideline, a sentence containing "modified" describes content modified in the new version of the clinical guideline, or a separate table lists content newly added in the new version of the clinical guideline.
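A rule of this kind can be sketched with keyword patterns, as below. The "remove" and "modified" keywords come from the example above; the "added"/"new" keywords, the label names and the function name are assumptions made for illustration, and real rules would have to be derived from the update descriptions of each guideline source.

```python
import re
from typing import Optional

# Hypothetical keyword rules, checked in order.
RULES = [
    ("delete", re.compile(r"\bremoved?\b", re.IGNORECASE)),
    ("modify", re.compile(r"\bmodified\b", re.IGNORECASE)),
    ("new", re.compile(r"\b(added|new)\b", re.IGNORECASE)),  # assumed keywords
]


def classify_update_sentence(sentence: str) -> Optional[str]:
    """Return the label type suggested by the first matching rule, or None."""
    for label, pattern in RULES:
        if pattern.search(sentence):
            return label
    return None
```

A sentence of the update description that matches no rule returns `None` and would need a further rule or manual review.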

As can be understood from the above, the difference information may indicate a modified portion, a newly added portion, or a deleted portion of the first clinical guideline compared with the second clinical guideline. Preferably, when the corresponding labels are marked at the positions corresponding to the first difference information in the first clinical guideline and the second clinical guideline, different types of labels are displayed in different display forms.

That is, the new label, the delete label, and the modify label are displayed in the first clinical guideline and the second clinical guideline in different display forms, so that the three types of labels can be distinguished.

For example, the new label, the delete label, and the modify label are displayed by highlighting text in different colors: the modified portions in the first clinical guideline and the second clinical guideline are highlighted in yellow, i.e., the modify label is displayed as yellow-highlighted text; the newly added portion of the first clinical guideline compared with the second clinical guideline is highlighted in blue, i.e., the new label is displayed as blue-highlighted text; and the deleted portion, which appears in the second clinical guideline but not in the first clinical guideline, is highlighted in red, i.e., the delete label is displayed as red-highlighted text.

The above manner of displaying the new label, the delete label, and the modify label is only illustrative; in practical applications, other manners may also be used to display the new label, the delete label, and the modify label.
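As one possible display form, the color scheme of the example can be expressed as a lookup table. Rendering to an HTML `span` is an assumption made here for illustration; the embodiment does not prescribe a front-end technology, and the names below are hypothetical.

```python
from html import escape

# Colors taken from the example in the text; other display forms are possible.
LABEL_COLORS = {"modify": "yellow", "new": "blue", "delete": "red"}


def render_labeled_text(text: str, label: str) -> str:
    """Wrap text in an HTML span whose background color encodes the label."""
    color = LABEL_COLORS[label]
    return f'<span style="background-color: {color}">{escape(text)}</span>'
```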

Step S103: if the first clinical guideline and the second clinical guideline come from different sources, or if they come from the same source and the first clinical guideline has no update description, the first guideline modules and the second guideline modules are matched; each matched first guideline module and its matched second guideline module are used as a first to-be-processed guideline module and a second to-be-processed guideline module, respectively, and each first guideline module that matches none of the second guideline modules is used as a third to-be-processed guideline module.

In the specific implementation of step S103, if the first clinical guideline and the second clinical guideline come from different sources, or if they come from the same source but the first clinical guideline has no update description (relative to the second clinical guideline), the first guideline modules and the second guideline modules are matched; each matched first guideline module and its matched second guideline module are used as a first to-be-processed guideline module and a second to-be-processed guideline module, respectively, each first guideline module that matches none of the second guideline modules is used as a third to-be-processed guideline module, and step S104 and step S105 are executed.

In a specific implementation, each first guideline module is matched against all second guideline modules. If a second guideline module matching the first guideline module can be determined, the first guideline module is used as a first to-be-processed guideline module, and the second guideline module matching it is used as a second to-be-processed guideline module; if the first guideline module matches none of the second guideline modules, the first guideline module is used as a third to-be-processed guideline module.

Each pair of a first to-be-processed guideline module and a second to-be-processed guideline module is thus a first guideline module and a second guideline module that match each other.

It should be noted that research shows that the title of a guideline module is a high-level summary of the module content, that titles are short and clearly defined, and that the title of a guideline module mostly does not change much as the clinical guideline version is revised and updated.

Therefore, the first guideline modules and the second guideline modules are matched as follows: for each first guideline module, the title similarity between the title of the first guideline module and the title of each second guideline module is determined by using a preset deep semantic matching model (a DSSM semantic matching model).

For each first guideline module, if the title similarities between the first guideline module and all second guideline modules are smaller than a title similarity threshold, it is determined that the first guideline module matches none of the second guideline modules; if the title similarity between the first guideline module and at least one second guideline module is greater than or equal to the title similarity threshold, it is determined that the first guideline module matches the second guideline module corresponding to the maximum title similarity.

That is, if the title similarities between the first guideline module and all second guideline modules are smaller than the title similarity threshold, the first guideline module matches none of the second guideline modules and is a third to-be-processed guideline module.

If the title similarity between the first guideline module and one or more second guideline modules is greater than or equal to the title similarity threshold, the first guideline module matches the second guideline module corresponding to the maximum title similarity; the first guideline module is then a first to-be-processed guideline module, and the second guideline module corresponding to the maximum title similarity is a second to-be-processed guideline module.
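The threshold-and-maximum matching rule can be sketched as below. Note the substitution: the embodiment uses a DSSM semantic matching model, whereas this sketch uses the character-level ratio of the Python standard library's `difflib.SequenceMatcher` purely as a stand-in so that the example is self-contained. The function names and the threshold value 0.8 are assumptions.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional


def title_similarity(a: str, b: str) -> float:
    """Stand-in for the semantic matching model: character-level ratio in [0, 1]."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_modules(first_titles: List[str], second_titles: List[str],
                  threshold: float = 0.8) -> Dict[str, Optional[str]]:
    """Map each first title to its best-matching second title, or None if unmatched."""
    matches: Dict[str, Optional[str]] = {}
    for first in first_titles:
        best_title, best_score = None, 0.0
        for second in second_titles:
            score = title_similarity(first, second)
            if score > best_score:
                best_title, best_score = second, score
        # Below the threshold, the module becomes a third to-be-processed guideline module.
        matches[first] = best_title if best_score >= threshold else None
    return matches
```

A first title mapped to `None` corresponds to a third to-be-processed guideline module; every other entry is a pair of first and second to-be-processed guideline modules.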

First, all the determined first to-be-processed guideline modules and their corresponding second to-be-processed guideline modules are used to determine second difference information between the first clinical guideline and the second clinical guideline, and corresponding labels are marked at the positions corresponding to the second difference information in the first clinical guideline and the second clinical guideline. Then, the third to-be-processed guideline modules are used to determine third difference information between the first clinical guideline and the second clinical guideline, and corresponding labels are marked at the positions corresponding to the third difference information in the first clinical guideline and the second clinical guideline, respectively. The implementation is described in detail below.

Step S104: according to the first to-be-processed guideline module and the second to-be-processed guideline module, second difference information between the first clinical guideline and the second clinical guideline is determined, and corresponding labels are marked at the positions corresponding to the second difference information in the first clinical guideline and the second clinical guideline, respectively.

In the specific implementation of step S104, for each pair of matched first and second to-be-processed guideline modules, the pair is used to determine second difference information between the first clinical guideline and the second clinical guideline, and corresponding labels are marked at the positions corresponding to the second difference information in the first clinical guideline and the second clinical guideline. The second difference information indicates the differing portions of the matched first and second to-be-processed guideline modules, for example: the portion of the first to-be-processed guideline module that is modified compared with the second to-be-processed guideline module, the portion that is newly added compared with the second to-be-processed guideline module, and the portion that is deleted compared with the second to-be-processed guideline module.

For specific contents of labeling corresponding labels at the positions corresponding to the second difference information in the first clinical guideline and the second clinical guideline, reference may be made to the contents in step S102, and details are not repeated here.

It is to be understood that, when the second difference information is determined using the matched first and second to-be-processed guideline modules, the difference between the two modules (i.e., the second difference information) is determined sentence by sentence. Therefore, the first to-be-processed guideline module and the second to-be-processed guideline module are split into sentences, the sentence similarity between sentences of the first to-be-processed guideline module and sentences of the second to-be-processed guideline module is calculated, and the difference between the two modules (i.e., the second difference information) is finally determined from the calculated sentence similarities.

It should be noted that the sentence similarity between sentences is calculated by a specified algorithm, for example the Universal Sentence Encoder; the manner of calculating the sentence similarity is not particularly limited in the embodiment of the present invention.
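Since the manner of calculating the sentence similarity is not limited, the following sketch uses the cosine similarity of word-count vectors as a simple stand-in for an embedding model such as the Universal Sentence Encoder. It only illustrates the interface assumed in the rest of the procedure (two sentences in, a score between 0 and 1 out); the function name is hypothetical.

```python
import math
import re
from collections import Counter


def sentence_similarity(a: str, b: str) -> float:
    """Cosine similarity of word-count vectors; a stand-in for an embedding model."""
    va = Counter(re.findall(r"\w+", a.lower()))
    vb = Counter(re.findall(r"\w+", b.lower()))
    dot = sum(va[w] * vb[w] for w in va.keys() & vb.keys())
    norm = math.sqrt(sum(c * c for c in va.values()) * sum(c * c for c in vb.values()))
    return dot / norm if norm else 0.0
```

With a real embedding model the score of two identical sentences may differ from 1 by floating-point error, so a comparison against 1 is usually done with a small tolerance.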

Step S105: according to the third to-be-processed guideline module and all second guideline modules, third difference information between the first clinical guideline and the second clinical guideline is determined, and corresponding labels are marked at the positions corresponding to the third difference information in the first clinical guideline and the second clinical guideline, respectively.

As can be seen from the above, a third to-be-processed guideline module is a first guideline module that matches none of the second guideline modules. It should be noted that there are three cases in which the title similarity between the title of a first guideline module and the title of a second guideline module is smaller than the title similarity threshold (i.e., the first guideline module and the second guideline module are not similar).

The first case is: the module content corresponding to the second guideline module is content deleted from the first clinical guideline compared with the second clinical guideline, i.e., the module content corresponding to the second guideline module exists in the second clinical guideline but not in the first clinical guideline.

The second case is: the module content corresponding to the first guideline module is content newly added to the first clinical guideline compared with the second clinical guideline, i.e., the module content corresponding to the first guideline module exists in the first clinical guideline but not in the second clinical guideline.

The third case is: the first guideline module is a third to-be-processed guideline module, and the module content corresponding to the third to-be-processed guideline module has been incorporated into other guideline modules of the other version of the clinical guideline. The other version of the clinical guideline is determined as follows: if the content corresponding to the third to-be-processed guideline module is in the first clinical guideline, the other version is the second clinical guideline, and vice versa.

For the third case above, in the specific implementation of step S105, for each third to-be-processed guideline module, the third to-be-processed guideline module and all second guideline modules are used to determine third difference information between the first clinical guideline and the second clinical guideline, and corresponding labels are marked at the positions corresponding to the third difference information in the first clinical guideline and the second clinical guideline, respectively. For the details of marking the labels at the positions corresponding to the third difference information, reference may be made to step S102; they are not repeated here.

Specifically, the third difference information is determined using the third to-be-processed guideline module and all second guideline modules as follows: the first sentence similarity between the top P% (e.g., the top 20%) content of the first sentence in the third to-be-processed guideline module and a plurality of (e.g., 5) second sentences of each second guideline module is calculated. If at least one first sentence similarity is greater than a first sentence similarity threshold, the third to-be-processed guideline module is determined to match the second guideline module corresponding to the maximum first sentence similarity. That is, if only one first sentence similarity is greater than the first sentence similarity threshold, the third to-be-processed guideline module matches the second guideline module to which that first sentence similarity belongs; if several first sentence similarities are greater than the first sentence similarity threshold, the third to-be-processed guideline module matches the second guideline module corresponding to the largest of them.

It should be noted that calculating the first sentence similarity between the top P% (e.g., the top 20%) content of the first sentence in the third to-be-processed guideline module and a plurality of (e.g., 5) second sentences of each second guideline module means that, for each second guideline module, the similarity is calculated against at most that number of second sentences; for example, the top P% content of the first sentence in the third to-be-processed guideline module is compared with at most 5 second sentences of the second guideline module.

As can be seen from step S104 above, corresponding labels have already been marked at the positions corresponding to the second difference information in the first clinical guideline and the second clinical guideline, i.e., some labels may already exist in the first clinical guideline and the second clinical guideline. After a third to-be-processed guideline module and the second guideline module matching it are determined, the labels already existing in the first clinical guideline are changed to modify labels, starting from the top P% content of the first sentence of the third to-be-processed guideline module and covering the content after it; and the labels already existing in the second clinical guideline are changed to modify labels, starting from the second sentence that corresponds to the maximum first sentence similarity in the matching second guideline module and covering the content after that second sentence.

That is, if a sentence of a second guideline module can be matched according to the top P% content of the first sentence in the third to-be-processed guideline module (the first sentence similarity is greater than the first sentence similarity threshold and is the maximum), then, in the first clinical guideline, starting from the top P% content of the first sentence of the third to-be-processed guideline module, the existing labels in the content after it (the new labels marked in step S104) are overwritten with modify labels; and in the second clinical guideline, starting from the second sentence that corresponds to the maximum first sentence similarity in the second guideline module matching the third to-be-processed guideline module, the existing labels in the content after that sentence (the delete labels marked in step S104) are overwritten with modify labels.

Preferably, after the steps S104 and S105 are executed, the labels marked in the steps S104 and S105 are displayed at the front end, wherein different types of labels are displayed in different display forms.

In the embodiment of the invention, according to a module hierarchical structure tree which is established by using the titles of all levels of clinical guidelines in advance, a first clinical guideline and a second clinical guideline are analyzed and structurally extracted respectively, and at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline are obtained; the difference information between the first guideline module and the second guideline module is determined, corresponding labels are marked at the positions corresponding to the difference information in the first clinical guideline and the second clinical guideline respectively, the difference and the change between different clinical guidelines are found without manually consulting the two clinical guidelines needing to be compared, and the efficiency and the accuracy of determining the difference and the change between different clinical guidelines are improved.

For the process, involved in step S104 of fig. 1 in the above embodiment of the present invention, of marking corresponding labels at the positions corresponding to the second difference information in the first clinical guideline and the second clinical guideline, refer to fig. 2, which shows a flowchart of labeling in the first clinical guideline and the second clinical guideline provided in the embodiment of the present invention and includes the following steps:

Step S201: the first to-be-processed guideline module and the second to-be-processed guideline module are each split into sentences to obtain a plurality of first sentences corresponding to the first to-be-processed guideline module and a plurality of second sentences corresponding to the second to-be-processed guideline module.

It should be noted that the sentences of a guideline module may contain knowledge features. Preferably, after the first guideline module and the second guideline module are obtained, each of them is preprocessed, and the knowledge features in the preprocessed first guideline module and the preprocessed second guideline module are extracted. The knowledge features of a guideline module (which here stands for either the first guideline module or the second guideline module) are extracted as follows: based on predefined knowledge feature types, the knowledge features in the guideline module are identified and extracted by a rule-based method or a machine learning method.

Knowledge features in a guideline module include, but are not limited to, medical entity characteristics, grade characteristics, quantity characteristics, and time characteristics. Each knowledge feature is explained in detail below.

Medical entity characteristics: the medical entity characteristics refer to various medical terms such as clinical symptoms, medicines and examination methods involved in the process of disease diagnosis and treatment, are mainly used for explaining objects involved in various medical activities and behaviors in the clinical guideline, and are the main concerns of clinicians in learning the clinical guideline.

Grade characteristics: for the medical interventions referred to in a clinical guideline, the guideline's editorial committee determines the level of evidence or grade of recommendation of each intervention based on the sources of evidence, the way the evidence was obtained, and the degree of expert consensus; the higher the grade, the more sufficient the evidence and the more strongly the intervention is recommended for clinical practice. For example, each tumor clinical guideline usually has a fixed grade classification and expression form, such as "category + level" and "LE: + level", which appears either directly in the sentence explaining a medical intervention or in parentheses at the end of that sentence.
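A rule-based extraction of such grade expressions can be sketched with a regular expression. The concrete surface forms assumed here ("category 2A", "LE: 1b") and the name `extract_grades` are illustrative assumptions; the actual level vocabulary depends on the guideline source.

```python
import re

# Assumed surface forms: "category 2A" and "LE: 1b".
GRADE_PATTERN = re.compile(
    r"\b(?:category\s+(?P<category>[1-3][AB]?)|LE:\s*(?P<le>[1-5][a-c]?))(?![A-Za-z0-9])",
    re.IGNORECASE,
)


def extract_grades(sentence: str) -> list:
    """Return the grade levels found in a sentence, in order of appearance."""
    return [m.group("category") or m.group("le") for m in GRADE_PATTERN.finditer(sentence)]
```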

Quantity characteristics: these describe the various statistics related to diagnosis and treatment. Besides basic experimental statistics such as sample size, P value, and survival rate, they also cover clinical diagnosis and treatment data such as drug dosage, the content of substances in the body, and tumor size. They mainly take two forms: basic numerals and compound numerals.

Basic numerals are expressed in three forms: English cardinal number words, Arabic numerals, and percentages. English cardinal number words take a form such as "sixty-four". Arabic numerals may or may not contain decimal points, commas, and spaces. A percentage may be an English cardinal number word followed by "percent" (e.g., "seventy percent") or an Arabic numeral followed by "%" (e.g., "70%").

A compound numeral is a basic numeral in the clinical guideline that is further described or qualified by a unit modifier, such as "mg/dL" indicating a medication dosage, "cm" indicating tumor size, and "times" indicating a multiple.
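The Arabic-numeral forms above (plain numbers, percentages, and compound numerals with a unit) lend themselves to a rule-based sketch. English number words such as "sixty-four" are not handled here, and the unit list is a deliberately short, hypothetical one built from the units named in the text.

```python
import re

# Arabic numerals that may contain thousands separators and a decimal point.
NUMBER = r"\d{1,3}(?:,\d{3})+(?:\.\d+)?|\d+(?:\.\d+)?"
# Hypothetical, deliberately short unit list; the text names mg/dL, cm and times.
UNITS = r"mg/dL|cm|times"
QUANTITY_PATTERN = re.compile(
    rf"(?P<value>{NUMBER})\s*(?P<unit>%|percent\b|(?:{UNITS})\b)?"
)


def extract_quantities(sentence: str) -> list:
    """Return (value, unit) pairs; unit is None for a bare basic numeral."""
    return [(m.group("value"), m.group("unit")) for m in QUANTITY_PATTERN.finditer(sentence)]
```

Keeping the value and the unit as separate fields makes it possible to compare two sentences on the numeral alone, on the unit alone, or on both.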

Time characteristics: these express the time or duration of a medical event or operation in a clinical guideline, and mainly comprise two types: moments and time periods.

A moment is a specific point in time; it is mainly used to state the year of a finding or experiment in a clinical guideline, and its basic form is a four-digit Arabic numeral. A time period is a span of time; it is usually used to express the duration of a medical operation or a range of years, such as "6 months" and "2011 to 2013". Besides the common form of a number (Arabic or English) combined with a time unit, its basic forms also include words for different moments joined by a connecting word or connecting symbol (e.g., "2006 through 2015" or "2006-2015") and a number fused with a time unit (e.g., "5-year").
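The moment and time-period forms can likewise be sketched with regular expressions. The restriction of years to 19xx and 20xx, the list of time units, and the function name are assumptions made for illustration.

```python
import re

TIME_UNIT = r"(?:year|month|week|day)s?"
# Year ranges such as "2011 to 2013", "2006 through 2015" or "2006-2015".
YEAR_RANGE = re.compile(r"\b((?:19|20)\d{2})\s*(?:to|through|-)\s*((?:19|20)\d{2})\b")
# Durations such as "6 months" or "5-year".
DURATION = re.compile(rf"\b(\d+)[\s-]({TIME_UNIT})\b", re.IGNORECASE)
# Single moments: a four-digit year.
YEAR = re.compile(r"\b(?:19|20)\d{2}\b")


def extract_time_features(sentence: str) -> dict:
    """Return time periods (year ranges, durations) and moments outside any range."""
    ranges = YEAR_RANGE.findall(sentence)
    durations = DURATION.findall(sentence)
    in_range = {year for pair in ranges for year in pair}
    moments = [y for y in YEAR.findall(sentence) if y not in in_range]
    return {"ranges": ranges, "durations": durations, "moments": moments}
```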

The above content is a detailed description of a part of knowledge features, and other types of knowledge features are not illustrated and described in detail, and a person skilled in the art can determine the knowledge features to be extracted according to actual situations, and is not limited specifically herein.

In the specific implementation of step S201, the first to-be-processed guideline module and the second to-be-processed guideline module are split into sentences, and the knowledge features in each sentence are semantically expanded and normalized, so as to obtain a plurality of first sentences corresponding to the first to-be-processed guideline module and a plurality of second sentences corresponding to the second to-be-processed guideline module.

Specifically, the knowledge features in each sentence of a guideline module (which here stands for either the first to-be-processed guideline module or the second to-be-processed guideline module) are semantically expanded and normalized as follows: the guideline module is first split into sentences, and the knowledge features in its sentences are then semantically expanded and normalized using a specified disease coding system (such as ICD-10) and a controlled vocabulary. This resolves the morphological and structural heterogeneity of the knowledge features in the sentences and yields the semantically expanded and normalized sentences of the guideline module. Morphological heterogeneity includes synonyms, abbreviations, acronyms, case differences, inflected forms and the like; structural heterogeneity includes word order, separators, omissions and the like.
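A minimal sketch of the normalization idea is given below. The vocabulary entries are hypothetical examples standing in for a real controlled vocabulary, mapping to ICD-10 codes is not shown, and only case, hyphen/underscore separators, whitespace, and abbreviations are handled.

```python
import re

# Hypothetical controlled vocabulary: surface variant -> preferred term.
CONTROLLED_VOCABULARY = {
    "rcc": "renal cell carcinoma",
    "ct": "computed tomography",
    "mri": "magnetic resonance imaging",
}


def normalize_sentence(sentence: str) -> str:
    """Lowercase, unify separators, and map known variants to preferred terms."""
    text = sentence.lower()
    text = re.sub(r"[-_]+", " ", text)        # separator variants
    text = re.sub(r"\s+", " ", text).strip()  # whitespace variants
    # Replace longer variants first so that longer terms win over shorter ones.
    for variant in sorted(CONTROLLED_VOCABULARY, key=len, reverse=True):
        pattern = rf"\b{re.escape(variant)}\b"
        text = re.sub(pattern, CONTROLLED_VOCABULARY[variant], text)
    return text
```

After normalization, "RCC staging uses CT" and "Renal-cell carcinoma staging uses computed tomography" become the same string, so a later similarity calculation is not misled by surface variation.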

Step S202: for the mth first sentence of the first to-be-processed guideline module, the sentence similarity between the mth first sentence and H second sentences of the second to-be-processed guideline module is calculated.

It should be noted that m is an integer greater than or equal to 1 and less than or equal to x, where x is the total number of first sentences contained in the first to-be-processed guideline module; m starts at 1 and is incremented by 1 each time. H is an integer greater than or equal to 1 and less than or equal to y, where y is the total number of second sentences contained in the second to-be-processed guideline module.

As can be seen from the description of fig. 1 in the embodiment of the present invention, the first to-be-processed guideline module and the second to-be-processed guideline module correspond to each other. In the specific implementation of step S202, based on the knowledge features and their attributes (such as the category and position of each knowledge feature) in the sentences semantically expanded and normalized in step S201, the mth first sentence in the first to-be-processed guideline module is compared in turn with second sentences in the second to-be-processed guideline module, i.e., the sentence similarity between the mth first sentence and H second sentences in the second to-be-processed guideline module is calculated.

That is, starting from the 1st first sentence of the first to-be-processed guideline module, the sentence similarity between the 1st (m = 1) first sentence and H second sentences of the second to-be-processed guideline module is calculated; then the sentence similarity between the 2nd (m = 2) first sentence and H second sentences of the second to-be-processed guideline module is calculated; and so on, until the sentence similarity between the xth (m = x) first sentence and H second sentences of the second to-be-processed guideline module has been calculated.

It is understood that, to save computing resources and improve efficiency, a sentence iteration step size H is set, i.e., only the sentence similarity between the mth first sentence and some (H) of the second sentences of the second to-be-processed guideline module is calculated.

That is, the sentence similarity between the mth first sentence and at most H second sentences of the second to-be-processed guideline module is calculated, and the sequence numbers of these H second sentences are the H sequence numbers that follow the sequence number of the second sentence whose sentence similarity with the (m-1)th first sentence is greater than or equal to the sentence similarity threshold.

For example, assume that the sentence iteration step size is 5 (H = 5). Starting from the 1st first sentence of the first to-be-processed guideline module, the sentence similarity between the 1st (m = 1) first sentence and at most the first 5 second sentences (up to and including the 5th) of the second to-be-processed guideline module is calculated; assume that the sentence similarity between the 1st first sentence and the 1st second sentence is greater than or equal to the sentence similarity threshold. Then, for the 2nd (m = 2) first sentence of the first to-be-processed guideline module, the sentence similarity with at most 5 second sentences of the second to-be-processed guideline module (in this case, the 2nd to the 6th second sentences) is calculated; assume that the sentence similarity between the 2nd first sentence and the 3rd second sentence is greater than or equal to the sentence similarity threshold. Then, for the 3rd (m = 3) first sentence of the first to-be-processed guideline module, the sentence similarity with at most 5 second sentences of the second to-be-processed guideline module (in this case, the 4th to the 8th second sentences) is calculated, and so on.
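The range of second sentences compared at each iteration can be sketched as a small helper. The function name and the convention that `last_matched` is 0 before any match are assumptions; sequence numbers are 1-based as in the example above.

```python
def candidate_window(last_matched: int, step: int, total_second: int) -> range:
    """1-based sequence numbers of the second sentences to compare next.

    last_matched is the sequence number of the second sentence matched by the
    previous first sentence (0 if nothing has been matched yet); step is H.
    """
    start = last_matched + 1
    end = min(last_matched + step, total_second)
    return range(start, end + 1)
```

With H = 5 this reproduces the example: `candidate_window(0, 5, 20)` covers sentences 1 to 5, `candidate_window(1, 5, 20)` covers 2 to 6, and `candidate_window(3, 5, 20)` covers 4 to 8; near the end of the module the window is simply cut off at the last second sentence.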

It should be noted that the value of the sentence iteration step length H may be set according to an actual situation, and is not specifically limited herein.
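The window of second sentences described above can be sketched as follows. This is only a minimal illustration: the function name `candidate_window` and its parameters are hypothetical, sequence numbers are 1-based as in the example, and the caller is assumed to pass 0 when no second sentence has been matched yet.

```python
def candidate_window(last_match: int, step: int, total: int) -> range:
    """Return the 1-based sequence numbers of the second sentences to compare.

    last_match: sequence number of the second sentence that matched the
                previous first sentence (0 if nothing has matched yet).
    step:       the sentence iteration step H.
    total:      number of second sentences in the module (y).
    """
    start = last_match + 1
    # Never run past the end of the second to-be-processed guideline module.
    end = min(last_match + step, total)
    return range(start, end + 1)


if __name__ == "__main__":
    # Reproduces the ranges of the H = 5 example in the text.
    print(list(candidate_window(0, 5, 20)))  # m = 1
    print(list(candidate_window(1, 5, 20)))  # m = 2, previous match at n = 1
    print(list(candidate_window(3, 5, 20)))  # m = 3, previous match at n = 3
```

With H = 5, the three calls give the ranges 1 to 5, 2 to 6 and 4 to 8 used in the example.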

Step S203: If the sentence similarity between the mth first sentence and the nth second sentence is equal to 1, determine that the mth first sentence is the same as the nth second sentence, and perform no labeling.

It should be noted that n is an integer greater than or equal to 1 and less than or equal to y, and is located within the sequence number range of the H second sentences whose sentence similarity is calculated with the mth first sentence.

In the process of implementing step S203, if the sentence similarity between the mth first sentence in the first to-be-processed guideline module and the nth second sentence in the second to-be-processed guideline module is equal to 1, the two sentences are the same. In this case, no label is marked at the position corresponding to the mth first sentence in the first to-be-processed guideline module, and no label is marked at the position corresponding to the nth second sentence in the second to-be-processed guideline module.

Step S204: If the sentence similarity between the mth first sentence and the nth second sentence is greater than or equal to the sentence similarity threshold and less than 1, mark a modification label at the position corresponding to the mth first sentence in the first clinical guideline, and mark a modification label at the position corresponding to the nth second sentence in the second clinical guideline. When n is greater than m, determine a third sentence in the second to-be-processed guideline module that is located before the nth second sentence, whose sentence similarity with each of the first m first sentences is less than the sentence similarity threshold, and that has not been labeled; then mark a deletion label at the position corresponding to the third sentence in the second clinical guideline.

In the process of implementing step S204, if the sentence similarity between the mth first sentence and the nth second sentence is greater than or equal to the sentence similarity threshold and less than 1, the mth first sentence in the first to-be-processed guideline module contains modified content compared with the nth second sentence in the second to-be-processed guideline module. In this case, a modification label is marked at the position corresponding to the mth first sentence in the first to-be-processed guideline module of the first clinical guideline, and a modification label is marked at the position corresponding to the nth second sentence in the second to-be-processed guideline module of the second clinical guideline.

Meanwhile, when n is greater than m, a third sentence is determined in the second to-be-processed guideline module that is located before the nth second sentence, whose sentence similarity with each of the first m first sentences is less than the sentence similarity threshold, and that has not been labeled; a deletion label is marked at the position corresponding to the third sentence in the second to-be-processed guideline module of the second clinical guideline.

For example, assume that the sentence similarity between the 4th first sentence and the 5th second sentence is greater than or equal to the sentence similarity threshold and less than 1. Since n (n = 5) is greater than m (m = 4), a third sentence (assumed to be the 2nd second sentence) is determined in the second to-be-processed guideline module: it is located before the 5th second sentence, its sentence similarity with each of the first 4 first sentences (up to and including the 4th) is less than the sentence similarity threshold, and it has not been labeled. A deletion label is then marked at the position corresponding to the 2nd second sentence in the second clinical guideline.

It is understood that when it is determined that there is modified content in the mth first sentence in the first to-be-processed guidance module compared with the nth second sentence in the second to-be-processed guidance module, it may also indicate that there is an update situation in the knowledge characteristics of the mth first sentence compared with the knowledge characteristics of the nth second sentence, and it is necessary to determine the difference between the knowledge characteristics in the mth first sentence and the nth second sentence.

Preferably, after step S204 is executed, based on the knowledge characteristics and the attributes of the knowledge characteristics (such as the category and the position of the knowledge characteristics) of the sentences subjected to semantic expansion and normalization processing in step S201, the difference between the knowledge characteristics in the mth first sentence and the nth second sentence is compared to obtain knowledge characteristic difference information, and corresponding labels are labeled at the positions corresponding to the knowledge characteristic difference information in the mth first sentence and the nth second sentence, respectively.

That is, corresponding labels are marked at the positions of the updated knowledge features in the mth first sentence and the nth second sentence, indicating that the knowledge features have changed between the first to-be-processed guideline module of the first clinical guideline and the second to-be-processed guideline module of the second clinical guideline.

Meanwhile, different types of labels used for indicating knowledge feature difference information are displayed in different display forms. For example, difference information of entity knowledge features is represented by red underlining and bold font, difference information of level knowledge features by cyan underlining and bold font, difference information of quantity knowledge features by blue underlining and bold font, and difference information of time knowledge features by green underlining. The specific display form can be set according to actual conditions and is not specifically limited herein.

Preferably, knowledge characteristic difference information can be counted and displayed based on a time series (time of release of the clinical guideline). Meanwhile, different types of labels used for indicating knowledge characteristic difference information are displayed in different display forms at the front end.

Step S205: If the sentence similarity between the mth first sentence and each of the H second sentences is smaller than the sentence similarity threshold, mark a newly added label at the position corresponding to the mth first sentence in the first clinical guideline.

In the process of implementing step S205, if the sentence similarity between the mth first sentence and each of the H second sentences is smaller than the sentence similarity threshold, the mth first sentence in the first to-be-processed guideline module is newly added content compared with the second to-be-processed guideline module, and a newly added label is marked at the position corresponding to the mth first sentence in the first to-be-processed guideline module of the first clinical guideline.

Through steps S201 to S205 above, each first sentence in all the first to-be-processed guideline modules is processed in turn (m from 1 to x) to obtain the second difference information between the first clinical guideline and the second clinical guideline, and corresponding labels are marked at the positions corresponding to the second difference information in the first clinical guideline and the second clinical guideline, respectively.
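The sentence-level comparison of steps S202 to S205 can be sketched as a single loop. This is a hedged illustration under several assumptions that are not taken from the source: the token-overlap function `jaccard` is only a stand-in for the actual sentence similarity calculation, the first second sentence in the window that reaches the threshold is treated as the match, and every second sentence skipped between two consecutive matches is given a deletion label, which simplifies the n > m rule of step S204. The names `label_sentences`, `first_labels` and `second_labels` are hypothetical; the default threshold 0.51 is the value used in the worked example of this description.

```python
def jaccard(a: str, b: str) -> float:
    """Placeholder similarity: overlap of lowercase word sets (assumption)."""
    ta, tb = set(a.lower().split()), set(b.lower().split())
    if not ta and not tb:
        return 1.0
    return len(ta & tb) / len(ta | tb)


def label_sentences(first, second, threshold=0.51, step=5, sim=jaccard):
    """Return ({m: label}, {n: label}) with 1-based sentence numbers."""
    first_labels, second_labels = {}, {}
    last_match = 0  # sequence number of the most recently matched second sentence
    for m, sentence in enumerate(first, start=1):
        # At most H second sentences, starting right after the last match.
        window = range(last_match + 1, min(last_match + step, len(second)) + 1)
        hit = next(
            (n for n in window if sim(sentence, second[n - 1]) >= threshold),
            None,
        )
        if hit is None:
            first_labels[m] = "new"  # step S205
            continue
        if sim(sentence, second[hit - 1]) < 1.0:  # step S204
            first_labels[m] = "modified"
            second_labels[hit] = "modified"
        # Similarity equal to 1 (step S203) leaves both sentences unlabeled.
        for k in range(last_match + 1, hit):
            second_labels[k] = "deleted"  # skipped, never matched (simplified)
        last_match = hit
    return first_labels, second_labels


if __name__ == "__main__":
    new_version = ["a b c d", "e f g h", "x y z w"]
    old_version = ["a b c d", "q r s t", "e f g i"]
    print(label_sentences(new_version, old_version))
```

In the small example, the first sentence is identical in both versions, the second is a modification of the third old sentence, the old sentence in between is marked as deleted, and the last new sentence has no counterpart and is marked as new.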

In the embodiment of the invention, second difference information between the first clinical guideline and the second clinical guideline is determined from 3 dimensions of guideline modules, sentences and knowledge characteristics, corresponding labels are marked at positions corresponding to the second difference information in the first clinical guideline and the second clinical guideline respectively, and the difference and the change between different clinical guidelines do not need to be found by manually consulting the two clinical guidelines needing to be compared, so that the efficiency and the accuracy of determining the difference and the change between different clinical guidelines are improved. And carrying out statistics of knowledge characteristic difference information based on the time sequence, mining time sequence change of knowledge characteristics in the clinical guideline, converting the static and unstructured clinical guideline into structural, knowledge characterization and visual representation forms, realizing multi-level and multi-dimensional disclosure of updated contents of the clinical guideline, so as to assist clinicians in visually knowing differences and change conditions between the clinical guidelines and improve learning efficiency of the clinicians.

To better explain the contents of fig. 1 and fig. 2 in the above embodiment of the present invention, an example is given with reference to fig. 3, in which the first clinical guideline and the second clinical guideline belong to the same source. Referring to fig. 3, another flow chart of the automatic identification method for clinical guideline update content provided by the embodiment of the present invention is shown. The automatic identification method includes:

Step S301: Parse and structurally extract the first clinical guideline and the second clinical guideline to obtain at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline.

Step S302: Normalize the first guideline module and the second guideline module.

In the process of implementing step S302, for the specific content of the normalization performed on the guideline modules, refer to step S101 in fig. 1 of the embodiment of the present invention, which is not described again here.

Step S303: knowledge features in the first guideline module and the second guideline module are extracted.

Step S304: if the updated description exists in the first clinical guideline, step S305 is executed, and if not, step S306 is executed.

Step S305: using the updated description of the first clinical guideline, first difference information between the first clinical guideline and the second clinical guideline is determined, and corresponding labels are labeled at positions in the first clinical guideline and the second clinical guideline, respectively, corresponding to the first difference information, and step S309 is performed.

Step S306: Match the first guideline modules with the second guideline modules, take each matched first guideline module and second guideline module as a first to-be-processed guideline module and a second to-be-processed guideline module, respectively, and take each first guideline module that matches none of the second guideline modules as a third to-be-processed guideline module.

Step S307: According to the first to-be-processed guideline module and the second to-be-processed guideline module, determine second difference information between the first clinical guideline and the second clinical guideline, and mark corresponding labels at the positions corresponding to the second difference information in the first clinical guideline and the second clinical guideline, respectively.

Step S308: According to the third to-be-processed guideline module and all the second guideline modules, determine third difference information between the first clinical guideline and the second clinical guideline, and mark corresponding labels at the positions corresponding to the third difference information in the first clinical guideline and the second clinical guideline, respectively.

Step S309: the labels marked in step S305 or step S307 to step S308 are displayed in different display forms at the front end.

It should be noted that, the execution principle of the above steps S301 to S309 refers to the contents in fig. 1 and fig. 2 in the embodiment of the present invention, and is not described herein again.

To better explain the contents of fig. 1 to fig. 3 in the above embodiments of the present invention, they are explained through processes A1 to A7 using specific clinical guidelines as examples, where the contents of processes A1 to A7 are merely illustrative.

Establishing a module hierarchical structure tree corresponding to the clinical guideline for the renal tumor by utilizing the titles of all levels of the clinical guideline for the renal tumor, and establishing a mapping rule between the module hierarchical structure tree and the contents of all guideline modules.

A1, according to the module hierarchical structure tree corresponding to the renal tumor clinical guideline, analyzing and structurally extracting the unstructured first clinical guideline and the unstructured second clinical guideline, and obtaining at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline.

For the text and picture contents in the renal cell carcinoma clinical guideline documents, the Java open source tools Spire.PDF, Spire.Doc and PDFBox are called for processing; for the tables in the renal cell carcinoma clinical guideline documents, pdfplumber from the Python open source libraries is called for processing.

The text content extracted from the clinical guidelines (that is, from the first clinical guideline and the second clinical guideline) is stored in TXT and DOCX format, the picture content is stored in PNG format, and the table content is stored in CSV format. Meanwhile, to distinguish the titles of each level from the body text, the titles of each level of the clinical guideline are stored in a separate TXT file, and the references cited in the clinical guideline are also stored separately. The extracted first guideline modules and second guideline modules are stored in a database, together with the hierarchical relationships among all the first guideline modules and among all the second guideline modules.

A2, text cleaning, stop-word removal, stemming, conversion of English number words into Arabic numerals, expansion of abbreviations into full names, and other processing are performed on each first guideline module and each second guideline module, converting them into normalized text.

In a specific implementation, text cleaning is performed as follows: all reference citation markers in the first guideline module and the second guideline module are removed based on regular expression matching rules.
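As an illustration of this text cleaning step, the sketch below removes bracketed numeric citation markers with a regular expression. The marker format (for example "[1, 2]" or "[5-7]") is an assumption, since the description does not state the concrete matching rules; the names `CITATION` and `strip_citations` are hypothetical.

```python
import re

# Assumed citation format: bracketed numbers, optionally joined by commas or
# hyphens, e.g. "[3]", "[1, 2]", "[5-7]". Leading whitespace is removed too.
CITATION = re.compile(r"\s*\[\d+(?:\s*[-,]\s*\d+)*\]")


def strip_citations(text: str) -> str:
    """Remove reference citation markers from a guideline module's text."""
    return CITATION.sub("", text)


if __name__ == "__main__":
    print(strip_citations("Incidence is rising [1, 2]. Mortality varies [5-7]."))
```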

Stop words are words that appear frequently in text but carry no practical meaning, such as "the", "of", "is", "are", and various punctuation marks. The stop words in the first guideline module and the second guideline module are removed by dictionary matching.
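Dictionary-based stop-word removal can be sketched as follows. The stop-word list here is a tiny hypothetical sample, not the dictionary used in the embodiment, and whitespace tokenization with lowercasing is an assumption made for brevity.

```python
import string

# Hypothetical, deliberately small stop-word dictionary.
STOP_WORDS = {"the", "of", "is", "are", "a", "an", "and", "in", "to"}


def remove_stop_words(text: str) -> list[str]:
    """Lowercase, split on whitespace, drop punctuation and stop words."""
    kept = []
    for token in text.lower().split():
        word = token.strip(string.punctuation)  # trims e.g. "rising," -> "rising"
        if word and word not in STOP_WORDS:
            kept.append(word)
    return kept


if __name__ == "__main__":
    print(remove_stop_words("The incidence of RCC is rising, and the causes are unclear."))
```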

Since the renal cell carcinoma clinical guidelines are written in English, the words in the first clinical guideline and the second clinical guideline undergo complex morphological changes, such as the various tenses of verbs, the singular and plural forms of nouns, and the comparative forms of adjectives. Stemming is therefore applied to the first clinical guideline and the second clinical guideline so that the variants of each word are unified and only the stem is retained, which improves the validity of the similarity calculation results for the first clinical guideline and the second clinical guideline.

An abbreviation-full name mapping table is constructed using rule-based abbreviation identification and a right-to-left reverse-order scanning method. The abbreviations in the first clinical guideline and the second clinical guideline are then replaced according to the mapping table, that is, expanded into the full names they denote in the first clinical guideline and the second clinical guideline, for example replacing the abbreviation "RCC" with the full name "Renal Cell Carcinoma".
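A much simplified sketch of such a mapping table is given below. It assumes that an abbreviation is introduced as "full name (ABBR)" and that each letter of the abbreviation is the initial of exactly one preceding word, checked from right to left; the actual rules of the embodiment are not given in the description and are likely richer. The function names are hypothetical.

```python
import re

# Assumed definition pattern: an all-capital abbreviation in parentheses.
ABBR_DEF = re.compile(r"\(([A-Z]{2,})\)")


def build_abbreviation_map(text: str) -> dict[str, str]:
    """Collect abbreviation -> full name pairs from 'full name (ABBR)' patterns."""
    mapping = {}
    for match in ABBR_DEF.finditer(text):
        abbr = match.group(1)
        words = text[:match.start()].split()
        candidate = words[-len(abbr):]
        if len(candidate) < len(abbr):
            continue
        # Right-to-left scan: last letter against last word, and so on.
        if all(w[0].upper() == c for w, c in zip(reversed(candidate), reversed(abbr))):
            mapping[abbr] = " ".join(candidate)
    return mapping


def expand_abbreviations(text: str, mapping: dict[str, str]) -> str:
    """Replace every mapped abbreviation with its full name."""
    for abbr, full in mapping.items():
        # Drop the defining parenthetical first, then expand remaining uses.
        text = re.sub(rf"\s*\({re.escape(abbr)}\)", "", text)
        text = re.sub(rf"\b{re.escape(abbr)}\b", full, text)
    return text


if __name__ == "__main__":
    sample = "Renal cell carcinoma (RCC) is common. RCC incidence varies."
    table = build_abbreviation_map(sample)
    print(table)
    print(expand_abbreviations(sample, table))
```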

A3, a version of the first clinical guideline and the second clinical guideline is randomly selected, and knowledge feature identification and extraction, manual proofreading and effect evaluation are performed on it. Five classes of knowledge features are identified in the renal cell carcinoma clinical guideline (that is, the category to which the first clinical guideline and the second clinical guideline belong): "clinical manifestation", "treatment method", "treatment drug (except for the secondary category "combined medication")", "examination method" and "disease (except for renal tumor)". The knowledge features are identified and extracted by combining dictionary-based and rule-based methods. The criteria for evaluating the identification and extraction of knowledge features are precision (P), recall (R), and the harmonic mean of precision and recall (F-measure, F1), where P = (number of correctly identified and extracted features / total number of identified and extracted features) × 100%, R = (number of correctly identified and extracted features / total number of features that should be identified and extracted) × 100%, and F1 = 2RP/(R + P) × 100%.
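The evaluation criteria can be computed as in the following sketch, which uses the standard definitions of precision and recall over sets of extracted and manually proofread (gold) features. The function name `evaluate` and the sample feature names are hypothetical, and the values are returned as fractions rather than percentages.

```python
def evaluate(extracted: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Return (precision, recall, F1) as fractions between 0 and 1."""
    correct = len(extracted & gold)  # correctly identified and extracted
    precision = correct / len(extracted) if extracted else 0.0
    recall = correct / len(gold) if gold else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * recall * precision / (recall + precision)
    return precision, recall, f1


if __name__ == "__main__":
    # Hypothetical feature sets, for illustration only.
    extracted = {"feature_1", "feature_2", "feature_3", "feature_4"}
    gold = {"feature_1", "feature_2", "feature_3", "feature_5", "feature_6"}
    p, r, f1 = evaluate(extracted, gold)
    print(f"P={p:.2%} R={r:.2%} F1={f1:.2%}")
```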

A4, it can be understood that the update description attached to the NCCN renal cancer clinical guideline covers only the summary part: it describes, page by page, the changes to pictures and tables on different pages and does not directly involve the body text. Therefore, the update description is only split based on the summary pages, and no detailed subsequent processing is required.

For the EAU renal cell carcinoma clinical guideline, in addition to describing the pathology-related and diagnosis-related knowledge of renal cell carcinoma section by section, the body of the guideline in some sections also collates the relevant important evidence and recommendations in the form of a table, marked with evidence levels and recommendation strengths, to form a summary of the key points of the section. The specific content is shown in the schematic diagram of the section key-point summary of the EAU renal cell carcinoma clinical guideline in fig. 4. The update description in the EAU renal cell carcinoma clinical guideline takes the form of a table that lists, section by section, the evidence and recommendations newly added to the key-point summary of each section compared with the previous version.

The above is relevant to the updated description of the clinical guidelines for NCCN renal cancer and EAU renal cell carcinoma.

It should be noted that, when the PDF document of the clinical guideline is parsed, the update description content and the key-point summary of each section of the clinical guideline are extracted and saved as CSV tables. The update description of the clinical guideline is then traversed, and the key-point summary of each section is searched and located, so as to determine the content newly added in the first clinical guideline (here, the new clinical guideline) compared with the second clinical guideline (here, the old clinical guideline) and to mark a newly added label at the position corresponding to the new content in the first clinical guideline. For content deleted or modified in the first clinical guideline compared with the second clinical guideline, the deleted or modified content needs to be determined from the similarity calculation results by using the update description content discovery method and labeled accordingly; for the specific labeling, refer to step S102 in fig. 1 of the embodiment of the present invention, which is not described again here.

The above-mentioned contents are to determine and label the difference information between the first clinical guideline and the second clinical guideline according to the updated description of the clinical guideline, and the following contents are to determine and label the difference information between the first clinical guideline and the second clinical guideline without using the updated description (corresponding to the contents of step S103 to step S105 in fig. 1 and fig. 2 in the above-mentioned embodiment of the present invention).

It is understood that two versions of the NCCN renal cancer clinical guideline are selected, and two versions of the EAU renal cell carcinoma clinical guideline are selected, and an alignment between the clinical guidelines is performed (without using updated instructions to determine differences and changes between the two versions of the clinical guideline) with the sentence similarity threshold set at 0.51.

A5, as described in process A4, the sentence similarity threshold is set to 0.51. Without using the update description, the difference information between the two versions of the NCCN kidney cancer clinical guideline is compared and labeled accordingly, and the difference information between the two versions of the EAU renal cell carcinoma clinical guideline is compared and labeled accordingly, following steps S103 to S105 in fig. 1 and the steps in fig. 2 of the above embodiment of the present invention.

It is understood that the comparison between the two versions of the NCCN kidney cancer clinical guideline, as well as the comparison between the two versions of the EAU renal cell carcinoma clinical guideline, can be evaluated by precision, recall and F1 value.

A6, in the process of comparing the two versions of the clinical guideline in A5, the third to-be-processed guideline module (a first guideline module that matches none of the second guideline modules) mentioned in step S103 of fig. 1 of the embodiment of the present invention is processed and labeled according to the content shown in step S105 of fig. 1 of the embodiment of the present invention, which is not described again here.

A7, after the two versions of the clinical guideline are compared, the modified parts (that is, the modification labels) in the two versions are marked by highlighting the text in yellow, and the yellow-highlighted contents in the two versions correspond one to one; the newly added parts (that is, the new labels) in the new version of the clinical guideline are marked by highlighting the text in blue; and the deleted parts (that is, the deletion labels) in the old version of the clinical guideline are marked by highlighting the text in red.

To better explain how to highlight the difference information between the two versions of the clinical guideline, this is illustrated by the schematic of the label update tag shown in fig. 5.

It will be appreciated that fig. 5 shows the difference information between the texts of the "epidemiology" guideline module in two versions of the EAU renal cell carcinoma clinical guideline, with the text of the 2016 version on the left and the text of the 2018 version on the right.

In fig. 5, content highlighted in yellow is similar but not identical in the two versions of the EAU renal cell carcinoma clinical guideline. In the 2016 version on the left, content highlighted in red is content deleted from the 2016 version, that is, it has no related description in the 2018 version. In the 2018 version on the right, content highlighted in blue is content newly added in the 2018 version, that is, it does not appear in the 2016 version.

As can be seen from the contents shown in step S204 of fig. 2 in the above embodiment of the present invention, for the modified contents between the two versions of clinical guidelines (i.e., the first clinical guideline and the second clinical guideline in step S204), that is, for the corresponding yellow highlighted contents between the two versions of clinical guidelines, the knowledge characteristics between the two versions of modified contents may change, and at this time, the knowledge characteristic difference information between the modified contents needs to be determined and labeled correspondingly, and meanwhile, different types of labels for indicating the knowledge characteristic difference information are displayed in different display forms.

It is noted that, as can be seen from the above, the renal cell carcinoma clinical guidelines are written in English, so the contents of the clinical guideline shown in fig. 5 and the text contents of the clinical guidelines shown in fig. 6 to fig. 9 are in English.

It is further noted that the left side of fig. 6 to fig. 8 shows part of the 2016 version of the EAU renal cell carcinoma clinical guideline, and the right side shows part of the 2018 version of the EAU renal cell carcinoma clinical guideline.

As shown in fig. 6, in the yellow highlighted content of the two versions of clinical guidelines, the difference of the knowledge characteristics of the entity is represented by red underlining and font bolding, i.e., there is a change in the knowledge characteristics of the entity marked by red underlining and font bolding in the two versions of clinical guidelines.

As shown in fig. 7, in the yellow highlighting content of the two versions of clinical guidelines, the difference in the knowledge characteristics of the levels is indicated by cyan underlining and font bolding, i.e., there is a change in the knowledge characteristics of the levels indicated by cyan underlining and font bolding in the two versions of clinical guidelines.

As shown in fig. 8, in the yellow highlighted content of the two versions of clinical guidelines, the difference of the quantitative knowledge characteristic is represented by blue underlining and font bolding, and the difference of the temporal knowledge characteristic is represented by green underlining and font bolding, that is, the quantitative knowledge characteristic marked by blue underlining and font bolding in the two versions of clinical guidelines changes, and the temporal knowledge characteristic marked by green underlining and font bolding in the two versions of clinical guidelines changes.

It can be understood that displaying the update time series makes it possible to sort out, along the time dimension, the differences in entity knowledge features between different versions of the clinical guideline; by browsing the changes in the entity knowledge features of each version compared with the previous or next version, readers can quickly grasp how the knowledge in the clinical guideline has been updated. Taking the contents of the guideline module "targeted therapy of recurrent, progressive or metastatic renal cell carcinoma (targeted Therapy of delayed or Advanced or metastatic RCC)" as an example, the display of the update time series is illustrated by the diagram shown in fig. 9.

In the context shown in fig. 9, the "targeted treatment of recurrent, progressive or metastatic renal cell carcinoma" guideline module of the 2016 year version of clinical guideline adds three drugs compared to the 2015 year version of clinical guideline; compared with the 2016 version of clinical guidelines, the 2017 version of clinical guidelines for targeted treatment of recurrent, progressive or metastatic renal cell carcinoma adds one drug and deletes five drugs; compared to the 2017 version of clinical guidelines, the 2018 version of clinical guidelines for targeted treatment of recurrent, progressive or metastatic renal cell carcinoma is supplemented with one new drug.
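Version-to-version statistics of this kind reduce to set differences between the entity knowledge features of consecutive versions. The following sketch assumes that the features of each version are already available as sets keyed by release year; the drug names are placeholders, not data from the guidelines, and the function name `update_timeline` is hypothetical.

```python
def update_timeline(versions: dict[int, set[str]]) -> list[dict]:
    """For each pair of consecutive versions, list added and deleted features."""
    years = sorted(versions)
    timeline = []
    for previous, current in zip(years, years[1:]):
        timeline.append({
            "from": previous,
            "to": current,
            "added": sorted(versions[current] - versions[previous]),
            "deleted": sorted(versions[previous] - versions[current]),
        })
    return timeline


if __name__ == "__main__":
    # Placeholder entity features per release year (illustrative only).
    drugs_by_year = {
        2015: {"drug_a"},
        2016: {"drug_a", "drug_b"},
        2017: {"drug_b", "drug_c"},
    }
    for entry in update_timeline(drugs_by_year):
        print(entry)
```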

Note that, the contents shown in bold in fig. 9 indicate that the source is the title of each level of the clinical guideline.

Corresponding to the above automatic identification method of clinical guideline update content provided by the embodiment of the invention, referring to fig. 10, the embodiment of the invention further provides a structural block diagram of an automatic identification system of clinical guideline update content, the automatic identification system includes: the parsing unit 100, the first processing unit 110, the second processing unit 120, the third processing unit 130, and the fourth processing unit 140;

The parsing unit 100 is configured to parse and structurally extract the first clinical guideline and the second clinical guideline respectively according to a module hierarchical structure tree that is established in advance by using the headings of the clinical guidelines at each level, so as to obtain at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline, where the first guideline module is the text content under a lowest-level heading in the first clinical guideline, and the second guideline module is the text content under a lowest-level heading in the second clinical guideline.

The first processing unit 110 is configured to, if the first clinical guideline and the second clinical guideline belong to the same source and there is an updated description of the first clinical guideline with respect to the second clinical guideline, determine first difference information between the first clinical guideline and the second clinical guideline by using the updated description of the first clinical guideline, and label corresponding labels, which are a new label, a deleted label, or a modified label, at positions in the first clinical guideline and the second clinical guideline corresponding to the first difference information, respectively.

The second processing unit 120 is configured to, if the first clinical guideline and the second clinical guideline belong to different sources, or if the first clinical guideline and the second clinical guideline belong to the same source and the first clinical guideline does not have an updated description, match the first guideline module and the second guideline module, respectively use the matched first guideline module and second guideline module as the first to-be-processed guideline module and the second to-be-processed guideline module, and use the first guideline module that is not matched with all the second guideline modules as the third to-be-processed guideline module.

In a specific implementation, the second processing unit 120 is specifically configured to: for each first guideline module, determine the title similarity between the title of that first guideline module and the title of each second guideline module by using a preset deep semantic matching model; if all of the title similarities are smaller than the title similarity threshold, determine that the first guideline module matches none of the second guideline modules; and if at least one title similarity is greater than or equal to the title similarity threshold, determine that the first guideline module matches the second guideline module corresponding to the maximum title similarity.
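The title matching rule above can be sketched as follows. This is a minimal illustration only: the deep semantic matching model is not reproduced, and a character-level ratio from Python's standard `difflib` module is swapped in as a stand-in. The function names and the threshold value of 0.8 are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional


def title_similarity(a: str, b: str) -> float:
    """Stand-in for the deep semantic matching model (assumption).

    Returns a character-level similarity ratio in [0, 1].
    """
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def match_modules(
    first_titles: List[str],
    second_titles: List[str],
    threshold: float = 0.8,  # hypothetical title similarity threshold
    similarity: Callable[[str, str], float] = title_similarity,
) -> Dict[str, Optional[str]]:
    """Map each first-module title to its best second-module title, or None."""
    matches: Dict[str, Optional[str]] = {}
    for title in first_titles:
        scores = [similarity(title, other) for other in second_titles]
        best = max(range(len(scores)), key=scores.__getitem__, default=None)
        if best is not None and scores[best] >= threshold:
            # At least one similarity reaches the threshold: take the maximum.
            matches[title] = second_titles[best]
        else:
            # All similarities are below the threshold: no match.
            matches[title] = None
    return matches
```

Titles mapped to `None` correspond to third to-be-processed guideline modules; the remaining pairs correspond to the first and second to-be-processed guideline modules.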

The third processing unit 130 is configured to determine second difference information between the first clinical guideline and the second clinical guideline according to the first to-be-processed guideline module and the second to-be-processed guideline module, and to mark corresponding labels at the positions corresponding to the second difference information in the first clinical guideline and the second clinical guideline respectively.

The fourth processing unit 140 is configured to determine third difference information between the first clinical guideline and the second clinical guideline according to the third to-be-processed guideline module and all the second guideline modules, and to mark corresponding labels at the positions corresponding to the third difference information in the first clinical guideline and the second clinical guideline respectively.

In a specific implementation, the fourth processing unit 140 is specifically configured to: calculate a first sentence similarity between the first P% of the content of the first sentence in the third to-be-processed guideline module and each of a plurality of second sentences of each second guideline module; if at least one first sentence similarity is greater than the first sentence similarity threshold, determine that the third to-be-processed guideline module matches the second guideline module corresponding to the maximum first sentence similarity; starting from the first P% of the content of the first sentence of the third to-be-processed guideline module, change the existing labels located thereafter in the first clinical guideline into modification labels; and, starting from the second sentence corresponding to the maximum first sentence similarity in the second guideline module that matches the third to-be-processed guideline module, change the existing labels located after that second sentence in the second clinical guideline into modification labels.
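The matching step of the fourth processing unit can be sketched as follows; the subsequent relabeling is omitted. The sketch assumes that P% is taken over characters and that the prefix is compared with each complete second sentence, neither of which is fixed by the text above. The `difflib` ratio is again only a stand-in similarity, and the names and default values are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Callable, Dict, List, Optional, Tuple


def default_similarity(a: str, b: str) -> float:
    """Stand-in sentence similarity (assumption), in [0, 1]."""
    return SequenceMatcher(None, a, b).ratio()


def prefix(sentence: str, p: float) -> str:
    """Return the first p percent of the characters (at least one)."""
    count = max(1, int(len(sentence) * p / 100))
    return sentence[:count]


def match_by_prefix(
    first_sentence: str,
    second_modules: Dict[str, List[str]],
    p: float = 50,           # hypothetical value of P
    threshold: float = 0.6,  # hypothetical first sentence similarity threshold
    similarity: Callable[[str, str], float] = default_similarity,
) -> Optional[Tuple[str, int, float]]:
    """Return (module title, sentence index, score) of the best match, or None."""
    head = prefix(first_sentence, p)
    best: Optional[Tuple[str, int, float]] = None
    for title, sentences in second_modules.items():
        for index, sentence in enumerate(sentences):
            score = similarity(head, sentence)
            if best is None or score > best[2]:
                best = (title, index, score)
    # A match requires the maximum similarity to exceed the threshold.
    if best is not None and best[2] > threshold:
        return best
    return None
```

The returned sentence index identifies the second sentence from which the existing labels in the second clinical guideline would be changed into modification labels.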

Preferably, with reference to fig. 10, the third processing unit 130 includes a sentence dividing subunit, a calculating subunit, a first labeling subunit, a second labeling subunit, and a third labeling subunit, and the execution principle of each subunit is as follows:

The sentence dividing subunit is configured to divide the first to-be-processed guideline module and the second to-be-processed guideline module into sentences respectively, so as to obtain a plurality of first sentences corresponding to the first to-be-processed guideline module and a plurality of second sentences corresponding to the second to-be-processed guideline module.

The calculating subunit is configured to calculate, for the mth first sentence of the first to-be-processed guideline module, the sentence similarity between the mth first sentence and each of H second sentences of the second to-be-processed guideline module, where m is an integer greater than or equal to 1 and less than or equal to x, x is the total number of first sentences contained in the first to-be-processed guideline module, m starts from 1 and is incremented by 1 each time, H is an integer greater than or equal to 1 and less than or equal to y, and y is the total number of second sentences contained in the second to-be-processed guideline module.

The first labeling subunit is configured to, if the sentence similarity between the mth first sentence and the nth second sentence is equal to 1, determine that the mth first sentence is identical to the nth second sentence and perform no labeling, where n is an integer greater than or equal to 1 and less than or equal to y.

The second labeling subunit is configured to, if the sentence similarity between the mth first sentence and the nth second sentence is greater than or equal to the sentence similarity threshold and less than 1, mark a modification label at the position corresponding to the mth first sentence in the first clinical guideline and mark a modification label at the position corresponding to the nth second sentence in the second clinical guideline; and, when n is greater than m, determine a third sentence, namely a sentence in the second to-be-processed guideline module that precedes the nth second sentence, has not yet been labeled, and whose sentence similarity with each of the first m first sentences is less than the sentence similarity threshold, and mark a deletion label at the position corresponding to the third sentence in the second clinical guideline.

The third labeling subunit is configured to mark a new label at the position corresponding to the mth first sentence in the first clinical guideline if the sentence similarity between the mth first sentence and each of the H second sentences is less than the sentence similarity threshold.
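The cooperation of the calculating subunit and the three labeling subunits can be sketched as follows. This is an illustration under stated assumptions, not the claimed implementation: sentence division is taken as already done, every second sentence is compared (H equal to y), a word-overlap (Jaccard) score replaces the unspecified sentence similarity measure, and the names, label strings, and threshold of 0.5 are hypothetical. Indices are zero-based, which leaves the comparison of n with m unchanged.

```python
from typing import Callable, List, Optional, Tuple


def word_overlap(a: str, b: str) -> float:
    """Jaccard overlap of word sets; stand-in similarity (assumption)."""
    wa, wb = set(a.lower().split()), set(b.lower().split())
    union = wa | wb
    return len(wa & wb) / len(union) if union else 1.0


def label_sentences(
    first: List[str],
    second: List[str],
    threshold: float = 0.5,  # hypothetical sentence similarity threshold
    similarity: Callable[[str, str], float] = word_overlap,
) -> Tuple[List[Optional[str]], List[Optional[str]]]:
    """Return per-sentence labels for the first and second modules."""
    first_labels: List[Optional[str]] = [None] * len(first)
    second_labels: List[Optional[str]] = [None] * len(second)
    handled = [False] * len(second)  # second sentences already paired or labeled
    for m, sentence in enumerate(first):
        scores = [similarity(sentence, other) for other in second]
        n = max(range(len(scores)), key=scores.__getitem__, default=None)
        if n is None or scores[n] < threshold:
            first_labels[m] = "new"  # third labeling subunit
            continue
        handled[n] = True
        if scores[n] == 1.0:
            continue  # first labeling subunit: identical, no label
        # Second labeling subunit: both sentences are marked as modified.
        first_labels[m] = "modified"
        second_labels[n] = "modified"
        if n > m:
            # Earlier, unlabeled second sentences that resemble none of the
            # first m+1 first sentences are marked as deleted.
            for k in range(n):
                unmatched = all(
                    similarity(f, second[k]) < threshold for f in first[: m + 1]
                )
                if not handled[k] and unmatched:
                    second_labels[k] = "deleted"
                    handled[k] = True
    return first_labels, second_labels
```

A label of `None` means that no labeling is performed for that sentence.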

Preferably, in conjunction with the content shown in fig. 10, the automatic identification system further includes:

The preprocessing unit is configured to preprocess the first guideline module and the second guideline module respectively, and to extract the knowledge characteristics from the preprocessed first guideline module and the preprocessed second guideline module respectively.

Correspondingly, the second labeling subunit is further configured to: compare the knowledge characteristics in the mth first sentence with those in the nth second sentence to obtain knowledge characteristic difference information; and mark corresponding labels at the positions corresponding to the knowledge characteristic difference information in the mth first sentence and the nth second sentence respectively.

The display unit is configured to display labels of different categories in different display forms.

The normalization unit is configured to normalize the first guideline module and the second guideline module.

The storage unit is configured to store the first guideline modules and the second guideline modules in the database, to store the hierarchical relationship among all the first guideline modules in the database, and to store the hierarchical relationship among all the second guideline modules in the database.
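One possible way for the storage unit to keep both the modules and their hierarchical relationships is a single table with a parent reference, sketched below with Python's standard `sqlite3` module. The table name, column names, and the choice of SQLite are assumptions made for illustration only; the specification does not name a database or a schema.

```python
import sqlite3
from typing import List, Optional, Tuple

# (module_id, parent_id or None, title, content or None); hypothetical layout
Module = Tuple[int, Optional[int], str, Optional[str]]


def store_modules(conn: sqlite3.Connection, guideline_id: str,
                  modules: List[Module]) -> None:
    """Store modules of one guideline; parent_id encodes the hierarchy."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS guideline_module (
               guideline_id TEXT NOT NULL,
               module_id    INTEGER NOT NULL,
               parent_id    INTEGER,
               title        TEXT NOT NULL,
               content      TEXT,
               PRIMARY KEY (guideline_id, module_id)
           )"""
    )
    conn.executemany(
        "INSERT INTO guideline_module VALUES (?, ?, ?, ?, ?)",
        [(guideline_id, *module) for module in modules],
    )
    conn.commit()


def child_titles(conn: sqlite3.Connection, guideline_id: str,
                 parent_id: Optional[int]) -> List[str]:
    """Return the titles directly below a heading (None selects the top level)."""
    rows = conn.execute(
        "SELECT title FROM guideline_module "
        "WHERE guideline_id = ? AND parent_id IS ? ORDER BY module_id",
        (guideline_id, parent_id),
    )
    return [row[0] for row in rows]
```

Storing the first and second clinical guidelines under different `guideline_id` values keeps their hierarchies separate within the same table.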

In summary, embodiments of the present invention provide an automatic identification method and system for clinical guideline update content. A first clinical guideline and a second clinical guideline are parsed and structurally extracted according to a module hierarchical structure tree established in advance from the headings at each level of the clinical guidelines, so as to obtain at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline. Difference information between the first guideline module and the second guideline module is then determined, and corresponding labels are marked at the positions corresponding to the difference information in the first clinical guideline and the second clinical guideline respectively. The differences and changes between clinical guidelines are thereby found without manually reading through the two clinical guidelines to be compared, which improves the efficiency and accuracy of determining those differences and changes.

The embodiments in this specification are described in a progressive manner; for parts that are the same or similar among the embodiments, reference may be made from one embodiment to another, and each embodiment focuses on its differences from the others. In particular, because the system embodiments are substantially similar to the method embodiments, they are described relatively briefly, and the relevant parts of the method embodiments may be consulted for details. The system embodiments described above are only illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units; they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Those skilled in the art will further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for automatic identification of clinical guideline updates, the method comprising:

2. The method of claim 1, wherein the process of matching the first guideline module and the second guideline module comprises:

3. The method of claim 1, wherein determining second difference information between the first clinical guideline and the second clinical guideline and labeling respective labels in the first clinical guideline and the second clinical guideline at locations corresponding to the second difference information according to the first to-be-processed guideline module and the second to-be-processed guideline module comprises:

4. The method of claim 1, wherein determining third difference information between the first clinical guideline and the second clinical guideline and labeling respective labels in the first clinical guideline and the second clinical guideline at locations corresponding to the third difference information according to the third pending guideline module and all of the second guideline modules comprises:

5. The method of claim 3, wherein parsing and structuring the first clinical guideline and the second clinical guideline, respectively, further comprises, after obtaining at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline:

6. The method of claim 5, wherein after labeling a modification tag in the first clinical guideline at a location corresponding to the mth first sentence and labeling a modification tag in the second clinical guideline at a location corresponding to the nth second sentence, further comprising:

7. The method according to any one of claims 1-5, further comprising:

8. The method of claim 5, wherein parsing and structuring the first clinical guideline and the second clinical guideline, respectively, further comprises, after obtaining at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline:

9. The method of claim 1, wherein parsing and structuring a first clinical guideline and a second clinical guideline, respectively, further comprises, after obtaining at least a first guideline module corresponding to the first clinical guideline and a second guideline module corresponding to the second clinical guideline:

10. An automatic identification system of clinical guideline updates, the system comprising: