CN114743681B - Case grouping screening method and system based on natural language processing - Google Patents

Info

Publication number
CN114743681B
Authority
CN
China
Prior art keywords
data
group
entering
case data
case
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111564591.1A
Other languages
Chinese (zh)
Other versions
CN114743681A (en)
Inventor
杨�远
刘昊
曹润卿
史俊才
钟炎萤
陈华达
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Health Data Beijing Technology Co ltd
Original Assignee
Health Data Beijing Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Health Data Beijing Technology Co ltd
Priority to CN202111564591.1A
Publication of CN114743681A
Application granted
Publication of CN114743681B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Public Health (AREA)
  • General Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Biomedical Technology (AREA)
  • Pathology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a case group-entering screening method and system based on natural language processing. The method comprises: performing primary recognition on original case data with an NLP model to obtain a text label set; constructing a PLSA classification model and performing association mapping among the original case data, the text label set, and type nodes; determining a group-entering label set from a group-entering rule text with an NLP model; matching the group-entering label set against the text label set to determine the specific type node corresponding to the group-entering label set; and extracting the case data mapped to that specific type node as the group-entering case data. In the embodiment of the invention, the text label set and the group-entering label set are obtained by natural language processing, PLSA then yields a probability-distribution-based association mapping between the text label set and each type node, and matching the group-entering label set to the specific type node extracts the required group-entering case data. Group-entering screening of original case data that includes unstructured data is thereby completed without manual intervention and with high accuracy.

Description

Case grouping screening method and system based on natural language processing
Technical Field
The invention relates to data processing, in particular to a case grouping and screening method and system based on natural language processing.
Background
Case data are generated at every stage of a patient's diagnosis and treatment and are essential for diagnosis, follow-up, scientific research, and similar processes. They include the basic patient information, medical history information, auxiliary examination information, operation information, doctor's advice information, image information, and so on that arise at each stage.
Apart from inherent fields in the patient information such as name and certificate number, the other information in case data, such as medical history information and doctor's advice information, consists largely of long, manually entered free text, and the image information is not text at all. This information is semi-structured or unstructured, so keyword-based detection can hardly extract all of the necessary effective information from it. Case data therefore have to be checked and recorded manually when they are transferred, a process that depends heavily on manual work, is time-consuming and laborious, and places very high demands on the professional competence of the person processing the data.
In addition, scientific research on specific cases often involves a huge volume of case data from multiple institutions and multiple time periods. Before the research begins, the data must be classified and identified, and the case data that meet the research requirements must be screened out for group entry. The group-entering rules mainly concern unstructured data such as medical history information and operation information and are expressed semantically, so they cannot be identified through keywords.
Disclosure of Invention
The embodiment of the invention discloses a case group-entering screening method and system based on natural language processing. Original case data and a group-entering rule text are processed by natural language processing to obtain a text label set and a group-entering label set. A PLSA classification model then processes the original case data and the text label set to obtain a probability-distribution-based association mapping between these two and the various type nodes. Finally, the group-entering label set is matched to a specific type node to obtain the required group-entering case data. Group-entering screening of original case data that includes unstructured data is thereby completed without manual intervention and with high accuracy.
The first aspect of the embodiment of the invention discloses a case group entering screening method based on natural language processing, which comprises the following steps:
primary recognition is carried out on the original case data by adopting an NLP model, and a text label set is obtained;
constructing a PLSA classification model based on the original case data and the text label set, and performing association mapping on the original case data, the text label set and a plurality of types of nodes;
determining a group-entering label set by adopting an NLP model based on the group-entering rule text;
matching the group-entering label set with the text label set, and determining the specific type node corresponding to the group-entering label set in the implicit semantic space;
and extracting the original case data associated with the specific type node through the mapping to obtain the group-entering case data.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the group-entering rule text includes at least a preference rule, an exclusion rule, and a remark rule;
and, among the original case data, the original case data that satisfy the preference rule and do not satisfy the exclusion rule, or that satisfy the remark rule, are the group-entering case data.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, after extracting the original case data mapped to the specific type node to obtain the group-entering case data, the method further includes:
synchronizing the group-entering case data by using a Datax tool, and placing the structured data in the group-entering case data into a standard data table;
splitting the semi-structured data and the unstructured data contained in each piece of group-entering case data according to their matching values relative to the standard data table, and placing the split data into the standard data table respectively.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the method further includes:
performing value range verification on each standard data table constructed from the group-entering case data, and eliminating standard data tables that contain out-of-range data;
and performing logic verification on each standard data table that has passed value range verification, and eliminating standard data tables that contain defective data violating medical logic.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the method further includes:
storing image data related to the group-entering case data in an image library, and establishing an association relation between the group-entering case data and the image data through a patient main index;
when any group-entering case data is retrieved and called, the corresponding image data is called synchronously based on the association relation.
The second aspect of the embodiment of the invention discloses a case group-entering screening system based on natural language processing, which comprises:
the label identification unit is used for carrying out primary identification on the original case data by adopting an NLP model to obtain a text label set;
the model building unit is used for building a PLSA classification model based on the original case data and the text label set and carrying out association mapping on the original case data, the text label set and a plurality of types of nodes;
the tag determining unit is used for determining a group-entering tag set by adopting an NLP model based on the group-entering rule text;
the label matching unit is used for matching the group-entering label set with the text label set and determining a specific type node corresponding to the group-entering label set in the implicit semantic space;
and the data extraction unit is used for extracting the original case data mapped to the specific type node to obtain the group-entering case data.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the group-entering rule text includes at least a preference rule, an exclusion rule, and a remark rule;
and, among the original case data, the original case data that satisfy the preference rule and do not satisfy the exclusion rule, or that satisfy the remark rule, are the group-entering case data.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the system further includes:
the data synchronization unit is used for synchronizing the group-entering case data by using a Datax tool after the data extraction unit has extracted the original case data mapped to the specific type node to obtain the group-entering case data, and for placing the structured data in the group-entering case data into a standard data table;
the data matching unit is used for splitting the semi-structured data and the unstructured data contained in each piece of group-entering case data according to their matching values relative to the standard data table, and for placing the split data into the standard data table respectively.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the system further includes:
the first eliminating unit is used for performing value range verification on each standard data table constructed from the group-entering case data and eliminating standard data tables that contain out-of-range data;
and the second eliminating unit is used for performing logic verification on each standard data table that has passed value range verification and eliminating standard data tables that contain defective data violating medical logic.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the system further includes:
the image association unit is used for storing image data related to the group-entering case data in an image library and establishing an association relation between the group-entering case data and the image data through a patient main index;
and the image calling unit is used for synchronously calling the corresponding image data based on the association relation when any group-entering case data is retrieved and called.
The third aspect of the embodiment of the invention discloses a case group-entering screening system based on natural language processing, which comprises:
a memory storing executable program code;
a processor coupled to the memory;
the processor invokes the executable program code stored in the memory to execute the case grouping screening method based on natural language processing disclosed in the first aspect of the embodiment of the present invention.
A fourth aspect of the embodiment of the present invention discloses a computer-readable storage medium storing a computer program, where the computer program causes a computer to execute a case entry group screening method based on natural language processing disclosed in the first aspect of the embodiment of the present invention.
A fifth aspect of the embodiments of the present invention discloses a computer program product which, when run on a computer, causes the computer to perform part or all of the steps of any one of the methods of the first aspect.
A sixth aspect of the embodiments of the present invention discloses an application publishing platform for publishing a computer program product, wherein the computer program product, when run on a computer, causes the computer to perform part or all of the steps of any one of the methods of the first aspect.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
In the embodiment of the invention, the original case data and the group-entering rule text are processed by natural language processing to obtain the text label set and the group-entering label set. The PLSA classification model then processes the original case data and the text label set to obtain a probability-distribution-based association mapping between these two and the various type nodes, and the group-entering label set is matched to a specific type node to obtain the required group-entering case data. Group-entering screening of original case data that includes unstructured data is thereby completed without manual intervention and with high accuracy.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a schematic flow chart of a case grouping screening method based on natural language processing according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a case entry group screening system based on natural language processing according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of another case-in-group screening system based on natural language processing according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
It should be noted that the terms "first," "second," "third," "fourth," and the like in the description and in the claims of the present invention are used for distinguishing between different objects and not necessarily for describing a particular sequential or chronological order. The terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the invention discloses a case group-entering screening method and system based on natural language processing. Original case data and a group-entering rule text are processed by natural language processing to obtain a text label set and a group-entering label set. A PLSA classification model then processes the original case data and the text label set to obtain a probability-distribution-based association mapping between these two and the various type nodes. Finally, the group-entering label set is matched to a specific type node to obtain the required group-entering case data. Group-entering screening of original case data that includes unstructured data is thereby completed without manual intervention and with high accuracy.
Example 1
Referring to fig. 1, fig. 1 is a flow chart of a case grouping screening method based on natural language processing according to an embodiment of the present invention. As shown in fig. 1, the case-grouping screening method based on natural language processing may include the following steps.
101. Perform primary recognition on the original case data by adopting an NLP model to obtain a text label set.
In this embodiment, an NLP model extracts text labels for the keywords in each piece of original case data, and these text labels serve as the basis for subsequent recognition and matching.
102. Construct a PLSA classification model based on the original case data and the text label set, and perform association mapping on the original case data, the text label set and a plurality of types of nodes.
In this embodiment, a PLSA classification model is established that associates the original case data with the type nodes. Each piece of case data is represented by a probability distribution over the text labels, and each type node is represented by a probability distribution over the case data, forming a two-layer probability structure. This gives the probability relationship between the original case data and the type nodes, and the type node corresponding to a piece of case data is determined as the one with the strongest probabilistic association.
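The two-layer probability structure described above can be illustrated with the standard PLSA expectation-maximization updates. The following is a minimal sketch in plain Python, not the implementation of the embodiment: the toy count matrix, the function names `fit_plsa` and `log_likelihood`, the number of type nodes, and the random initialization are all assumptions made for illustration. It uses the textbook factorization P(label | case) = sum over z of P(z | case) P(label | z), with the type nodes playing the role of the latent variable z.

```python
import math
import random


def normalize(row):
    """Scale a list of non-negative numbers so that it sums to 1."""
    total = sum(row) or 1.0
    return [v / total for v in row]


def fit_plsa(counts, n_types, n_iter=50, seed=0):
    """Fit PLSA by EM. counts[d][w] = occurrences of text label w in case d."""
    rng = random.Random(seed)
    n_cases, n_labels = len(counts), len(counts[0])
    # P(z | d): mixture of type nodes per case; P(w | z): label distribution per type node.
    p_z_d = [normalize([rng.random() + 0.1 for _ in range(n_types)]) for _ in range(n_cases)]
    p_w_z = [normalize([rng.random() + 0.1 for _ in range(n_labels)]) for _ in range(n_types)]
    for _ in range(n_iter):
        new_z_d = [[0.0] * n_types for _ in range(n_cases)]
        new_w_z = [[0.0] * n_labels for _ in range(n_types)]
        for d in range(n_cases):
            for w in range(n_labels):
                if counts[d][w] == 0:
                    continue
                # E-step: posterior P(z | d, w) is proportional to P(z | d) * P(w | z).
                joint = [p_z_d[d][z] * p_w_z[z][w] for z in range(n_types)]
                total = sum(joint)
                for z in range(n_types):
                    resp = counts[d][w] * joint[z] / total
                    new_z_d[d][z] += resp
                    new_w_z[z][w] += resp
        # M-step: renormalize the accumulated expected counts.
        p_z_d = [normalize(row) for row in new_z_d]
        p_w_z = [normalize(row) for row in new_w_z]
    return p_z_d, p_w_z


def log_likelihood(counts, p_z_d, p_w_z):
    """Log-likelihood of the observed label counts under the fitted model."""
    ll = 0.0
    for d, row in enumerate(counts):
        for w, n in enumerate(row):
            if n:
                p = sum(p_z_d[d][z] * p_w_z[z][w] for z in range(len(p_w_z)))
                ll += n * math.log(p)
    return ll


# Hypothetical toy data: 4 cases, 4 text labels.
counts = [[3, 2, 0, 0], [2, 3, 0, 0], [0, 0, 3, 2], [0, 0, 2, 3]]
p_z_d, p_w_z = fit_plsa(counts, n_types=2)
# Assign each case to the type node with the strongest association.
type_of_case = [max(range(2), key=lambda z: row[z]) for row in p_z_d]
```

Because EM only finds a local optimum, the resulting node assignments depend on the initialization; several restarts compared by `log_likelihood` would likely be needed in practice.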
103. Determine the group-entering tag set by using an NLP model based on the group-entering rule text.
In this embodiment, the group-entering rule text is determined by the research project. It mostly consists of long sentences or multiple paragraphs of text and, when the research field is relatively specialized, is mainly unstructured data.
As an alternative implementation mode, the group-entering rule text includes at least a preference rule, an exclusion rule and a remark rule; among the original case data, the original case data that satisfy the preference rule and do not satisfy the exclusion rule, or that satisfy the remark rule, are the group-entering case data.
Specifically, taking a breast cancer study as an example, the preference rule may be: pathologically diagnosed malignant tumor of the left or right breast, where malignant tumor covers the following: carcinoma, sarcoma, malignant or borderline phyllodes tumor, interstitial, CDCIS, Paget's disease. (Whether or not other secondary tumors are present at the time of initial diagnosis does not affect group entry.)
The exclusion rules may be:
a. patients with no histologically confirmed malignant breast tumor lesions;
b. breast cancer patients who did not receive breast or axillary surgery at this hospital;
c. patients whose histologically confirmed diagnosis is classical lobular carcinoma in situ, benign breast cancer, mastitis, papilloma, or benign lobular tumor, with no malignant focus;
d. patients who received breast surgery at another hospital with negative margins and underwent axillary surgery at this hospital;
e. patients whose primary lesion was not operated on at this hospital and who developed recurrence or metastasis after surgery;
f. patients with other malignant tumors that have metastasized to the breast or axilla.
The remark rules may be:
a. core needle biopsy is not regarded as surgical treatment;
b. patients undergoing surgery for primary stage IV breast cancer;
c. patients who have undergone excisional biopsy of the tumor (including minimally invasive surgery) at the hospital, or who have undergone open surgery or axillary surgery at the hospital, and for whom the clinical department of the hospital has reported a consultation based on the unstained sections (white slides) of the tumor's pathological tissue.
The preference rule sets relatively broad criteria and is used for preliminary screening. The exclusion rule supplements the preference rule by specifying the cases that must not enter the group. The remark rule further supplements both the preference rule and the exclusion rule by specifying the special cases that should still enter the group.
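The way the three rules combine can be written as a single Boolean expression. The sketch below assumes that an upstream step has already decided, for each case, whether it satisfies each rule; the function names and the dictionary keys are hypothetical.

```python
def enters_group(meets_preference: bool, meets_exclusion: bool, meets_remark: bool) -> bool:
    """A case enters the group if it satisfies the preference rule and not the
    exclusion rule, or if it satisfies the remark rule."""
    return (meets_preference and not meets_exclusion) or meets_remark


def screen(cases):
    """Return the ids of the cases that enter the group."""
    return [c["id"] for c in cases
            if enters_group(c["preference"], c["exclusion"], c["remark"])]


# Hypothetical rule outcomes for four cases.
cases = [
    {"id": "A", "preference": True, "exclusion": False, "remark": False},
    {"id": "B", "preference": True, "exclusion": True, "remark": False},
    {"id": "C", "preference": True, "exclusion": True, "remark": True},
    {"id": "D", "preference": False, "exclusion": False, "remark": False},
]
selected = screen(cases)  # cases A and C
```

Case C shows the role of the remark rule: it overrides an exclusion that would otherwise keep the case out of the group.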
104. Match the group-entering label set with the text label set, and determine the specific type node corresponding to the group-entering label set in the implicit semantic space.
In this embodiment, once the group-entering tag set has defined the group-entering requirement and the PLSA classification model has built a complete association mapping for the original case data, the group-entering tag set is matched against the text tag set to determine the specific type node consistent with the group-entering requirement.
105. Extract the original case data mapped to the specific type node to obtain the group-entering case data.
In this embodiment, the original case data mapped to the specific type node are the group-entering case data that meet the requirements of the group-entering rule text, and they are extracted according to that node.
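Steps 104 and 105 can be sketched as a set-matching step followed by a lookup. The text does not state which matching measure is used, so the Jaccard overlap below is an assumption, as are the node names, label sets, and function names.

```python
def jaccard(a, b):
    """Jaccard overlap between two label sets (an assumed matching measure)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def select_type_node(group_tags, node_labels):
    """Pick the type node whose text labels best overlap the group-entering tags."""
    return max(node_labels, key=lambda node: jaccard(group_tags, node_labels[node]))


def extract_cases(case_to_node, node):
    """Return the ids of all cases mapped to the given type node."""
    return [case_id for case_id, n in case_to_node.items() if n == node]


# Hypothetical outputs of the earlier steps.
node_labels = {
    "node_0": {"breast", "malignant", "surgery"},
    "node_1": {"lung", "nodule", "ct"},
}
case_to_node = {"case_1": "node_0", "case_2": "node_1", "case_3": "node_0"}
group_tags = {"breast", "malignant", "carcinoma"}

node = select_type_node(group_tags, node_labels)
group_cases = extract_cases(case_to_node, node)
```

A similarity computed in the latent space of the PLSA model could replace the set overlap; the set version is shown only because it is the simplest thing that fits the description.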
In this embodiment, after the required group-entering case data have been screened out, they are entered into the group.
As an optional implementation manner, the group-entering case data are synchronized by using a Datax tool, and the structured data in the group-entering case data are placed into a standard data table; the semi-structured data and the unstructured data contained in each piece of group-entering case data are split according to their matching values relative to the standard data table and placed into the standard data table respectively.
In this way, the structured data in the group-entering case data are placed into the corresponding fields of the standard data table, while the semi-structured data and the unstructured data are split and placed according to how well they match parameters such as the data format and field requirements of each field in the standard data table.
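One possible reading of the "matching value" is a score of how well a text fragment fits the format of each field in the standard data table. The sketch below follows that reading and is purely illustrative: the field names, the regular expressions, the semicolon splitting, the scoring formula, and the threshold are all assumptions, not details given in the text.

```python
import re

# Hypothetical format requirements for two fields of a standard data table.
FIELD_SPECS = {
    "surgery_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "tumor_size_cm": re.compile(r"\d+(?:\.\d+)?\s*cm"),
}


def matching_value(fragment, pattern):
    """Share of the fragment covered by the field's format (0.0 if no match)."""
    match = pattern.search(fragment)
    return len(match.group(0)) / len(fragment) if match and fragment else 0.0


def place_fragments(text, specs=FIELD_SPECS, threshold=0.2):
    """Split semi-structured text on ';' and put each fragment into the field
    it matches best, skipping fragments whose best score is below the threshold."""
    row = {}
    for fragment in (part.strip() for part in text.split(";")):
        if not fragment:
            continue
        scores = {field: matching_value(fragment, p) for field, p in specs.items()}
        best = max(scores, key=scores.get)
        if scores[best] >= threshold:
            row[best] = specs[best].search(fragment).group(0)
    return row


row = place_fragments("surgery on 2020-03-15; tumor 2.5 cm; no family history")
```

The fragment "no family history" matches neither format and is left out of the row, which mirrors the idea that only data fitting a field's requirements are placed into the table.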
In this embodiment, the group-entering case data are also verified and cleaned, and erroneous data are eliminated, to obtain accurate and usable group-entering case data.
As an optional implementation manner, value range verification is performed on each standard data table constructed from the group-entering case data, and standard data tables containing out-of-range data are eliminated;
logic verification is then performed on each standard data table that has passed value range verification, and standard data tables containing defective data that violates medical logic are eliminated.
Here, value range verification eliminates group-entering case data with out-of-range values (such as a negative age), and logic verification eliminates group-entering case data with medical logic errors (such as an operation date earlier than the pre-treatment pathology date), so that invalid data do not negatively affect the study.
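The two verification passes can be sketched as two filters applied in sequence, using the examples given in the text (negative age, operation date earlier than the pre-treatment pathology date). The record layout, the field names, and the upper age bound are assumptions made for illustration.

```python
from datetime import date

MAX_AGE = 120  # assumed upper bound; the text only gives negative age as an example


def passes_value_range(record):
    """Value range verification: reject out-of-range values such as a negative age."""
    return 0 <= record["age"] <= MAX_AGE


def passes_medical_logic(record):
    """Logic verification: the operation must not precede the pre-treatment pathology date."""
    return record["surgery_date"] >= record["pathology_date"]


def clean(records):
    """Apply value range verification first, then logic verification."""
    in_range = [r for r in records if passes_value_range(r)]
    return [r for r in in_range if passes_medical_logic(r)]


# Hypothetical rows of a standard data table.
records = [
    {"id": 1, "age": 52, "pathology_date": date(2020, 3, 1), "surgery_date": date(2020, 3, 10)},
    {"id": 2, "age": -4, "pathology_date": date(2020, 4, 1), "surgery_date": date(2020, 4, 9)},
    {"id": 3, "age": 47, "pathology_date": date(2020, 5, 20), "surgery_date": date(2020, 5, 1)},
]
kept_ids = [r["id"] for r in clean(records)]
```

Record 2 is removed by the value range check and record 3 by the logic check, so only record 1 remains.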
In this embodiment, the image data are stored separately from the group-entering case data and are associated with the text-format group-entering case data.
As an optional implementation manner, the image data related to the group-entering case data are stored in an image library, and an association relationship is established between the group-entering case data and the image data through a patient main index;
when any group-entering case data is retrieved and called, the corresponding image data is called synchronously based on the association relationship.
In this way, the image data need no special processing, which avoids affecting the accuracy of the image data.
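The association through the patient main index can be sketched as two stores that share one key. The class and method names, the in-memory dictionaries, and the file names below are hypothetical; a real image library would more likely be a separate storage service.

```python
class CaseStore:
    """Keeps text case data and image references apart, linked by the patient main index."""

    def __init__(self):
        self.cases = {}          # case_id -> (patient_index, case text)
        self.image_library = {}  # patient_index -> list of image references

    def add_case(self, case_id, patient_index, text):
        self.cases[case_id] = (patient_index, text)

    def add_image(self, patient_index, image_ref):
        self.image_library.setdefault(patient_index, []).append(image_ref)

    def retrieve(self, case_id):
        """Return the case text together with the images of the same patient."""
        patient_index, text = self.cases[case_id]
        return text, list(self.image_library.get(patient_index, []))


store = CaseStore()
store.add_case("case_1", "P001", "pathology confirmed, surgery performed")
store.add_image("P001", "mammogram_001.dcm")
store.add_image("P001", "ultrasound_007.dcm")
text, images = store.retrieve("case_1")
```

The images themselves are never modified; only references to them are returned alongside the case text.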
In summary, the original case data and the group-entering rule text are processed by natural language processing to obtain a text label set and a group-entering label set. The PLSA classification model then processes the original case data and the text label set to obtain a probability-distribution-based association mapping between these two and the various type nodes, and the group-entering label set is matched to a specific type node to obtain the required group-entering case data. Group-entering screening of original case data that includes unstructured data is thereby completed without manual intervention and with high accuracy.
Example 2
Referring to fig. 2, fig. 2 is a schematic structural diagram of a case entry group screening system based on natural language processing according to an embodiment of the present invention. As shown in fig. 2, the case-grouping screening system based on natural language processing may include:
a tag identification unit 201, configured to perform primary identification on original case data by using an NLP model, so as to obtain a text tag set;
the model building unit 202 is configured to build a PLSA classification model based on the original case data and the text label set, and perform association mapping on the original case data, the text label set, and a plurality of types of nodes;
a tag determining unit 203, configured to determine a group-entering tag set by using an NLP model based on the group-entering rule text;
the group-entering rule text includes at least a preference rule, an exclusion rule and a remark rule;
among the original case data, the original case data that satisfy the preference rule and do not satisfy the exclusion rule, or that satisfy the remark rule, are the group-entering case data;
the tag matching unit 204 is configured to match the group-entering tag set with the text tag set, and determine the specific type node corresponding to the group-entering tag set in the implicit semantic space;
the data extraction unit 205 is configured to extract original case data mapped by the specific type node to obtain group-entering case data;
the data synchronization unit 206 is configured to synchronize the group-entering case data by using a Datax tool after the data extraction unit extracts the original case data mapped by the specific type node, and place the structured data in the group-entering case data into the standard data table;
a data matching unit 207, configured to split the semi-structured data and the unstructured data according to the matching value of the semi-structured data and the unstructured data included in each group of case data relative to the standard data table, and put the split semi-structured data and the unstructured data into the standard data table respectively;
a first rejecting unit 208, configured to perform value range verification on each standard data table constructed based on the group-entering case data, and reject the standard data table with overrun data;
a second rejection unit 209, configured to perform logic verification on each standard data table for which the value range verification is completed, and reject the standard data table for which defect data that violates medical logic exists;
the image association unit 210 is configured to store image data related to the group-entering case data in an image library, and establish an association relationship between the group-entering case data and the image data through a patient master index;
the image retrieving unit 211 is configured to synchronously retrieve the corresponding image data based on the association relationship when any group-entering case data is searched and retrieved.
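The group-entering condition stated for the tag determining unit 203 above can be read as a Boolean predicate: (preference AND NOT exclusion) OR remark. The following is a minimal sketch under stated assumptions, not the patented implementation. It assumes each rule has already been reduced to a set of tags, that a preference or remark rule is met when all of its tags appear in the case, and that an exclusion rule is met when any of its tags appears. The names `RuleSet` and `is_group_entering` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleSet:
    """Tag sets extracted from the three parts of the group-entering rule text."""
    preference: frozenset
    exclusion: frozenset
    remark: frozenset


def is_group_entering(case_tags: set, rules: RuleSet) -> bool:
    """Return True when (preference AND NOT exclusion) OR remark holds."""
    # Assumption: a preference or remark rule is met when all of its tags are present.
    meets_preference = bool(rules.preference) and rules.preference <= case_tags
    meets_remark = bool(rules.remark) and rules.remark <= case_tags
    # Assumption: an exclusion rule is met when any of its tags is present.
    meets_exclusion = bool(rules.exclusion & case_tags)
    return (meets_preference and not meets_exclusion) or meets_remark
```

Under this reading, a case that meets the remark rule is enrolled even if it also meets the exclusion rule, which follows the "or" in the condition as written.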
As an optional implementation, the data synchronization unit 206 uses a Datax tool to synchronize the group-entering case data and places the structured data in the group-entering case data into the standard data table; the data matching unit 207 splits the semi-structured data and the unstructured data contained in each item of group-entering case data according to their matching values relative to the standard data table, and places the split semi-structured data and unstructured data into the standard data table respectively.
In this way, the structured data in the group-entering case data is placed directly into the corresponding fields of the standard data table, while the semi-structured data and the unstructured data are split and assigned according to how well they match the parameters of each field in the standard data table, such as data format and field requirements.
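The source does not define how the matching value is computed. As an illustration only, the sketch below assumes a hypothetical standard data table with two fields, each described by keyword hints and a value format, and scores every text fragment against every field. The field definitions, the scoring formula and the threshold are all assumptions.

```python
import re

# Hypothetical standard data table: field name -> (keyword hints, value format)
STANDARD_FIELDS = {
    "age": (("age", "years old"), re.compile(r"\b(\d{1,3})\b")),
    "surgery_date": (("surgery", "operation"), re.compile(r"\b(\d{4}-\d{2}-\d{2})\b")),
}


def matching_value(fragment: str, keywords: tuple, pattern: re.Pattern) -> float:
    """Score in [0, 1]: half for a keyword hit, half for a value in the expected format."""
    keyword_hit = any(k in fragment.lower() for k in keywords)
    value_hit = pattern.search(fragment) is not None
    return 0.5 * keyword_hit + 0.5 * value_hit


def split_into_table(note: str, threshold: float = 1.0) -> dict:
    """Split free text into fragments and place each one in its best-matching field."""
    row = {}
    for fragment in re.split(r"[;.\n]+", note):
        fragment = fragment.strip()
        if not fragment:
            continue
        best_field, best_score = None, 0.0
        for name, (keywords, pattern) in STANDARD_FIELDS.items():
            score = matching_value(fragment, keywords, pattern)
            if score > best_score:
                best_field, best_score = name, score
        # Fragments that match no field well enough are left out of the table.
        if best_field is not None and best_score >= threshold:
            row[best_field] = STANDARD_FIELDS[best_field][1].search(fragment).group(1)
    return row
```

With the default threshold, a fragment is placed only when both the keyword and the value format match, so a bare date with no context is not mistaken for an age.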
As an optional embodiment, the first rejecting unit 208 performs value range verification on each standard data table constructed based on the group-entering case data, and rejects the standard data tables containing overrun data;
the second rejecting unit 209 performs logic verification on each standard data table for which the value range verification is completed, and rejects the standard data tables containing defect data that violates medical logic.
Here, value range verification eliminates group-entering case data containing overrun data (for example, a negative age), and logic verification eliminates group-entering case data containing medical logic errors (for example, a surgery date earlier than the pre-treatment pathology date), so that invalid data does not adversely affect the study.
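The two verification stages can be sketched as a pair of filters applied in sequence, using the two examples given in the text. This is a minimal illustration: the field names, the dictionary representation of a standard data table and the upper age bound are assumptions not stated in the source.

```python
from datetime import date

MAX_AGE_YEARS = 150  # assumed upper bound; the source only names negative age as overrun data


def passes_value_range(table: dict) -> bool:
    """Value range verification: reject tables whose age is out of range."""
    return 0 <= table["age"] <= MAX_AGE_YEARS


def passes_medical_logic(table: dict) -> bool:
    """Logic verification: surgery must not precede the pre-treatment pathology date."""
    return table["surgery_date"] >= table["pathology_date"]


def screen_tables(tables: list) -> list:
    """Apply value range verification first, then logic verification on the survivors."""
    in_range = [t for t in tables if passes_value_range(t)]
    return [t for t in in_range if passes_medical_logic(t)]
```

Running the range check first mirrors the order in the text, where logic verification is applied only to tables that have completed value range verification.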
In this embodiment, the image data is stored separately from the group-entering case data, which is in text format, and an association relationship is established between the two.
As an optional embodiment, the image association unit 210 stores the image data related to the group-entering case data in the image library, and establishes an association relationship between the group-entering case data and the image data through the patient master index;
when any group-entering case data is searched and retrieved, the image retrieving unit 211 synchronously retrieves the corresponding image data based on the association relationship.
The image data therefore needs no special processing, which avoids any loss of accuracy in the image data.
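The association described above amounts to keeping images in their own store and joining them to case records through a shared patient identifier (the patient main index, commonly called a master patient index). A minimal in-memory sketch follows. The `ImageLibrary` class, the `retrieve_case` function and the record layout are hypothetical, and a real system would use a database or an image archive rather than dictionaries.

```python
from collections import defaultdict


class ImageLibrary:
    """Stores image references separately from the text case data."""

    def __init__(self) -> None:
        self._paths_by_patient = defaultdict(list)

    def store(self, patient_id: str, image_path: str) -> None:
        """Register an image under the patient's index key; the image is not modified."""
        self._paths_by_patient[patient_id].append(image_path)

    def images_for(self, patient_id: str) -> list:
        """Return a copy of the image references for one patient (empty if none)."""
        return list(self._paths_by_patient.get(patient_id, []))


def retrieve_case(case_id: str, cases: dict, library: ImageLibrary) -> tuple:
    """Fetch a case record and, through its patient index key, the linked images."""
    case = cases[case_id]
    return case, library.images_for(case["patient_id"])
```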
In summary, the original case data and the group-entering rule text are processed by natural language processing to obtain the text label set and the group-entering tag set. The PLSA classification model then processes the original case data and the text label set to obtain, based on probability distributions, the association mapping between them and the nodes of various types. The required group-entering case data is obtained by matching the group-entering tag set against the specific type node. Group-entering screening of original case data that includes unstructured data is thus completed without manual intervention and with high accuracy.
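The PLSA step summarized above can be illustrated with a small expectation-maximization sketch in pure Python. It uses the standard asymmetric formulation P(w|d) = Σ_z P(w|z) P(z|d), where d is a case, w is a text label and z is a latent type node. The source does not specify the training procedure or how the specific type node is chosen, so the node-selection rule in `select_cases` (highest total label probability, then cases whose dominant node is that node) is an assumption, as are all function names.

```python
import random


def _normalize(row: list) -> list:
    total = sum(row)
    return [v / total for v in row] if total > 0 else [1.0 / len(row)] * len(row)


def plsa(counts: list, n_topics: int, n_iter: int = 50, seed: int = 0) -> tuple:
    """counts[d][w] is how often label w occurs in case d. Returns (P(w|z), P(z|d))."""
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])
    p_w_z = [_normalize([rng.random() + 1e-3 for _ in range(n_words)]) for _ in range(n_topics)]
    p_z_d = [_normalize([rng.random() + 1e-3 for _ in range(n_topics)]) for _ in range(n_docs)]
    for _ in range(n_iter):
        new_w_z = [[0.0] * n_words for _ in range(n_topics)]
        new_z_d = [[0.0] * n_topics for _ in range(n_docs)]
        for d in range(n_docs):
            for w in range(n_words):
                n = counts[d][w]
                if n == 0:
                    continue
                # E-step: posterior over latent nodes for this (case, label) pair
                post = _normalize([p_w_z[z][w] * p_z_d[d][z] for z in range(n_topics)])
                # M-step accumulators: expected counts per node
                for z in range(n_topics):
                    new_w_z[z][w] += n * post[z]
                    new_z_d[d][z] += n * post[z]
        p_w_z = [_normalize(row) for row in new_w_z]
        p_z_d = [_normalize(row) for row in new_z_d]
    return p_w_z, p_z_d


def select_cases(p_w_z: list, p_z_d: list, entry_label_ids: list) -> tuple:
    """Pick the node that best explains the group-entering labels (assumed rule),
    then return the indices of cases whose most probable node is that node."""
    scores = [sum(row[w] for w in entry_label_ids) for row in p_w_z]
    node = max(range(len(scores)), key=scores.__getitem__)
    cases = [d for d, row in enumerate(p_z_d)
             if max(range(len(row)), key=row.__getitem__) == node]
    return node, cases
```

EM for PLSA converges only to a local optimum, so the node assignments can vary with the random seed and the number of iterations.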
Example III
Referring to Fig. 3, Fig. 3 is a schematic structural diagram of another case grouping screening system based on natural language processing according to an embodiment of the present invention. As shown in Fig. 3, the case grouping screening system based on natural language processing may include:
a memory 301 storing executable program code;
a processor 302 coupled with the memory 301;
wherein the processor 302 invokes the executable program code stored in the memory 301 to perform the case grouping screening method based on natural language processing of Fig. 1.
An embodiment of the present invention further discloses a computer-readable storage medium storing a computer program, wherein the computer program causes a computer to execute the case grouping screening method based on natural language processing of Fig. 1.
An embodiment of the present invention further discloses a computer program product which, when run on a computer, causes the computer to perform some or all of the steps of the method in the method embodiments above.
Those of ordinary skill in the art will appreciate that all or part of the steps of the methods in the above embodiments may be implemented by a program instructing the associated hardware. The program may be stored in a computer-readable storage medium, including read-only memory (ROM), random access memory (RAM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), one-time programmable read-only memory (OTPROM), electrically erasable programmable read-only memory (EEPROM), compact disc read-only memory (CD-ROM) or other optical disc storage, magnetic disk storage, magnetic tape storage, or any other computer-readable medium that can be used to carry or store data.
The case grouping screening method and system based on natural language processing disclosed in the embodiments of the present invention have been described in detail above. Specific examples are used herein to illustrate the principles and implementations of the present invention, and the description of the above embodiments is intended only to aid understanding of the method and core idea of the present invention. Meanwhile, those of ordinary skill in the art may, in accordance with the ideas of the present invention, make changes to the specific implementations and the scope of application. In view of the above, the content of this description should not be construed as limiting the present invention.

Claims (8)

1. A method for case-group-entering screening based on natural language processing, the method comprising:
primary recognition is carried out on the original case data by adopting an NLP model, and a text label set is obtained;
constructing a PLSA classification model based on the original case data and the text label set, and performing association mapping on the original case data, the text label set and a plurality of types of nodes;
determining a group-entering tag set by adopting an NLP model based on the group-entering rule text;
matching the group-entering tag set with the text label set, and determining a specific type node corresponding to the group-entering tag set in a latent semantic space;
extracting the original case data associatively mapped by the specific type node to obtain group-entering case data;
the group-entering rule text comprises at least a preference rule, an exclusion rule and a remark rule;
among the original case data, the original case data that meets the preference rule and does not meet the exclusion rule, or that meets the remark rule, is the group-entering case data.
2. The method of claim 1, wherein after extracting the original case data mapped by the specific type node to obtain the group-entering case data, the method further comprises:
synchronizing the group-entering case data by using a Datax tool, and placing the structured data in the group-entering case data into a standard data table;
splitting the semi-structured data and the unstructured data contained in each item of group-entering case data according to their matching values relative to the standard data table, and placing the split semi-structured data and unstructured data into the standard data table respectively.
3. The natural language processing based case grouping screening method of claim 2, further comprising:
performing value range verification on each standard data table constructed based on the group-entering case data, and eliminating standard data tables with overrun data;
and carrying out logic verification on each standard data table with the value range verification completed, and eliminating the standard data table with the defect data against the medical logic.
4. The natural language processing based case grouping screening method of claim 1, further comprising:
storing image data related to the group-entering case data in an image library, and establishing an association relationship between the group-entering case data and the image data through a patient master index;
when any group-entering case data is searched and retrieved, synchronously retrieving the corresponding image data based on the association relationship.
5. A natural language processing based case entry group screening system, the system comprising:
the label identification unit is used for carrying out primary identification on the original case data by adopting an NLP model to obtain a text label set;
the model building unit is used for building a PLSA classification model based on the original case data and the text label set and carrying out association mapping on the original case data, the text label set and a plurality of types of nodes;
the tag determining unit is used for determining a group-entering tag set by adopting an NLP model based on the group-entering rule text;
the label matching unit is used for matching the group-entering label set with the text label set and determining a specific type node corresponding to the group-entering label set in a latent semantic space;
the data extraction unit is used for extracting the original case data which are mapped by the specific type of nodes in an associated mode to obtain group-entering case data;
the group-entering rule text comprises at least a preference rule, an exclusion rule and a remark rule;
among the original case data, the original case data that meets the preference rule and does not meet the exclusion rule, or that meets the remark rule, is the group-entering case data.
6. The natural language processing based case entry group screening system of claim 5, further comprising:
the data synchronization unit is used for synchronizing the group-entering case data by using a Datax tool after the data extraction unit extracts the original case data which is mapped by the specific type node in an associated way to obtain the group-entering case data, and placing the structured data in the group-entering case data into a standard data table;
the data matching unit is used for splitting the semi-structured data and the unstructured data contained in each item of group-entering case data according to their matching values relative to the standard data table, and placing the split semi-structured data and unstructured data into the standard data table respectively.
7. The natural language processing based case entry group screening system of claim 6, further comprising:
the first eliminating unit is used for verifying the value range of each standard data table constructed based on the group-entering case data and eliminating the standard data table with the overrun data;
and the second eliminating unit is used for carrying out logic verification on each standard data table for which the value range verification is completed, and eliminating the standard data tables containing defect data that violates medical logic.
8. The natural language processing based case entry group screening system of claim 5, further comprising:
the image association unit is used for storing image data related to the group-entering case data in an image library and establishing an association relationship between the group-entering case data and the image data through a patient master index;
and the image retrieving unit is used for synchronously retrieving the corresponding image data based on the association relationship when any group-entering case data is searched and retrieved.
CN202111564591.1A 2021-12-20 2021-12-20 Case grouping screening method and system based on natural language processing Active CN114743681B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111564591.1A CN114743681B (en) 2021-12-20 2021-12-20 Case grouping screening method and system based on natural language processing

Publications (2)

Publication Number Publication Date
CN114743681A CN114743681A (en) 2022-07-12
CN114743681B true CN114743681B (en) 2024-01-30

Family

ID=82274760

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111564591.1A Active CN114743681B (en) 2021-12-20 2021-12-20 Case grouping screening method and system based on natural language processing

Country Status (1)

Country Link
CN (1) CN114743681B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109947858A (en) * 2017-07-26 2019-06-28 腾讯科技(深圳)有限公司 A kind of method and device of data processing
CN110197723A (en) * 2019-07-03 2019-09-03 四川大学华西医院 Clinical somatization classification diagnosis system under psychosomatic medicine theoretical frame
CN110413994A (en) * 2019-06-28 2019-11-05 宁波深擎信息科技有限公司 Hot topic generation method, device, computer equipment and storage medium
CN110570943A (en) * 2019-09-04 2019-12-13 医渡云(北京)技术有限公司 method and device for intelligently recommending MDT (minimization of drive test) grouping, electronic equipment and storage medium
CN111414393A (en) * 2020-03-26 2020-07-14 湖南科创信息技术股份有限公司 Semantic similar case retrieval method and equipment based on medical knowledge graph
CN112948471A (en) * 2019-11-26 2021-06-11 广州知汇云科技有限公司 Clinical medical text post-structured processing platform and method

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180032678A1 (en) * 2016-07-29 2018-02-01 International Business Machines Corporation Medical recording system

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
An overview of topic modeling and its current applications in bioinformatics;Lin Liu 等;《SpringerPlus》;第1-22页 *
基于潜在语义相关算法的电子病历检索的研究与应用;吴东;《中国优秀硕士学位论文全文数据库 信息科技辑》(第5期);第I138-1343页 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant