CN114743681A - Case grouping screening method and system based on natural language processing - Google Patents

Info

Publication number
CN114743681A
Authority
CN
China
Prior art keywords
data
case data
grouping
label set
grouped
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111564591.1A
Other languages
Chinese (zh)
Other versions
CN114743681B (en)
Inventor
杨�远
刘昊
曹润卿
史俊才
钟炎萤
陈华达
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Health Data Beijing Technology Co ltd
Original Assignee
Health Data Beijing Technology Co ltd
Application filed by Health Data Beijing Technology Co ltd
Priority to CN202111564591.1A
Publication of CN114743681A
Application granted
Publication of CN114743681B
Legal status: Active

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Databases & Information Systems (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Public Health (AREA)
  • Medical Informatics (AREA)
  • Primary Health Care (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a case grouping screening method and system based on natural language processing. The method comprises the following steps: performing initial identification on original case data by using an NLP model to obtain a text label set; constructing a PLSA (probabilistic latent semantic analysis) classification model, and performing association mapping on the original case data, the text label set and type nodes; determining a grouping label set from a grouping rule text by using an NLP model; matching the grouping label set with the text label set, and determining the specific type node corresponding to the grouping label set; and extracting the grouped case data that is associated and mapped with the specific type node. In the embodiment of the invention, the text label set and the grouping label set are obtained by natural language processing, a probability-distribution-based association mapping between the original case data, the text label set and the type nodes is obtained by PLSA, and the required grouped case data is extracted by matching the grouping label set to the specific type nodes. Grouping screening of original case data that includes unstructured data is thereby completed without manual intervention and with high accuracy.

Description

Case grouping screening method and system based on natural language processing
Technical Field
The invention relates to data processing, in particular to a case grouping and screening method and system based on natural language processing.
Background
Case data are generated at each stage of a patient's diagnosis and treatment and are essential information for diagnosis and treatment, follow-up, scientific research and the like. They include basic patient information, medical history information, auxiliary examination information, operation information, medical advice information, image information and the like for each stage.
In case data, only inherent items of patient information, such as the name and certificate number, are structured data. Other information, such as medical history information and medical advice information, consists largely of manually entered free text, and image information differs from text altogether. Such information is semi-structured or unstructured, and keyword-based detection can hardly extract all of the necessary effective information from it. Case data therefore have to be reviewed and entered manually in processes such as hospital transfer, which relies heavily on manual work, is time-consuming and labor-intensive, and places high demands on the professional competence of the personnel involved.
In addition, scientific research on a specific type of case usually involves large volumes of case data from multiple institutions and multiple time periods. Before the research, the case data must be classified and identified so that the case data meeting the research requirements can be screened out and grouped. The grouping rules, however, mainly concern unstructured data such as medical history information and operation information and are expressed semantically, so they cannot be identified through keywords.
Disclosure of Invention
The embodiment of the invention discloses a case grouping screening method and system based on natural language processing.
A first aspect of the embodiment of the invention discloses a case grouping and screening method based on natural language processing, which comprises the following steps:
performing initial identification on original case data by using an NLP model to obtain a text label set;
constructing a PLSA classification model based on the original case data and the text label set, and performing association mapping on the original case data, the text label set and a plurality of type nodes;
determining a grouping label set by using an NLP model based on the grouping rule text;
matching the grouping label set with the text label set, and determining a specific type node corresponding to the grouping label set in the latent semantic space;
and extracting the original case data that are mapped and associated with the specific type node to obtain grouped case data.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, the grouping rule text at least includes a preference rule, an exclusion rule, and a remark rule;
and among the original case data, the original case data that conform to the preference rule and do not conform to the exclusion rule, or that conform to the remark rule, are the grouped case data.
As an optional implementation manner, in the first aspect of the embodiment of the present invention, after the extracting the original case data mapped in association with the specific type node to obtain the grouped case data, the method further includes:
synchronizing the grouped case data by using a DataX tool, and placing the structured data in the grouped case data into a standard data table;
and splitting the semi-structured data and the unstructured data contained in each piece of grouped case data according to their matching values relative to the standard data table, and placing the split semi-structured data and unstructured data into the standard data table respectively.
As an optional implementation manner, in the first aspect of this embodiment of the present invention, the method further includes:
performing value domain verification on each standard data table constructed based on the grouped case data, and removing the standard data table with the overrun data;
and performing logic verification on each standard data table subjected to value range verification, and removing the standard data tables with the defect data violating the medical logic.
As an optional implementation manner, in the first aspect of this embodiment of the present invention, the method further includes:
storing the image data related to the grouping case data in an image library, and establishing an association relation between the grouping case data and the image data through a patient main index;
when any of the group case data is retrieved and called, the corresponding image data is synchronously called based on the association relationship.
The second aspect of the embodiment of the invention discloses a case grouping and screening system based on natural language processing, which comprises:
the label identification unit is used for carrying out primary identification on the original case data by adopting an NLP model to obtain a text label set;
the model construction unit is used for constructing a PLSA classification model based on the original case data and the text label set, and performing association mapping on the original case data, the text label set and a plurality of type nodes;
the label determining unit is used for determining a grouping label set by using an NLP model based on the grouping rule text;
the label matching unit is used for matching the grouping label set with the text label set and determining a specific type node corresponding to the grouping label set in the latent semantic space;
and the data extraction unit is used for extracting the original case data associated and mapped with the specific type node to obtain the grouped case data.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the grouping rule text at least includes a preference rule, an exclusion rule, and a remark rule;
and among the original case data, the original case data that conform to the preference rule and do not conform to the exclusion rule, or that conform to the remark rule, are the grouped case data.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the system further includes:
the data synchronization unit is used for synchronizing the grouped case data by using a DataX tool after the data extraction unit extracts the original case data associated and mapped with the specific type node to obtain the grouped case data, and placing the structured data in the grouped case data into a standard data table;
and the data matching unit is used for splitting the semi-structured data and the unstructured data according to the matching value of the semi-structured data and the unstructured data contained in each grouped case data relative to the standard data table and respectively placing the split semi-structured data and the unstructured data into the standard data table.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the system further includes:
the first removing unit is used for carrying out value domain verification on each standard data table constructed based on the grouped case data and removing the standard data table with the overrun data;
and the second eliminating unit is used for carrying out logic verification on each standard data table subjected to value range verification and eliminating the standard data tables with the defect data which violates medical logic.
As an optional implementation manner, in the second aspect of the embodiment of the present invention, the system further includes:
the image association unit is used for storing the image data related to the grouping case data in an image library and establishing an association relation between the grouping case data and the image data through a patient main index;
and the image calling unit is used for synchronously calling the corresponding image data based on the association relation when any group case data is searched and called.
The third aspect of the embodiments of the present invention discloses a case grouping and screening system based on natural language processing, which includes:
a memory storing executable program code;
a processor coupled with the memory;
the processor calls the executable program code stored in the memory to execute a case grouping and screening method based on natural language processing disclosed in the first aspect of the embodiment of the invention.
A fourth aspect of the embodiments of the present invention discloses a computer-readable storage medium storing a computer program, wherein the computer program enables a computer to execute the case grouping screening method based on natural language processing disclosed in the first aspect of the embodiments of the present invention.
A fifth aspect of embodiments of the present invention discloses a computer program product, which, when run on a computer, causes the computer to perform some or all of the steps of any one of the methods of the first aspect.
A sixth aspect of the present embodiment discloses an application publishing platform, where the application publishing platform is configured to publish a computer program product, where when the computer program product runs on a computer, the computer is caused to execute part or all of the steps of any one of the methods in the first aspect.
Compared with the prior art, the embodiment of the invention has the following beneficial effects:
In the embodiment of the invention, the original case data and the grouping rule text are processed by natural language processing to obtain the text label set and the grouping label set. The original case data and the text label set are then processed by a PLSA classification model to obtain a probability-distribution-based association mapping between the original case data, the text label set and the type nodes. The required grouped case data are extracted by matching the grouping label set to the specific type nodes. Grouping screening of original case data that include unstructured data is thereby completed without manual intervention and with high accuracy.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present invention, the drawings needed to be used in the embodiments will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art that other drawings can be obtained according to these drawings without creative efforts.
FIG. 1 is a schematic flow chart of a case grouping screening method based on natural language processing according to an embodiment of the present invention;
FIG. 2 is a schematic structural diagram of a case grouping and screening system based on natural language processing according to an embodiment of the present invention;
FIG. 3 is a schematic structural diagram of another case grouping and screening system based on natural language processing according to an embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first", "second", "third", "fourth", etc. in the description and claims of the present invention are used for distinguishing different objects, and are not used for describing a specific order. The terms "comprises," "comprising," and "having," and any variations thereof, of embodiments of the present invention are intended to cover non-exclusive inclusions, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements explicitly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The embodiment of the invention discloses a case grouping screening method and system based on natural language processing. The original case data and the grouping rule text are processed by natural language processing to obtain a text label set and a grouping label set. The original case data and the text label set are then processed by a PLSA classification model to obtain a probability-distribution-based association mapping between the original case data, the text label set and the type nodes. The required grouped case data are extracted by matching the grouping label set to the specific type nodes. Grouping screening of original case data that include unstructured data is thereby completed without manual intervention and with high accuracy.
Example one
Referring to fig. 1, fig. 1 is a schematic flow chart of a case grouping screening method based on natural language processing according to an embodiment of the present invention. As shown in fig. 1, the case grouping screening method based on natural language processing may include the following steps.
101. Perform initial identification on the original case data by using an NLP model to obtain a text label set.
In this embodiment, an NLP model extracts text labels for the keywords in each piece of original case data; these labels serve as the basis for subsequent identification and matching.
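The source does not specify which NLP model produces the text labels. As a minimal sketch only, the step can be pictured with a plain phrase lookup standing in for the model; the lexicon, the label names and the sample case texts below are all hypothetical and are not taken from the patent.

```python
import re

# Hypothetical label lexicon: surface phrase -> normalized text label.
# A real system would use a trained NLP model instead of this lookup.
LABEL_LEXICON = {
    "breast carcinoma": "dx:breast_malignancy",
    "paget's disease": "dx:breast_malignancy",
    "mastitis": "dx:mastitis",
    "core needle biopsy": "proc:core_needle_biopsy",
    "mastectomy": "proc:breast_surgery",
}


def extract_text_labels(case_text: str) -> set:
    """Return the set of text labels whose phrases occur in the case text."""
    normalized = re.sub(r"\s+", " ", case_text.lower())
    return {label for phrase, label in LABEL_LEXICON.items() if phrase in normalized}


# Hypothetical original case data keyed by case id.
raw_cases = {
    "case-001": "Core needle biopsy confirmed breast carcinoma; mastectomy performed.",
    "case-002": "Ultrasound and history consistent with mastitis.",
}

# One text label set per piece of original case data.
text_label_sets = {cid: extract_text_labels(text) for cid, text in raw_cases.items()}
```

A lookup like this is exactly the keyword-style approach the background section calls insufficient; it is used here only to make the input and output shapes of step 101 concrete.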
102. Construct a PLSA classification model based on the original case data and the text label set, and perform association mapping on the original case data, the text label set and a plurality of type nodes.
In this embodiment, a PLSA classification model is established that associates the original case data with the type nodes. Each piece of case data is represented by a probability distribution over its text labels, and each type node is represented by a probability distribution over the case data, forming a two-layer probability structure. This yields the probability relationship between the original case data and the type nodes, and the type node corresponding to each piece of case data is determined from the strongest probabilistic association.
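The patent does not give the PLSA formulation it uses. The sketch below assumes the common asymmetric form, P(label | case) = Σ_z P(z | case) · P(label | z), fitted with EM, where z ranges over the type nodes. The function name `fit_plsa`, the toy count matrix and the iteration count are illustrative assumptions.

```python
import numpy as np


def fit_plsa(counts, n_types, n_iter=200, seed=0):
    """Fit asymmetric PLSA with EM (illustrative sketch).

    counts: (n_cases, n_labels) matrix of text-label counts per case.
    Returns p_z_d of shape (n_cases, n_types) and p_w_z of shape (n_types, n_labels).
    """
    rng = np.random.default_rng(seed)
    n_cases, n_labels = counts.shape
    # Random row-normalized initialization of P(z | case) and P(label | z).
    p_z_d = rng.random((n_cases, n_types))
    p_z_d /= p_z_d.sum(axis=1, keepdims=True)
    p_w_z = rng.random((n_types, n_labels))
    p_w_z /= p_w_z.sum(axis=1, keepdims=True)
    eps = 1e-12
    for _ in range(n_iter):
        # E-step: responsibilities P(z | case, label), shape (cases, labels, types).
        joint = p_z_d[:, None, :] * p_w_z.T[None, :, :]
        resp = joint / (joint.sum(axis=2, keepdims=True) + eps)
        # M-step: re-estimate both distributions from expected counts.
        weighted = counts[:, :, None] * resp
        p_w_z = weighted.sum(axis=0).T
        p_w_z /= p_w_z.sum(axis=1, keepdims=True) + eps
        p_z_d = weighted.sum(axis=1)
        p_z_d /= p_z_d.sum(axis=1, keepdims=True) + eps
    return p_z_d, p_w_z


# Toy data: 4 cases x 4 labels (hypothetical counts).
counts = np.array([
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 3, 2],
    [0, 0, 2, 3],
], dtype=float)

p_z_d, p_w_z = fit_plsa(counts, n_types=2)
# Assign each case to the type node with the strongest association.
type_node_of_case = p_z_d.argmax(axis=1)
```

EM only finds a local optimum, so the assignment can depend on the random seed; in practice several restarts are usually compared by likelihood.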
103. Determine a grouping label set by using an NLP model based on the grouping rule text.
In this embodiment, the grouping rule text is determined by the research project. It is mostly expressed in long sentences or spread over multiple paragraphs, and the more specialized the research field, the more it takes the form of unstructured data.
As an optional implementation manner, the grouping rule text at least comprises a preference rule, an exclusion rule and a remark rule; among the original case data, the original case data that meet the preference rule and do not meet the exclusion rule, or that meet the remark rule, are the grouped case data.
Specifically, taking a breast cancer study as an example, the preference rule may be: pathologically confirmed malignancy of the left or right breast, defined as carcinoma, sarcoma, malignant or borderline phyllodes tumor, mesenchyme, CDCIS, or Paget's disease. (Grouping is not affected by the presence or absence of other secondary tumors at the time of initial diagnosis.)
Its exclusion rules may be:
a. patients with no histologically confirmed malignant breast tumor lesions;
b. breast cancer patients who have not received surgical treatment of the breast and axilla at this hospital;
c. patients histologically diagnosed with typical lobular carcinoma in situ, benign breast disease, mastitis, papilloma or benign phyllodes tumor, with no malignant lesion;
d. patients who received breast surgery at another hospital, obtained a negative margin, and underwent axillary surgery at the same hospital;
e. patients whose primary lesion surgery was not performed at this hospital and who developed recurrence or metastasis after surgery;
f. patients with other malignancies that have metastasized to the breast or axilla.
The remark rules may be:
a. coarse needle biopsy does not locate the patient for surgical treatment;
b. patients who underwent surgery for primary stage IV breast cancer;
c. patients who underwent excisional biopsy of the tumor (including minimally invasive surgery) at another hospital, who had the margins re-excised by open surgery or completed axillary surgery at this hospital, and for whom this hospital's pathology department issued a consultation report based on unstained slides of the tumor tissue from the other hospital.
In the above, the preference rule sets a broad standard suitable for initial general screening. The exclusion rule supplements the preference rule by defining the conditions under which a case is not selected. The remark rule in turn supplements both the preference rule and the exclusion rule by defining special conditions under which a case is selected.
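The way the three rule types combine, (preference AND NOT exclusion) OR remark, can be sketched as a small predicate over label sets. The `RuleLabels` container, the label names and the "any shared label" matching criterion are illustrative assumptions; the patent does not state how a case is judged to meet a rule.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RuleLabels:
    """Hypothetical grouping label set, split by rule type."""
    preference: frozenset
    exclusion: frozenset
    remark: frozenset


def is_grouped(case_labels: set, rules: RuleLabels) -> bool:
    """Apply (preference AND NOT exclusion) OR remark to one case's labels."""
    meets_preference = bool(case_labels & rules.preference)
    meets_exclusion = bool(case_labels & rules.exclusion)
    meets_remark = bool(case_labels & rules.remark)
    return (meets_preference and not meets_exclusion) or meets_remark


rules = RuleLabels(
    preference=frozenset({"dx:breast_malignancy"}),
    exclusion=frozenset({"hx:no_surgery_at_this_hospital"}),
    remark=frozenset({"hx:primary_stage_iv_surgery"}),
)

# Meets the preference rule only: selected.
selected_plain = is_grouped({"dx:breast_malignancy"}, rules)
# Meets preference but also exclusion: not selected.
selected_excluded = is_grouped(
    {"dx:breast_malignancy", "hx:no_surgery_at_this_hospital"}, rules)
# The remark rule overrides the exclusion: selected.
selected_remark = is_grouped(
    {"dx:breast_malignancy", "hx:no_surgery_at_this_hospital",
     "hx:primary_stage_iv_surgery"}, rules)
```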
104. Match the grouping label set with the text label set, and determine a specific type node corresponding to the grouping label set in the latent semantic space.
In this embodiment, once the grouping label set has defined the grouping requirement and the PLSA classification model has constructed a complete association mapping for the original case data, the grouping label set is matched with the text label set to determine the specific type node that is consistent with the grouping requirement.
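The patent does not say how the match between the grouping label set and the type nodes is scored. One plausible sketch, assuming a fitted PLSA model exposes P(label | type node), is to sum that probability over the grouping labels and keep the type nodes above a threshold. The vocabulary, the probability table, the threshold and the function name are all hypothetical.

```python
import numpy as np

# Hypothetical label vocabulary shared by the text label set and the grouping label set.
label_vocab = ["dx:breast_malignancy", "proc:breast_surgery", "dx:mastitis", "dx:papilloma"]

# Hypothetical P(label | type node); rows are type nodes, as a fitted PLSA model might give.
p_label_given_type = np.array([
    [0.55, 0.40, 0.03, 0.02],
    [0.05, 0.05, 0.50, 0.40],
])


def match_type_nodes(grouping_labels, vocab, p_w_z, threshold=0.5):
    """Score each type node by the probability mass it puts on the grouping labels."""
    idx = [vocab.index(label) for label in grouping_labels if label in vocab]
    scores = p_w_z[:, idx].sum(axis=1)
    nodes = [int(z) for z in np.flatnonzero(scores >= threshold)]
    return nodes, scores


grouping_labels = ["dx:breast_malignancy", "proc:breast_surgery"]
specific_nodes, node_scores = match_type_nodes(grouping_labels, label_vocab, p_label_given_type)
```

With this table, type node 0 concentrates its mass on the two grouping labels and is the only node selected.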
105. Extract the original case data associated and mapped with the specific type node to obtain grouped case data.
In this embodiment, the original case data mapped and associated with the specific type node are the grouped case data that meet the requirements of the grouping rule text, and they are extracted according to the specific type node.
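Given a case-to-type-node assignment and the selected type nodes, the extraction step reduces to a filter. This is a minimal sketch; the dictionary layout and case ids are hypothetical.

```python
def extract_grouped_cases(case_type_node: dict, selected_nodes) -> list:
    """Return the ids of cases whose assigned type node is among the selected nodes."""
    selected = set(selected_nodes)
    return sorted(cid for cid, node in case_type_node.items() if node in selected)


# Hypothetical assignment of each case to its strongest type node.
case_type_node = {"case-001": 0, "case-002": 1, "case-003": 0}
grouped_case_ids = extract_grouped_cases(case_type_node, selected_nodes=[0])
```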
In this embodiment, after the required grouped case data have been screened out, the grouped case data are organized into standard data tables.
As an optional implementation manner, a DataX tool is used to synchronize the grouped case data, and the structured data in the grouped case data are placed in the standard data table; the semi-structured data and the unstructured data contained in each piece of grouped case data are split according to their matching values relative to the standard data table, and the split semi-structured data and unstructured data are placed into the standard data table respectively.
In this way, the structured data in the grouped case data are placed directly into the corresponding fields of the standard data table, while the semi-structured data and unstructured data are split and placed according to how well they match parameters of the standard data table, such as the data format and requirements of each field.
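How the matching value is computed is not described in the source. The sketch below substitutes a much simpler idea: each field of a hypothetical standard data table has a format pattern, free text is split into fragments, and a fragment fills the first empty field whose pattern it matches. The field names and patterns are assumptions, and the DataX synchronization itself is not shown.

```python
import re

# Hypothetical standard data table fields and the format each field expects.
STANDARD_FIELDS = {
    "surgery_date": re.compile(r"\b\d{4}-\d{2}-\d{2}\b"),
    "tumor_size_cm": re.compile(r"\b\d+(?:\.\d+)?\s*cm\b"),
}


def split_into_standard_row(free_text: str) -> dict:
    """Split free text into fragments and place each match into its field."""
    row = {name: None for name in STANDARD_FIELDS}
    fragments = [f.strip() for f in re.split(r"[;\n]", free_text) if f.strip()]
    for fragment in fragments:
        for name, pattern in STANDARD_FIELDS.items():
            match = pattern.search(fragment)
            # Keep only the first value found for each field.
            if match and row[name] is None:
                row[name] = match.group(0)
    return row


row = split_into_standard_row("Surgery performed on 2021-03-05; tumor measured 2.5 cm")
```

A graded matching value, as the text describes, would replace the binary pattern test with a score per field and pick the best-scoring field for each fragment.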
In the embodiment, the grouped case data is also checked and cleaned, and errors are removed, so that accurate and available grouped case data is obtained.
As an optional implementation manner, performing value domain verification on each standard data table constructed based on the grouped case data, and removing the standard data table with the overrun data;
and performing logic verification on each standard data table subjected to value range verification, and removing the standard data tables with the defect data violating the medical logic.
Here, the value domain verification eliminates grouped case data containing overrun (out-of-range) data, such as a negative age, and the logic verification eliminates grouped case data containing medical logic errors, such as an operation date earlier than the pre-treatment pathology date, so that invalid data do not negatively affect the research.
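The two checks can be sketched as successive filters over standard data table rows. The field names, the age bounds (0 to 120) and the sample rows are illustrative assumptions; the source only gives negative age and an operation date earlier than the pre-treatment pathology date as examples.

```python
from datetime import date


def passes_value_domain(row: dict) -> bool:
    """Value domain check: age must lie in an assumed plausible range."""
    return 0 <= row["age"] <= 120


def passes_medical_logic(row: dict) -> bool:
    """Logic check: surgery must not precede the pre-treatment pathology date."""
    return row["surgery_date"] >= row["pathology_date"]


def clean_tables(rows: list) -> list:
    """Run the value domain check first, then the logic check on the survivors."""
    in_range = [r for r in rows if passes_value_domain(r)]
    return [r for r in in_range if passes_medical_logic(r)]


rows = [
    {"id": "T1", "age": 52, "pathology_date": date(2021, 3, 1), "surgery_date": date(2021, 3, 10)},
    {"id": "T2", "age": -3, "pathology_date": date(2021, 3, 1), "surgery_date": date(2021, 3, 10)},
    {"id": "T3", "age": 47, "pathology_date": date(2021, 3, 10), "surgery_date": date(2021, 3, 1)},
]
cleaned = clean_tables(rows)
```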
In this embodiment, the image data are stored separately from the text-format grouped case data, and an association relationship is established between the two.
As an optional implementation manner, the image data related to the group case data is stored in an image library, and an association relationship is established between the group case data and the image data through a patient main index;
when any of the group case data is retrieved and called, the corresponding image data is synchronously called based on the association relationship.
Therefore, special processing on the image data is not needed, and the influence on the accuracy of the image data is avoided.
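The association through the patient main index can be pictured as two stores that share one key. This is a sketch only; the index values, file names and the `retrieve_with_images` helper are hypothetical.

```python
# Hypothetical stores, both keyed by the patient main index.
grouped_cases = {
    "PMI-0001": {"diagnosis": "breast carcinoma"},
    "PMI-0002": {"diagnosis": "Paget's disease"},
}
image_library = {
    "PMI-0001": ["mammogram_2021-03-01.dcm", "ultrasound_2021-03-02.dcm"],
}


def retrieve_with_images(patient_index: str):
    """Return a grouped case together with the image data linked by the same index."""
    case = grouped_cases[patient_index]
    # Image files are referenced as stored; they are not transformed in any way.
    images = image_library.get(patient_index, [])
    return case, images


case, images = retrieve_with_images("PMI-0001")
```

Because only references are joined at retrieval time, the image files themselves stay untouched, which matches the stated goal of avoiding any effect on image accuracy.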
In summary, the original case data and the grouping rule text are processed by natural language processing to obtain a text label set and a grouping label set. The original case data and the text label set are then processed by a PLSA classification model to obtain a probability-distribution-based association mapping between the original case data, the text label set and the type nodes. The required grouped case data are extracted by matching the grouping label set to the specific type nodes. Grouping screening of original case data that include unstructured data is thereby completed without manual intervention and with high accuracy.
Example two
Referring to fig. 2, fig. 2 is a schematic structural diagram of a case grouping and screening system based on natural language processing according to an embodiment of the present invention. As shown in fig. 2, the system for screening case grouping based on natural language processing may include:
a label identification unit 201, configured to perform initial identification on original case data by using an NLP model, so as to obtain a text label set;
a model construction unit 202, configured to construct a PLSA classification model based on the original case data and the text label set, and perform association mapping on the original case data, the text label set, and a plurality of type nodes;
a label determining unit 203, configured to determine a grouping label set by using an NLP model based on the grouping rule text;
wherein the grouping rule text at least comprises a preference rule, an exclusion rule and a remark rule;
among the original case data, the original case data that meet the preference rule and do not meet the exclusion rule, or that meet the remark rule, are the grouped case data;
a label matching unit 204, configured to match the grouping label set with the text label set, and determine a specific type node corresponding to the grouping label set in the latent semantic space;
a data extraction unit 205, configured to extract the original case data mapped in association with the specific type node to obtain grouped case data;
a data synchronization unit 206, configured to, after the data extraction unit extracts the original case data mapped in association with the specific type node to obtain the grouped case data, synchronize the grouped case data by using a DataX tool, and place the structured data in the grouped case data into a standard data table;
a data matching unit 207, configured to split the semi-structured data and the unstructured data contained in each piece of grouped case data according to their matching values relative to the standard data table, and place the split semi-structured data and unstructured data into the standard data table respectively;
a first eliminating unit 208, configured to perform value domain verification on each standard data table constructed based on the grouped case data, and eliminate the standard data tables with overrun data;
a second eliminating unit 209, configured to perform logic verification on each standard data table that has passed value domain verification, and eliminate the standard data tables having defective data that violate medical logic;
an image association unit 210, configured to store image data related to the grouped case data in an image library, and establish an association relationship between the grouped case data and the image data through a patient main index;
an image calling unit 211, configured to synchronously call the corresponding image data based on the association relationship when any grouped case data is retrieved and called.
As an optional implementation manner, the data synchronization unit 206 synchronizes the grouped case data by using a DataX tool, and places the structured data in the grouped case data into a standard data table; the data matching unit 207 splits the semi-structured data and the unstructured data according to the matching value of the semi-structured data and the unstructured data included in each piece of grouped case data relative to the standard data table, and places the split semi-structured data and unstructured data into the standard data table respectively.
Therefore, structured data in the grouped case data are placed directly into the corresponding fields of the standard data table, while semi-structured data and unstructured data are split and placed according to how well they match parameters of the standard data table, such as the data format of each field and the field requirements.
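The splitting step above can be illustrated with a minimal sketch. The patent does not specify how the matching value is computed, so the scoring below (half for a format match, half for overlap between the fragment key and the field name) is purely an assumption, as are the field names, the regular expressions, the threshold and the helper names `matching_value` and `split_into_table`.

```python
import re

# Hypothetical standard data table: field name -> expected value format.
STANDARD_FIELDS = {
    "age": re.compile(r"\d{1,3}"),
    "surgery_date": re.compile(r"\d{4}-\d{2}-\d{2}"),
    "tumor_size_cm": re.compile(r"\d+(\.\d+)?"),
}


def matching_value(key, value, field, pattern):
    """Assumed score in [0, 1]: format match plus key/field name overlap."""
    format_score = 1.0 if pattern.fullmatch(value.strip()) else 0.0
    key_tokens = set(key.lower().split("_"))
    field_tokens = set(field.split("_"))
    name_score = len(key_tokens & field_tokens) / len(field_tokens)
    return 0.5 * format_score + 0.5 * name_score


def split_into_table(fragments, threshold=0.6):
    """Place each (key, value) fragment into its best-matching field, if any."""
    row, unmatched = {}, []
    for key, value in fragments:
        best_field, best_score = None, 0.0
        for field, pattern in STANDARD_FIELDS.items():
            score = matching_value(key, value, field, pattern)
            if score > best_score:
                best_field, best_score = field, score
        if best_field is not None and best_score >= threshold:
            row[best_field] = value.strip()
        else:
            unmatched.append((key, value))
    return row, unmatched


# Fragments as they might come out of a semi-structured record (made-up data).
fragments = [("age", "57"), ("date_surgery", "2021-03-04"), ("note", "patient stable")]
row, unmatched = split_into_table(fragments)
```

Fragments that fall below the threshold are returned separately rather than forced into a field, which leaves room for manual or model-based handling of free text.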
As an optional implementation manner, the first eliminating unit 208 performs value domain verification on each standard data table constructed based on the grouped case data, and eliminates the standard data tables containing overrun data;
the second eliminating unit 209 performs logic verification on each standard data table for which value domain verification is completed, and eliminates the standard data tables containing defective data that violates medical logic.
Here, the value domain verification eliminates grouped case data containing overrun data (such as a negative age), and the logic verification eliminates grouped case data containing medical logic errors (such as a surgery date earlier than the pre-treatment pathology date), so that invalid data does not negatively affect the research.
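The two verification passes can be sketched as follows. This is only an illustration: the value ranges, the field names and the helper names are assumptions, and the two checks shown are simply the examples given above (negative age, surgery dated before pre-treatment pathology).

```python
from datetime import date

# Hypothetical value ranges; a real deployment would define these per field.
VALUE_RANGES = {"age": (0, 120), "tumor_size_cm": (0.0, 50.0)}


def passes_value_domain(table):
    """Return False if any ranged field holds an out-of-range value."""
    for field, (low, high) in VALUE_RANGES.items():
        value = table.get(field)
        if value is not None and not (low <= value <= high):
            return False
    return True


def passes_medical_logic(table):
    """Return False if surgery is dated before the pre-treatment pathology."""
    pathology = table.get("pretreatment_pathology_date")
    surgery = table.get("surgery_date")
    if pathology is not None and surgery is not None and surgery < pathology:
        return False
    return True


def screen_tables(tables):
    """Apply value domain verification first, then logic verification."""
    in_range = [t for t in tables if passes_value_domain(t)]
    return [t for t in in_range if passes_medical_logic(t)]


# Made-up standard data tables: P2 has a negative age, P3 has inverted dates.
tables = [
    {"id": "P1", "age": 57, "pretreatment_pathology_date": date(2021, 3, 1),
     "surgery_date": date(2021, 3, 9)},
    {"id": "P2", "age": -4, "pretreatment_pathology_date": date(2021, 3, 1),
     "surgery_date": date(2021, 3, 9)},
    {"id": "P3", "age": 63, "pretreatment_pathology_date": date(2021, 5, 2),
     "surgery_date": date(2021, 4, 20)},
]
kept = screen_tables(tables)
```

Running the value domain check first mirrors the order of the first and second eliminating units, so the logic check only sees tables whose individual values are already plausible.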
In this embodiment, the image data is stored independently with respect to the grouped case data, and an association relationship is established with the grouped case data in a text format.
As an optional implementation manner, the image association unit 210 stores the image data related to the grouped case data in an image library, and establishes an association relationship between the grouped case data and the image data through a patient main index;
when any piece of grouped case data is retrieved and called, the image retrieving unit 211 synchronously retrieves the corresponding image data based on the association relationship.
Therefore, special processing on the image data is not needed, and the influence on the accuracy of the image data is avoided.
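A minimal sketch of the association through the patient main index is given below. It assumes an in-memory image library keyed by a patient index string; the class name `ImageLibrary`, the function `retrieve_case`, the index value and the file path are all hypothetical and stand in for whatever storage the system actually uses.

```python
from collections import defaultdict


class ImageLibrary:
    """Hypothetical image library that stores image references per patient index."""

    def __init__(self):
        self._images_by_patient = defaultdict(list)

    def store(self, patient_index, image_path):
        # The image itself is left untouched; only its location is recorded.
        self._images_by_patient[patient_index].append(image_path)

    def images_for(self, patient_index):
        return list(self._images_by_patient.get(patient_index, []))


def retrieve_case(cases, library, patient_index):
    """Return a grouped case together with its associated image references."""
    case = cases[patient_index]
    return case, library.images_for(patient_index)


# Made-up example: one grouped case and one associated image.
cases = {"MPI-001": {"diagnosis": "lung nodule"}}
library = ImageLibrary()
library.store("MPI-001", "ct/MPI-001/series1.dcm")
case, images = retrieve_case(cases, library, "MPI-001")
```

Because only the index links the two stores, the text-format case data can be screened and cleaned without ever reading or modifying the image files.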
In summary, the original case data and the grouping rule text are processed by natural language processing to obtain a text label set and a grouping label set. The original case data and the text label set are then processed by a PLSA classification model to obtain a probability-distribution-based association mapping among the original case data, the text label set and each type node. The required grouped case data are extracted by matching the grouping label set to a specific type node, which completes the grouping screening of original case data that include unstructured data. The process requires no manual intervention and has high accuracy.
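For illustration, the PLSA step and the node-based extraction can be sketched in plain Python. This is a textbook EM fit of the asymmetric PLSA model, P(label | case) = Σ_z P(label | z) P(z | case), and is not taken from the patent; treating each latent topic z as a "type node", choosing the node with the highest total probability for the grouping labels, and assigning each case to its most probable node are all assumptions, as are the function names and the toy count matrix.

```python
import random


def plsa(counts, n_topics, n_iter=50, seed=0):
    """Fit PLSA by EM. counts[d][w] = occurrences of label w in case d.

    Returns (p_z_d, p_w_z): P(node | case) and P(label | node).
    """
    rng = random.Random(seed)
    n_docs, n_words = len(counts), len(counts[0])

    def normalized(row):
        total = sum(row)
        return [x / total for x in row]

    p_z_d = [normalized([rng.random() + 1e-3 for _ in range(n_topics)])
             for _ in range(n_docs)]
    p_w_z = [normalized([rng.random() + 1e-3 for _ in range(n_words)])
             for _ in range(n_topics)]

    for _ in range(n_iter):
        new_z_d = [[1e-12] * n_topics for _ in range(n_docs)]
        new_w_z = [[1e-12] * n_words for _ in range(n_topics)]
        for d in range(n_docs):
            for w in range(n_words):
                if counts[d][w] == 0:
                    continue
                # E-step: P(z | d, w) is proportional to P(w | z) * P(z | d).
                post = normalized([p_w_z[z][w] * p_z_d[d][z] + 1e-12
                                   for z in range(n_topics)])
                # M-step accumulation: expected counts per node.
                for z in range(n_topics):
                    weight = counts[d][w] * post[z]
                    new_z_d[d][z] += weight
                    new_w_z[z][w] += weight
        p_z_d = [normalized(row) for row in new_z_d]
        p_w_z = [normalized(row) for row in new_w_z]
    return p_z_d, p_w_z


def node_for_labels(p_w_z, label_ids):
    """Assumed rule: pick the node giving the grouping labels the most mass."""
    scores = [sum(row[w] for w in label_ids) for row in p_w_z]
    return scores.index(max(scores))


def cases_for_node(p_z_d, node):
    """Assumed rule: a case maps to the node with its highest P(node | case)."""
    return [d for d, row in enumerate(p_z_d) if row.index(max(row)) == node]


# Toy case-by-label count matrix (made-up data): 4 cases, 4 text labels.
counts = [
    [3, 2, 0, 0],
    [2, 3, 0, 0],
    [0, 0, 4, 1],
    [0, 0, 1, 4],
]
p_z_d, p_w_z = plsa(counts, n_topics=2)
node = node_for_labels(p_w_z, label_ids=[2, 3])  # grouping label set
selected = cases_for_node(p_z_d, node)           # indices of grouped cases
```

EM only finds a local optimum, so in practice the fit would typically be repeated with several seeds and the number of nodes chosen to suit the grouping rules.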
EXAMPLE III
Referring to fig. 3, fig. 3 is a schematic structural diagram of another case grouping screening system based on natural language processing according to an embodiment of the present invention. As shown in fig. 3, the case grouping screening system based on natural language processing may include:
a memory 301 storing executable program code;
a processor 302 coupled to the memory 301;
the processor 302 calls the executable program code stored in the memory 301 to execute the case grouping screening method based on natural language processing of fig. 1.
The embodiment of the invention discloses a computer-readable storage medium which stores a computer program, wherein the computer program enables a computer to execute the case grouping screening method based on natural language processing in the figure 1.
Embodiments of the present invention also disclose a computer program product, wherein, when the computer program product is run on a computer, the computer is caused to execute part or all of the steps of the method as in the above method embodiments.
It will be understood by those skilled in the art that all or part of the steps in the methods of the above embodiments may be implemented by a program instructing related hardware, and the program may be stored in a computer-readable storage medium. The storage medium includes Read-Only Memory (ROM), Random Access Memory (RAM), Programmable Read-Only Memory (PROM), Erasable Programmable Read-Only Memory (EPROM), One-time Programmable Read-Only Memory (OTPROM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Compact Disc Read-Only Memory (CD-ROM), or other memory such as magnetic disk memory, tape memory or a combination thereof, or any other computer-readable medium that can be used to carry or store data.
The case grouping screening method and system based on natural language processing disclosed in the embodiments of the present invention have been described in detail above. Specific examples are used herein to explain the principle and implementation of the invention, and the description of the embodiments is intended only to help in understanding the method and core idea of the invention. Meanwhile, a person skilled in the art may, following the idea of the present invention, vary the specific embodiments and the scope of application. In summary, the content of this specification should not be construed as limiting the present invention.

Claims (10)

1. A method for case grouping screening based on natural language processing, the method comprising:
performing primary identification on original case data by adopting an NLP model to obtain a text label set;
constructing a PLSA classification model based on the original case data and the text label set, and performing association mapping on the original case data, the text label set and a plurality of type nodes;
based on the grouping rule text, determining a grouping label set by adopting an NLP model;
matching the grouping label set with the text label set, and determining a specific type node corresponding to the grouping label set in the implicit semantic space;
and extracting the original case data associated and mapped with the specific type node to obtain grouped case data.
2. The method as claimed in claim 1, wherein the grouping rule text at least includes a preference rule, an exclusion rule and a remark rule;
and among the original case data, the original case data that conform to the preference rule and do not conform to the exclusion rule, or that conform to the remark rule, are the grouped case data.
3. The method as claimed in claim 1, wherein after the extracting of the original case data mapped to the specific type of node to obtain the grouped case data, the method further comprises:
synchronizing the grouped case data by adopting a DataX tool, and placing the structured data in the grouped case data into a standard data table;
and splitting the semi-structured data and the unstructured data according to the matching value of the semi-structured data and the unstructured data contained in each piece of grouped case data relative to the standard data table, and respectively placing the split semi-structured data and unstructured data into the standard data table.
4. The method as claimed in claim 3, further comprising:
performing value domain verification on each standard data table constructed based on the grouped case data, and removing the standard data table with the overrun data;
and performing logic verification on each standard data table subjected to value range verification, and removing the standard data tables with the defect data violating the medical logic.
5. The method as claimed in claim 1, further comprising:
storing the image data related to the grouped case data in an image library, and establishing an association relationship between the grouped case data and the image data through a patient main index;
and when any piece of grouped case data is retrieved and called, synchronously retrieving the corresponding image data based on the association relationship.
6. A system for case grouping screening based on natural language processing, the system comprising:
the label identification unit is used for carrying out primary identification on the original case data by adopting an NLP model to obtain a text label set;
the model construction unit is used for constructing a PLSA classification model based on the original case data and the text label set, and performing association mapping on the original case data, the text label set and a plurality of type nodes;
the label determining unit is used for determining a grouping label set by adopting an NLP model based on the grouping rule text;
the label matching unit is used for matching the grouping label set with the text label set and determining a specific type node corresponding to the grouping label set in the implicit semantic space;
and the data extraction unit is used for extracting the original case data associated and mapped with the specific type node to obtain the grouped case data.
7. The system of claim 6, wherein the grouping rules text comprises at least preference rules, exclusion rules, and remark rules;
and among the original case data, the original case data that conform to the preference rule and do not conform to the exclusion rule, or that conform to the remark rule, are the grouped case data.
8. The system of claim 6, further comprising:
the data synchronization unit is used for synchronizing the grouped case data by adopting a DataX tool after the data extraction unit extracts the original case data associated and mapped with the specific type node to obtain the grouped case data, and placing the structured data in the grouped case data into a standard data table;
and the data matching unit is used for splitting the semi-structured data and the unstructured data according to the matching value of the semi-structured data and the unstructured data contained in each piece of grouped case data relative to the standard data table and respectively placing the split semi-structured data and unstructured data into the standard data table.
9. The system of claim 8, further comprising:
the first eliminating unit is used for carrying out value domain verification on each standard data table constructed based on the grouped case data and eliminating the standard data tables with overrun data;
and the second eliminating unit is used for carrying out logic verification on each standard data table subjected to value domain verification and eliminating the standard data tables with defective data violating medical logic.
10. The system of claim 6, further comprising:
the image association unit is used for storing the image data related to the grouped case data in an image library and establishing an association relationship between the grouped case data and the image data through a patient main index;
and the image retrieving unit is used for synchronously retrieving the corresponding image data based on the association relationship when any piece of grouped case data is retrieved and called.
CN202111564591.1A 2021-12-20 2021-12-20 Case grouping screening method and system based on natural language processing Active CN114743681B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111564591.1A CN114743681B (en) 2021-12-20 2021-12-20 Case grouping screening method and system based on natural language processing

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111564591.1A CN114743681B (en) 2021-12-20 2021-12-20 Case grouping screening method and system based on natural language processing

Publications (2)

Publication Number Publication Date
CN114743681A (en) 2022-07-12
CN114743681B (en) 2024-01-30

Family

ID=82274760

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111564591.1A Active CN114743681B (en) 2021-12-20 2021-12-20 Case grouping screening method and system based on natural language processing

Country Status (1)

Country Link
CN (1) CN114743681B (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180032678A1 (en) * 2016-07-29 2018-02-01 International Business Machines Corporation Medical recording system
CN109947858A (en) * 2017-07-26 2019-06-28 腾讯科技(深圳)有限公司 A kind of method and device of data processing
CN110197723A (en) * 2019-07-03 2019-09-03 四川大学华西医院 Clinical somatization classification diagnosis system under psychosomatic medicine theoretical frame
CN110413994A (en) * 2019-06-28 2019-11-05 宁波深擎信息科技有限公司 Hot topic generation method, device, computer equipment and storage medium
CN110570943A (en) * 2019-09-04 2019-12-13 医渡云(北京)技术有限公司 method and device for intelligently recommending MDT (minimization of drive test) grouping, electronic equipment and storage medium
CN111414393A (en) * 2020-03-26 2020-07-14 湖南科创信息技术股份有限公司 Semantic similar case retrieval method and equipment based on medical knowledge graph
CN112948471A (en) * 2019-11-26 2021-06-11 广州知汇云科技有限公司 Clinical medical text post-structured processing platform and method

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
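LIN LIU et al.: "An overview of topic modeling and its current applications in bioinformatics", 《SPRINGERPLUS》, pages 1 - 22 *
WU Dong: "Research and Application of Electronic Medical Record Retrieval Based on a Latent Semantic Correlation Algorithm", 《China Master's Theses Full-text Database, Information Science and Technology》, no. 5, pages 138 - 1343 *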

Also Published As

Publication number Publication date
CN114743681B (en) 2024-01-30

Similar Documents

Publication Publication Date Title
CN110765770B (en) Automatic contract generation method and device
CN109582955B (en) Method, apparatus and medium for standardizing medical terms
CN110059697B (en) Automatic lung nodule segmentation method based on deep learning
CN112365987A (en) Diagnostic data anomaly detection method and device, computer equipment and storage medium
CN103530334B (en) Based on the data matching system and method for comparing template
CN103473375A (en) Data cleaning method and data cleaning system
US20090287663A1 (en) Disease name input support program, method and apparatus
Bertram et al. Computer-assisted mitotic count using a deep learning–based algorithm improves interobserver reproducibility and accuracy
CN106502991B (en) Publication treating method and apparatus
CN113488180B (en) Clinical guideline knowledge modeling method and system
CN110019542B (en) Generation of enterprise relationship, generation of organization member database and identification of same name member
Aggarwal et al. Semantic and content-based medical image retrieval for lung cancer diagnosis with the inclusion of expert knowledge and proven pathology
CN113743463B (en) Tumor benign and malignant recognition method and system based on image data and deep learning
CN114864107A (en) Clinical pathway variation analysis method, equipment and storage medium
CN108170691A (en) It is associated with the determining method and apparatus of document
CN110752027B (en) Electronic medical record data pushing method, device, computer equipment and storage medium
Tafavvoghi et al. Publicly available datasets of breast histopathology H&E whole-slide images: A scoping review
WO2021107099A1 (en) Document creation assistance device, document creation assistance method, and program
CN114743681A (en) Case grouping screening method and system based on natural language processing
CN110853716B (en) Medical record template creation method and device
CN116206767A (en) Disease knowledge mining method, device, electronic equipment and storage medium
Oh et al. 3D auto-segmentation of biliary structure of living liver donors using magnetic resonance cholangiopancreatography for enhanced preoperative planning
CN116910650A (en) Data identification method, device, storage medium and computer equipment
Gellatly Reconstructing historical populations from genealogical data files
Wessel Lindberg et al. Quantitative tumor heterogeneity assessment on a nuclear population basis

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant