CN111199782B - Etiology analysis method, device, storage medium and electronic equipment - Google Patents

Etiology analysis method, device, storage medium and electronic equipment

Info

Publication number
CN111199782B
Authority
CN
China
Prior art keywords
data
attribute item
value
types
attribute
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201911396700.6A
Other languages
Chinese (zh)
Other versions
CN111199782A (en)
Inventor
孙浩
侯广健
刘满兰
刘志鹏
邹存璐
王�锋
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp
Priority to CN201911396700.6A
Publication of CN111199782A
Application granted
Publication of CN111199782B
Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H10/00 ICT specially adapted for the handling or processing of patient-related medical or healthcare data
    • G16H10/60 ICT specially adapted for the handling or processing of patient-related medical or healthcare data for patient-specific data, e.g. for electronic patient records
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02A TECHNOLOGIES FOR ADAPTATION TO CLIMATE CHANGE
    • Y02A90/00 Technologies having an indirect contribution to adaptation to climate change
    • Y02A90/10 Information and communication technologies [ICT] supporting adaptation to climate change, e.g. for weather forecasting or climate simulation

Abstract

The present disclosure relates to an etiology analysis method, apparatus, storage medium, and electronic device that provide a new approach to etiology analysis and automate it. The method comprises: acquiring sample data of a control group and sample data of a case group, wherein the sample data comprise a plurality of attribute items of each sample and the value data of the sample under each attribute item, and every case in the case group has the same symptoms; determining the data type of each attribute item according to the value data under that attribute item; and inputting the attribute item information of the control group and the case group into a data processing model to obtain the target attribute items, output by the data processing model, that are related to the symptoms. The attribute item information comprises the value data of an attribute item and the data type of the attribute item, and the data processing model processes the value data of an attribute item with the data processing algorithm corresponding to the data type of that attribute item.

Description

Etiology analysis method, device, storage medium and electronic equipment
Technical Field
The present disclosure relates to the field of computer technology, and in particular, to a method, an apparatus, a storage medium, and an electronic device for etiology analysis.
Background
Etiology analysis is an important research direction in the field of medical science, and mainly explores the reasons for occurrence of diseases, the mutual effects among related factors and the influence of each factor on occurrence and development of the diseases.
In the related art, the etiology analysis process mainly includes three steps: general data analysis, single-factor analysis, and multi-factor analysis. For general data analysis, researchers must label each variable in the data to be analyzed; all variables are then classified based on the labels, and a different algorithm is applied to each class of variables. For single-factor and multi-factor analysis, the variables must be classified and labeled again, and the reclassified variables of each type are then analyzed. This requires substantial manual effort to label every variable. When a large number of variables are labeled by hand, variable types are easily mislabeled, and mislabeled types can make the etiology analysis result noticeably inaccurate. When that happens, researchers must re-check the labels of a large number of variables, which is time-consuming, and relabeling all variables also takes a great deal of time. Manual labeling therefore makes etiology analysis costly in labor and inefficient.
Disclosure of Invention
The invention aims to provide a method, a device, a storage medium and electronic equipment for analyzing etiology, so as to provide a novel method for analyzing the etiology and realize automatic analysis of the etiology.
To achieve the above object, according to a first aspect of embodiments of the present disclosure, there is provided an etiology analysis method including:
acquiring sample data of a control group and sample data of a case group, wherein the sample data comprises various attribute items of a sample and value data of the sample under each attribute item, and the symptoms of each case in the case group are the same;
determining the data type of each attribute item according to the value data under each attribute item;
inputting the information of each attribute item of the control group and the case group into a data processing model to obtain a target attribute item which is output by the data processing model and is related to the symptoms;
the attribute item information comprises value data of an attribute item and a data type of the attribute item, and the data processing model is used for processing the value data of the attribute item according to a data processing algorithm corresponding to the data type of the attribute item.
Optionally, the determining the data type of each attribute item according to the value data under each attribute item includes:
determining that the data type of an attribute item whose value data take exactly two distinct values is a qualitative comparable type;
determining that the data type of an attribute item whose value data do not take exactly two distinct values, are numerical data, and conform to a normal distribution is a quantitative type;
determining that the data type of an attribute item whose value data do not take exactly two distinct values, are numerical data, and do not conform to a normal distribution is the qualitative comparable type;
determining that the data type of an attribute item whose value data do not take exactly two distinct values, are non-numerical data, and do not exist in a knowledge base is a qualitative incomparable type;
and determining that the data type of an attribute item whose value data do not take exactly two distinct values, are non-numerical data, and exist in the knowledge base is the qualitative comparable type.
Optionally, the processing of the value data of each attribute item by the data processing model includes:
for attribute items whose data type is the quantitative type, performing at least one of a rank-sum test, a T test, and a T' test to obtain a first intermediate attribute item;
for attribute items whose data type is a qualitative type, performing a chi-square test to obtain a second intermediate attribute item, wherein the qualitative types comprise the qualitative comparable type and the qualitative incomparable type;
and performing single-factor analysis on the first intermediate attribute item and the second intermediate attribute item to obtain a first target attribute item related to the symptoms, wherein the target attribute item comprises the first target attribute item.
Optionally, the single factor analysis includes performing a segmentation discretization process on the value data of each attribute item in the first intermediate attribute item, where a segmentation process in the segmentation discretization process includes:
determining a numerical interval of the attribute item according to the maximum value and the minimum value of the attribute item;
segmenting the numerical interval according to each hyperparameter in a preset hyperparameter space to obtain the set of segmented interval sequences under all segmentation conditions;
and calculating a P value representing the statistical significance of each segmented interval sequence in the segmented interval sequence set, and taking the segmented interval sequence with the minimum P value as a segmentation result.
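The segmentation steps above can be sketched as follows. This is a minimal illustration, not the patented implementation: it assumes that each hyperparameter is a number of equal-width segments, and that the P value of a segmentation is the chi-square P value of the group-by-interval contingency table. The names `chi2_sf`, `segment_p_value`, `best_segmentation`, and `segment_counts` are hypothetical.

```python
import bisect
import math


def chi2_sf(x, df):
    """Upper-tail probability of the chi-square distribution, integer df >= 1."""
    if x <= 0:
        return 1.0
    if df % 2 == 0:
        # Even df: exp(-x/2) * sum_{r < df/2} (x/2)^r / r!
        term = total = 1.0
        for r in range(1, df // 2):
            term *= (x / 2.0) / r
            total += term
        return math.exp(-x / 2.0) * total
    # Odd df: erfc term plus a finite series in odd powers of sqrt(x).
    total = 0.0
    term = math.sqrt(2.0 * x / math.pi)
    for r in range(1, (df - 1) // 2 + 1):
        total += term
        term *= x / (2 * r + 1)
    return math.erfc(math.sqrt(x / 2.0)) + math.exp(-x / 2.0) * total


def segment_p_value(control, case, edges):
    """Chi-square P value of the 2 x k table (group versus interval)."""
    k = len(edges) - 1

    def counts(xs):
        out = [0] * k
        for x in xs:
            # The last interval is closed on the right so the maximum is included.
            out[min(bisect.bisect_right(edges, x) - 1, k - 1)] += 1
        return out

    # Empty intervals are dropped so that every expected count is positive.
    cols = [(a, b) for a, b in zip(counts(control), counts(case)) if a + b > 0]
    if len(cols) < 2:
        return 1.0
    n = len(control) + len(case)
    stat = 0.0
    for a, b in cols:
        for observed, row_total in ((a, len(control)), (b, len(case))):
            expected = row_total * (a + b) / n
            stat += (observed - expected) ** 2 / expected
    return chi2_sf(stat, len(cols) - 1)


def best_segmentation(control, case, segment_counts=(2, 3, 4, 5)):
    """Return (P value, interval edges) of the segmentation with the smallest P."""
    values = list(control) + list(case)
    lo, hi = min(values), max(values)
    best = None
    for k in segment_counts:  # assumed hyperparameter: number of segments
        edges = [lo + (hi - lo) * i / k for i in range(k + 1)]
        p = segment_p_value(control, case, edges)
        if best is None or p < best[0]:
            best = (p, edges)
    return best
```

Both groups are assumed to be non-empty lists of numbers. Other hyperparameter spaces, such as quantile-based cut points, would fit the same loop.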
Optionally, the processing of the value data of each attribute item by the data processing model further includes:
performing multi-factor analysis on the first target attribute item to obtain a second target attribute item, wherein the target attribute item comprises the second target attribute item;
wherein the multi-factor analysis comprises:
for each attribute item in the first target attribute item whose data type is the qualitative incomparable type, generating a number of dummy variables corresponding to the number of distinct values of its value data;
and generating, from the dummy variables of the attribute item, a comparability coefficient corresponding to each value under the attribute item.
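The dummy-variable step can be illustrated with a short sketch. It assumes one 0/1 dummy variable per distinct value (one-hot encoding); the text does not say whether one category is dropped as a reference level, and the comparability coefficient is not computed here because its formula is not given in this passage. The function name `make_dummy_variables` is hypothetical.

```python
def make_dummy_variables(values):
    """Return one 0/1 column per distinct value, keyed by that value."""
    categories = sorted(set(values))
    return {c: [1 if v == c else 0 for v in values] for c in categories}


# Example: an ethnicity attribute item with three distinct values.
dummies = make_dummy_variables(["Han", "Miao", "Han", "Hui"])
```

Each column could then enter a regression model as a separate predictor, so that no artificial ordering is imposed on the incomparable categories.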
According to a second aspect of embodiments of the present disclosure, there is provided an etiology analysis device, the device comprising:
the acquisition module is used for acquiring sample data of a control group and sample data of a case group, wherein the sample data comprises various attribute items of a sample and value data of the sample under each attribute item, and the symptoms of each case in the case group are the same;
the determining module is used for determining the data type of each attribute item according to the value data under each attribute item;
the input module is used for inputting the information of each attribute item of the control group and the case group into a data processing model to obtain a target attribute item which is output by the data processing model and is related to the symptoms;
The attribute item information comprises value data of an attribute item and a data type of the attribute item, and the data processing model is used for processing the value data of the attribute item according to a data processing algorithm corresponding to the data type of the attribute item.
Optionally, the determining module includes:
the first determination submodule is used for determining that the data type of an attribute item whose value data take exactly two distinct values is a qualitative comparable type;
the second determination submodule is used for determining that the data type of an attribute item whose value data do not take exactly two distinct values, are numerical data, and conform to a normal distribution is a quantitative type;
the third determination submodule is used for determining that the data type of an attribute item whose value data do not take exactly two distinct values, are numerical data, and do not conform to a normal distribution is the qualitative comparable type;
the fourth determination submodule is used for determining that the data type of an attribute item whose value data do not take exactly two distinct values, are non-numerical data, and do not exist in a knowledge base is a qualitative incomparable type;
and the fifth determination submodule is used for determining that the data type of an attribute item whose value data do not take exactly two distinct values, are non-numerical data, and exist in the knowledge base is the qualitative comparable type.
Optionally, the data processing model is configured to:
for attribute items whose data type is the quantitative type, performing at least one of a rank-sum test, a T test, and a T' test to obtain a first intermediate attribute item;
for attribute items whose data type is a qualitative type, performing a chi-square test to obtain a second intermediate attribute item, wherein the qualitative types comprise the qualitative comparable type and the qualitative incomparable type;
and performing single-factor analysis on the first intermediate attribute item and the second intermediate attribute item to obtain a first target attribute item related to the symptoms, wherein the target attribute item comprises the first target attribute item.
Optionally, the single factor analysis includes performing a segmentation discretization process on the value data of each attribute item in the first intermediate attribute item, where a segmentation process in the segmentation discretization process includes:
determining a numerical interval of the attribute item according to the maximum value and the minimum value of the attribute item;
segmenting the numerical interval according to each hyperparameter in a preset hyperparameter space to obtain the set of segmented interval sequences under all segmentation conditions;
and calculating a P value representing the statistical significance of each segmented interval sequence in the segmented interval sequence set, and taking the segmented interval sequence with the minimum P value as a segmentation result.
Optionally, the data processing model is further configured to:
performing multi-factor analysis on the first target attribute item to obtain a second target attribute item, wherein the target attribute item comprises the second target attribute item;
wherein the multi-factor analysis comprises:
for each attribute item in the first target attribute item whose data type is the qualitative incomparable type, generating a number of dummy variables corresponding to the number of distinct values of its value data;
and generating, from the dummy variables of the attribute item, a comparability coefficient corresponding to each value under the attribute item.
By adopting the technical scheme, at least the following technical effects can be achieved:
Sample data of a control group and sample data of a case group are acquired, wherein the sample data comprise a plurality of attribute items of each sample and the value data of the sample under each attribute item, and the data type of each attribute item is determined according to the value data under that attribute item. In this way, the data type of each attribute item is determined automatically, without manually classifying and labeling the attribute items. The attribute item information of the control group and the case group, with the data types determined, is then input into a data processing model, which outputs the target attribute items related to the symptoms of the case group. This mode of etiology analysis requires no manual participation in the analysis process, realizes automated etiology analysis, and avoids the problems in the related art.
Additional features and advantages of the present disclosure will be set forth in the detailed description which follows.
Drawings
The accompanying drawings are included to provide a further understanding of the disclosure, and are incorporated in and constitute a part of this specification, illustrate the disclosure and together with the description serve to explain, but do not limit the disclosure. In the drawings:
Fig. 1 is a flow chart illustrating a method of etiology analysis according to an exemplary embodiment of the present disclosure.
Fig. 2 is a flow chart illustrating one method of determining a data type of an attribute item according to an exemplary embodiment of the present disclosure.
Fig. 3 is a flow chart illustrating another method of determining a data type of an attribute item according to an exemplary embodiment of the present disclosure.
Fig. 4 is a block diagram illustrating an etiology analysis apparatus according to an exemplary embodiment of the present disclosure.
Fig. 5 is a block diagram of an electronic device, according to an exemplary embodiment of the present disclosure.
Detailed Description
Specific embodiments of the present disclosure are described in detail below with reference to the accompanying drawings. It should be understood that the detailed description and specific examples, while indicating and illustrating the disclosure, are not intended to limit the disclosure.
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples are not representative of all implementations consistent with the present disclosure. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present disclosure as detailed in the accompanying claims.
Etiology analysis is an important research direction in medical science. It mainly explores the causes of diseases, the interactions among related factors, and the influence of each factor on the occurrence and development of a disease. In other words, etiology analysis is scientific research into the cause of a patient's disease after the disease has been diagnosed.
In the related art, the etiology analysis process is mainly divided into three parts: general data analysis, single-factor analysis, and multi-factor analysis. For general data analysis, researchers must label each variable in the data to be analyzed; all variables are then classified based on the labels, and a different algorithm is applied to each class of variables. After the general data analysis is complete, single-factor analysis and multi-factor analysis are performed based on its result. Because single-factor and multi-factor analysis classify variables differently from general data analysis, the variables must be classified and labeled again before these analyses, and the reclassified variables of each type are then analyzed.
Etiology analysis therefore requires a great deal of manual effort to label every variable. When a large number of variables are labeled by hand, variable types are easily mislabeled, and mislabeled types can make the etiology analysis result inaccurate. When that happens, researchers must re-check the labels of a large number of variables, which is time-consuming, and relabeling all variables also takes a great deal of time. Manual labeling therefore makes etiology analysis costly in labor and inefficient. Moreover, an inaccurate etiology analysis result has no value and cannot provide clues for targeted experimental designs such as randomized controlled trials (RCTs) and cohort studies.
In view of the above, the embodiments of the present disclosure provide a method, an apparatus, a storage medium, and an electronic device for analyzing a cause of a disease, so as to provide a new method for analyzing a cause of a disease, and implement automated analysis of a cause of a disease, thereby solving the problems in the related art.
Fig. 1 is a flow chart illustrating a method of etiology analysis according to an exemplary embodiment of the present disclosure. As shown in Fig. 1, the method comprises:
S101, acquiring sample data of a control group and sample data of a case group, wherein the sample data comprises various attribute items of a sample and value data of the sample under each attribute item, and the symptoms of all cases in the case group are the same.
When performing etiology analysis, the condition to be analyzed is determined first. Two sets of sample data are then selected: a case group, whose members have been diagnosed with the condition, and a control group, whose members do not have the condition. Note that every case in the case group has the same condition. For example, to analyze the etiology of gastric cancer, two sets of sample data are selected: a gastric cancer group and a non-gastric-cancer group.
The two selected sets of sample data comprise a plurality of attribute items of each sample and the value data of each sample under each attribute item. For example, the attribute items in the sample data may be the name, sex, ethnicity, age, education level, systolic blood pressure, diastolic blood pressure, blood glucose level, and so on of each sample. The value data of a sample under an attribute item are the specific values the sample takes for that attribute item. For example, the value of sample A in the control group under the name attribute item is Zhang San, and the value of sample B in the case group under the name attribute item is Li Si; likewise, the value of sample A under the systolic blood pressure attribute item is 110 mmHg, and the value of sample B under the systolic blood pressure attribute item is 100 mmHg.
Those of ordinary skill in the art should understand that, for etiology analysis, every sample in the control group and the case group has the same set of attribute items. For example, if sample A has 100 attribute items, sample B has the same 100 attribute items.
In one implementation, the sample data of the control group and the case group may be obtained from a clinical data center (CDR) database.
S102, determining the data type of each attribute item according to the value data under each attribute item.
The data type of each attribute item can be determined according to the value data under that attribute item. For example, if the attribute item is ethnicity, the value of each sample under the ethnicity attribute item may be Han, Miao, Hui, and so on; based on the ethnicity values of all samples, the data type of the ethnicity attribute item can be determined.
As another example, if the attribute item is diastolic blood pressure, the values may be 120 mmHg, 100 mmHg, 80 mmHg, and so on; based on the diastolic blood pressure values of all samples, the data type of the diastolic blood pressure attribute item can be determined.
Because the data type of each attribute item is determined from the value data under that attribute item, researchers do not need to label the data type of each attribute item manually, which reduces the labor cost of the etiology analysis process.
S103, inputting the information of each attribute item of the control group and the case group into a data processing model to obtain a target attribute item which is output by the data processing model and is related to the symptoms.
The attribute item information comprises value data of an attribute item and a data type of the attribute item, and the data processing model is used for processing the value data of the attribute item according to a data processing algorithm corresponding to the data type of the attribute item.
After the data type of each attribute item of the control group and the case group is determined, inputting the information of each attribute item into the data processing model for analysis and processing to obtain the target attribute item which is output by the data processing model and is related to the symptoms of the case group.
The target attribute item is the result of the etiology analysis, that is, a risk factor for the disease. For example, in an etiology analysis of gastric cancer, the target attribute items associated with gastric cancer may be diet, staying up late, and so on.
Those skilled in the art will appreciate that, in etiology analysis, different data processing algorithms may be used to process the value data of attribute items of different data types. For example, in the related art, the value data of the systolic blood pressure, diastolic blood pressure, and age attribute items are subjected to a normality test; a T test is then applied to the attribute items that pass the normality test, and a rank-sum test is applied to the attribute items that do not.
Thus, in the present disclosure, the attribute item information input to the data processing model includes the value data of each attribute item and the data type of the attribute item. The data processing model selects the data processing algorithm corresponding to the data type of an attribute item and uses it to process the value data of that attribute item.
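The selection of an algorithm by data type can be sketched as a simple dispatch table. This is only an illustration under stated assumptions: the handlers stop short of a full test, Welch's unequal-variance t statistic is assumed here to stand in for the T' test, and the qualitative handler only builds the contingency table on which a chi-square test would then be run. The names `ALGORITHMS`, `process_attribute`, `welch_t_statistic`, and `contingency_table` are hypothetical.

```python
from collections import Counter
from statistics import mean, variance


def welch_t_statistic(control, case):
    """t statistic for two numeric samples without assuming equal variances."""
    se2 = variance(control) / len(control) + variance(case) / len(case)
    return (mean(case) - mean(control)) / se2 ** 0.5


def contingency_table(control, case):
    """Map each category to its (control count, case count)."""
    categories = sorted(set(control) | set(case))
    c0, c1 = Counter(control), Counter(case)
    return {c: (c0[c], c1[c]) for c in categories}


# Assumed mapping from data type to processing algorithm.
ALGORITHMS = {
    "quantitative": welch_t_statistic,
    "qualitative comparable": contingency_table,
    "qualitative incomparable": contingency_table,
}


def process_attribute(data_type, control_values, case_values):
    """Apply the algorithm registered for the attribute item's data type."""
    return ALGORITHMS[data_type](control_values, case_values)
```

Keeping the mapping in one table means a new data type or test can be added without changing the calling code.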
With this method, sample data of a control group and sample data of a case group are acquired, wherein the sample data comprise a plurality of attribute items of each sample and the value data of the sample under each attribute item, and the data type of each attribute item is determined according to the value data under that attribute item. In this way, the data type of each attribute item is determined automatically, without manually classifying and labeling the attribute items. The attribute item information of the control group and the case group, with the data types determined, is then input into a data processing model, which outputs the target attribute items related to the symptoms of the case group. This mode of etiology analysis requires no manual participation in the analysis process, so it realizes automated etiology analysis and avoids the problems caused by manually labeling attribute items in the related art.
In a possible implementation manner, as shown in fig. 2, the determining the data type of each attribute item according to the value data under each attribute item may include the following steps:
S201, determining that the data type of an attribute item whose value data take exactly two distinct values is the qualitative comparable type.
If the value data of an attribute item take exactly two distinct values, the data type of the attribute item is determined to be the qualitative comparable type. That is, if every value of the attribute item is either A or B, the data type of the attribute item is the qualitative comparable type. For example, if the values of an attribute item are 0 or 1, yes or no, or 0.01 or 0.02, the data type of the attribute item is determined to be the qualitative comparable type.
S202, determining that the data type of an attribute item whose value data do not take exactly two distinct values, are numerical data, and conform to a normal distribution is the quantitative type.
That the value data do not take exactly two distinct values means that the number of distinct values is one, or three or more.
Numerical data, often denoted by the letter N, are data consisting of digits, decimal points, signs, and the letter E.
If the value data of an attribute item do not take exactly two distinct values, are numerical data, and conform to a normal distribution, the data type of the attribute item is determined to be the quantitative type. In one implementation, a normality test can be used to verify whether the value data of the attribute item conform to a normal distribution.
S203, determining that the data type of an attribute item whose value data do not take exactly two distinct values, are numerical data, and do not conform to a normal distribution is the qualitative comparable type.
If the value data of an attribute item do not take exactly two distinct values, are numerical data, and do not conform to a normal distribution, the data type of the attribute item is determined to be the qualitative comparable type.
S204, determining that the data type of an attribute item whose value data do not take exactly two distinct values, are non-numerical data, and do not exist in the knowledge base is the qualitative incomparable type.
Non-numerical data are single characters or character strings that cannot be used in arithmetic, such as Chinese characters, English characters, numeric characters, and ASCII characters.
The knowledge base is a knowledge base related to medicine. It is built by applying text analysis, word segmentation, part-of-speech tagging, and similar processing to medical data. In the knowledge base, the value range of an attribute item is divided into a plurality of value intervals, and each value interval corresponds to a value representing a conclusion, such as the conclusion words high, medium, and low.
For example, those of ordinary skill in the art will appreciate that in the medical field the value intervals of certain attribute items each correspond to a conclusion category of that attribute item. For instance, a systolic blood pressure in the 120-130 mmHg interval corresponds to the conclusion category normal systolic pressure; a systolic blood pressure in the 130-140 mmHg interval corresponds to the conclusion category mildly elevated systolic pressure; and a systolic blood pressure above 140 mmHg corresponds to the conclusion category high systolic pressure.
Thus, for an attribute item whose values are normal systolic pressure, mildly elevated systolic pressure, and high systolic pressure, the data type of the attribute item is determined to be the qualitative incomparable type if these values do not exist in the knowledge base.
It should be noted that, where available, the knowledge base may also be a well-maintained medical knowledge graph with a more complex structure.
S205, determining that the value types of the value data are not two, the value data are non-numerical data, and the data types of the attribute items of which the value data exist in the knowledge base are the qualitative comparison types.
If the value data are attribute items of normal contraction pressure, slight high contraction pressure and high contraction pressure, if the value data exist in the knowledge base, the data type of the attribute item is determined to be a qualitative comparison type.
It should be noted that, since the knowledge base directly affects the determination result of the data type of the attribute item in steps S204 and S205, in an implementation manner, the result of the data type determination of the attribute item in steps S204 and S205 may be manually checked. For example, the attribute item of the qualitative comparable type determined in step S205 may be adjusted to a qualitative incomparable type. If the knowledge base is a well-maintained knowledge base, the determination results of the data types of the attribute items in steps S204 and S205 are also more accurate, so that the determination results of both may not be adjusted.
In addition, it should be noted that, for the determination result of the data type of the attribute item in steps S202 and S203, manual adjustment may also be performed. For example, the data type of the attribute item of the qualitative comparable type determined in step S203 may be readjusted to a quantitative type.
It should be noted here that the present disclosure is not limited to the order of steps S201 to S205.
This method of classifying the data types of attribute items replaces the manual classification and labeling of attribute items used in the related art, which reduces labor cost.
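The decision rules of steps S201 to S205 can be sketched as a single function. This is a minimal illustration, not the implementation of the disclosure: the names `classify_attribute` and `is_number`, the `is_normal` callable (standing in for whatever normality test is chosen), and the rule that every distinct value must be found in the knowledge base are all assumptions made for the example.

```python
def is_number(value):
    """Return True if the value can be read as numerical data."""
    try:
        float(value)
        return True
    except (TypeError, ValueError):
        return False


def classify_attribute(values, is_normal, knowledge_base):
    """Return the data type of one attribute item from its value data.

    values         -- the value data of the attribute item across all samples
    is_normal      -- hypothetical callable: list of floats -> bool (normality test)
    knowledge_base -- hypothetical set of conclusion words known to the knowledge base
    """
    distinct = set(values)
    if len(distinct) == 2:
        return "qualitative comparable"        # two value types (S201)
    if all(is_number(v) for v in distinct):
        if is_normal([float(v) for v in values]):
            return "quantitative"              # numerical and normal (S202)
        return "qualitative comparable"        # numerical, not normal (S203)
    # Assumption: the attribute item counts as "in the knowledge base"
    # only when every distinct value is found there.
    if all(v in knowledge_base for v in distinct):
        return "qualitative comparable"        # non-numerical, known (S205)
    return "qualitative incomparable"          # non-numerical, unknown (S204)
```

Because the normality test and the knowledge base are passed in, the result of this sketch can still be reviewed and adjusted manually, as the description allows.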
FIG. 3 is a flow chart illustrating a method of determining the data type of an attribute item according to an exemplary embodiment of the present disclosure. FIG. 3 shows a specific implementation flow of the method of FIG. 2 for determining the data type of an attribute item.
In one possible implementation, the processing of the value data of each attribute item by the data processing model includes: for attribute items whose data type is the quantitative type, performing at least one of a rank-sum test, a T test, and a T' test to obtain a first intermediate attribute item;
for attribute items whose data type is a qualitative type, performing a chi-square test to obtain a second intermediate attribute item, wherein the qualitative types comprise the qualitative comparable type and the qualitative incomparable type;
and performing single-factor analysis on the first intermediate attribute item and the second intermediate attribute item to obtain a first target attribute item related to the disorder, wherein the target attribute item comprises the first target attribute item.
In the related art, general data analysis of the sample data requires all attribute items in the sample data to be divided into two classes, and the analysis is then performed on the basis of that classification. In general, the value data of the first-class attribute items are subjected to a normality test; a T test or T' test is performed on the value data of attribute items that follow a normal distribution, and a rank-sum test is performed on the value data of attribute items that do not. A chi-square test is performed on the value data of the second-class attribute items.
Thus, for this general data analysis in the related art, the present disclosure defines the data type of the first-class attribute items as the quantitative type and the data type of the second-class attribute items as the qualitative type. At least one of a rank-sum test, a T test, and a T' test is then performed on the attribute items of the quantitative type to obtain a first intermediate attribute item, and a chi-square test is performed on the attribute items of the qualitative type to obtain a second intermediate attribute item.
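The chi-square test applied to qualitative attribute items can be illustrated with Pearson's statistic on a contingency table. The sketch below assumes that the rows are the case group and the control group and that the columns are the value categories of one attribute item; the function name and the omission of any continuity correction are choices made for illustration and are not taken from the disclosure.

```python
def chi_square_statistic(table):
    """Pearson chi-square statistic and degrees of freedom for a contingency table.

    table -- list of rows of observed counts, e.g. [[case counts], [control counts]]
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of group and category.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(col_totals) - 1)
    return statistic, dof
```

The statistic and its degrees of freedom would then be converted to a P value, for example by table lookup, to decide whether the attribute item is kept as a second intermediate attribute item.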
The first intermediate attribute item and the second intermediate attribute item represent the results of the general data analysis in the related art. The number of attribute items they contain is smaller than the number of attribute items contained in the sample data of the control group and the case group.
Single-factor analysis is performed on all attribute items in the first intermediate attribute item and the second intermediate attribute item to obtain a first target attribute item related to the disorder of the case group. The target attribute items include the first target attribute item; that is, each attribute item in the result of the single-factor analysis may be a result of the etiology analysis.
And obtaining a first intermediate attribute item and a second intermediate attribute item by determining the data type of the attribute item and then carrying out general data analysis on the attribute items of the quantitative type and the qualitative type. In this way, no scientific research personnel is required to carry out labeling classification on each attribute item. This approach reduces labor costs compared to the related art.
In the case of the single-factor analysis for the first intermediate attribute item and the second intermediate attribute item, since the attribute items of the qualitative type are further divided into the attribute items of the qualitative comparable type and the qualitative incomparable type in the above steps, the single-factor analysis can be directly performed for the attribute items of the quantitative type, the qualitative comparable type, and the qualitative incomparable type in the first intermediate attribute item and the second intermediate attribute item. In this way, compared with the related art, it is unnecessary to re-classify each attribute item. This way further reduces the labor costs in the related art.
It should be noted that, in the related art, the process of single factor analysis generally includes discretizing with respect to a quantitative type of attribute item, performing Logistic regression analysis on a qualitative comparable type of attribute item, performing dummy coding analysis on a qualitative incomparable type of attribute item, and the like.
It will be appreciated by those skilled in the art that single-factor analysis relies mainly on Logistic regression and the corresponding OR and P values to analyze the influence of a single attribute item on disease occurrence. The OR value, an important index, measures the multiple by which the risk of disease increases when the value data of the attribute item increase by one unit.
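As a small illustration of the OR value, the standard relation for Logistic regression is that the OR equals the exponential of the fitted coefficient. The disclosure does not say how the OR is computed, so the sketch below is an assumption based on that standard relation; the function names and the default z value of 1.96 (an approximate 95% interval) are hypothetical.

```python
import math


def odds_ratio(coefficient):
    """OR implied by a Logistic regression coefficient: exp(b)."""
    return math.exp(coefficient)


def odds_ratio_interval(coefficient, std_error, z=1.96):
    """Approximate confidence interval for the OR: exp(b -/+ z * s)."""
    return (math.exp(coefficient - z * std_error),
            math.exp(coefficient + z * std_error))
```

A coefficient of 0 gives an OR of 1, meaning that a one-unit increase in the attribute item leaves the odds of disease unchanged.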
Therefore, when single-factor analysis is performed on the first intermediate attribute item and the second intermediate attribute item, applying segmented discretization to the quantitative-type attribute items can give them better statistical significance (P value). The statistical significance of a result is an estimate of how far the result is real, that is, how far it can represent the population. The greater the P value, the less the association of attribute items in the sample can be regarded as a reliable indicator of the association of attribute items in the population. For example, a P value of 0.05 indicates a five percent probability that the association of attribute items observed in the sample is due to chance.
In an implementation manner, the single factor analysis includes performing a segmentation discretization processing on the value data of each attribute item in the first intermediate attribute item, where a segmentation process in the segmentation discretization processing includes:
firstly, determining the numerical interval of the attribute item according to the maximum value and the minimum value of the attribute item.
For example, if the maximum value of an age attribute item is 100 and the minimum value is 0, the numerical interval of the age attribute item is [0, 100].
Then, the numerical interval is segmented according to each hyperparameter in a preset hyperparameter space to obtain the set of segmented interval sequences under all segmentation cases.
For example, if the hyperparameter space is (2, 10), the hyperparameters in the hyperparameter space are 2, 3, 4, 5, 6, 7, 8, 9, and 10.
The numerical interval of the attribute item is segmented according to each hyperparameter. For example, with hyperparameter 2 the numerical interval [0, 100] is divided into two segments, giving all the two-segment cases, such as [0,1], [2, 100]; [0,2], [3, 100]; [0,3], [4, 100]; and so on (not all two-segment cases are listed here). As another example, with hyperparameter 3 the numerical interval [0, 100] is divided into three segments, giving all the three-segment cases, such as [0,1], [2,3], [4, 100]; [0,2], [3,4], [5, 100]; [0,3], [4,5], [6, 100]; and so on. Segmenting the numerical interval of the attribute item according to every hyperparameter yields the set of segmented interval sequences under all segmentation cases.
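The enumeration of all segmentation cases can be sketched by choosing cut points inside the interval. The sketch assumes integer-valued data with a granularity of 1, which matches the examples above but is not stated as a requirement by the disclosure; the function names are hypothetical.

```python
from itertools import combinations


def segmentations(lo, hi, k):
    """Yield every way to split the integer interval [lo, hi] into k segments.

    Each result is a list of (start, end) pairs with inclusive bounds.
    """
    # A cut after position c ends one segment at c and starts the next at c + 1.
    for cuts in combinations(range(lo, hi), k - 1):
        segments = []
        start = lo
        for c in cuts:
            segments.append((start, c))
            start = c + 1
        segments.append((start, hi))
        yield segments


def all_segmentations(lo, hi, hyperparameters):
    """Map each hyperparameter (number of segments) to its segmentation cases."""
    return {k: list(segmentations(lo, hi, k)) for k in hyperparameters}
```

The number of cases grows combinatorially with the number of segments, so exhaustive enumeration over a wide interval quickly becomes expensive; a search strategy that avoids visiting every case is one way to keep the cost manageable.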
In one possible implementation, segmenting the numerical interval of the attribute item according to each hyperparameter to obtain the set of segmented interval sequences under all segmentation cases may be realized with a Bayesian optimization algorithm.
And then, calculating a P value representing the statistical significance of each segmented interval sequence in the segmented interval sequence set, and taking the segmented interval sequence with the minimum P value as a segmented result.
In one implementation, the P value for each sequence of segment intervals may be calculated as follows:
firstly, inputting each segment interval sequence in a segment interval sequence set into a Logistic regression model for analysis to obtain a variable coefficient and a variable standard error corresponding to each segment interval sequence.
In the related art, logistic regression analysis is a generalized linear regression analysis model. It will be understood by those skilled in the art that when each segment interval sequence in the segment interval sequence set is input into the Logistic regression model for analysis, a set of variable coefficients and variable standard errors are obtained for each segment interval sequence.
Then, from each variable coefficient and variable standard error obtained, the corresponding Wald χ² value is calculated by the following formula: $\text{Wald}\ \chi^2 = (b_j / s_j)^2$, where $b_j$ denotes the variable coefficient and $s_j$ denotes the variable standard error. The P value corresponding to each calculated Wald χ² value is then obtained by table lookup.
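The Wald χ² calculation for a single coefficient can be sketched as follows. In place of the table lookup, the sketch computes the P value directly, assuming the statistic is compared against a chi-square distribution with one degree of freedom, for which the upper-tail probability equals erfc(sqrt(x / 2)). The function names are hypothetical.

```python
import math


def wald_chi_square(coefficient, std_error):
    """Wald chi-square value for one coefficient: (b / s) squared."""
    return (coefficient / std_error) ** 2


def p_value_one_dof(wald):
    """Upper-tail probability of a chi-square variable with 1 degree of freedom."""
    return math.erfc(math.sqrt(wald / 2.0))
```

With one P value per segmented interval sequence, the sequence with the smallest P value can be selected with Python's built-in `min` and a suitable key function.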
Next, a sequence of segment intervals having the smallest P value is selected as a segment result, and then each segment interval is converted in turn for such a segment result. Illustratively, if the segmentation result is [0,2], [3,4], [5, 100], then the segmentation interval [0,2] of the attribute term is converted to 1; converting the segment interval [3,4] of the attribute item into 2; the segment interval [5, 100] of the attribute item is converted to 3. Thus, the segment discretization process for the quantitative type of attribute item ends.
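The conversion of each segment interval into an ordinal code can be sketched with a binary search over the upper bounds of the segments. The function name is hypothetical, and raising an error for a value that falls in no segment is an assumption made for the example.

```python
from bisect import bisect_left


def encode(value, segments):
    """Return the 1-based index of the segment that contains value.

    segments -- ordered list of (start, end) pairs with inclusive bounds
    """
    upper_bounds = [end for _, end in segments]
    index = bisect_left(upper_bounds, value)
    if index == len(segments) or value < segments[index][0]:
        raise ValueError(f"{value} is not covered by any segment")
    return index + 1


# Segmentation result taken from the example in the text.
result = [(0, 2), (3, 4), (5, 100)]
codes = [encode(v, result) for v in (0, 2, 3, 4, 5, 100)]
```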
By adopting the method, the quantitative type attribute items are subjected to the segmented discretization treatment, so that the quantitative type attribute items have statistical significance, and further, the result of single-factor analysis on the quantitative type attribute items can be more accurate. And the quantitative type attribute items in the single factor analysis result can better explain the diseases of the case group.
In one possible implementation manner, the processing of the value data of each attribute item by the data processing model further includes: and carrying out multi-factor analysis on the first target attribute item to obtain a second target attribute item, wherein the target attribute item comprises the second target attribute item.
After the single-factor analysis, multi-factor analysis can be performed on its result to analyze the effect of combinations of multiple attribute items on the disorder of the case group. That is, multi-factor analysis examines whether a combination of multiple attribute items is a cause of the disease.
In the related art, multi-factor analysis mainly uses Logistic regression to analyze the degree to which a combination of multiple attribute items influences disease occurrence. When multi-factor analysis is performed, the attribute items of the qualitative incomparable type are dummy-coded, and each dummy code is then input into a Logistic regression model for analysis.
However, this approach may have an impact on the multi-factor analysis results, e.g., one dummy code of the attribute term may be used as an attribute term related to the condition of the case group, and another dummy code may be used as an attribute term unrelated to the condition of the case group.
In view of this, in the present disclosure, performing the multi-factor analysis for attribute items of qualitatively incomparable type includes:
generating a corresponding number of dummy variables according to the type of the value data of each attribute item of which the data type is the qualitative incomparable type in the first target attribute item;
For example, if the value data of the attribute item have n value types, n dummy variables are generated.
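Generating one dummy variable per value type can be sketched as below. The function name and the blood-type values in the usage line are hypothetical; the sketch follows the text in producing n dummy variables for n value types.

```python
def dummy_variables(values):
    """Return the sorted value types and one 0/1 indicator row per sample.

    Each row has one column per value type; exactly one column is 1.
    """
    categories = sorted(set(values))
    rows = [[1 if value == category else 0 for category in categories]
            for value in values]
    return categories, rows


# Hypothetical attribute item with three value types.
categories, rows = dummy_variables(["A", "B", "O", "A"])
```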
A comparability coefficient corresponding to each value data under the attribute item is then generated according to each dummy variable of the attribute item. Specifically, a Logistic model coefficient is generated for each dummy variable of the attribute item.
Then the comparability coefficient (Logistic model coefficient) of each value data under the attribute item is substituted into the following formula to obtain the corresponding Wald χ² value: $\text{Wald}\ \chi^2 = (Q\beta)^{T}\,[Q\,\mathrm{var}(\beta)\,Q^{T}]^{-1}\,(Q\beta)$;
It should be noted that the hypothesis underlying this formula is $\beta_0 = \beta_1 = \dots = \beta_{n-1} = 0$, where $\beta_0, \beta_1, \dots, \beta_{n-1}$ denote the Logistic model coefficients corresponding to the dummy variables.
It will be appreciated by those of ordinary skill in the art that a hypothesis needs to be set when performing Logistic regression analysis. When the hypothesis differs, the derived Wald χ² formula differs.
In the Wald χ² formula, $\beta$ denotes the dummy-variable coefficients, $\mathrm{var}(\beta)$ denotes the standard error corresponding to the coefficients, $T$ denotes the matrix transpose, and $Q$ is a matrix with n-1 rows and n columns whose first column is all 0, where n denotes the number of value types of the attribute item.
According to the calculated wald χ2 value, a corresponding P value is obtained by looking up a table, and according to the obtained P value, it can be determined whether or not to exclude the attribute item of the qualitative incomparable type at the time of multi-factor analysis. For example, assuming that the preset threshold is 0.05, if the obtained P value is greater than 0.05, the attribute item is excluded.
In this way, by taking all the value data of the attribute items of the qualitative incomparable type as a whole and then calculating the P value of the whole, the problem caused by calculating the P value for each value data of the attribute item in the related art can be avoided.
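A joint Wald statistic of this kind can be sketched in plain Python. Several points are assumptions rather than content of the disclosure: Q is taken to be a zero column followed by an identity matrix (consistent with the stated shape and all-zero first column), var(β) is treated as the full covariance matrix of the coefficients, and the bracketed matrix is inverted as in the standard Wald test. The function names are hypothetical.

```python
def solve(matrix, vector):
    """Solve matrix * x = vector by Gaussian elimination with partial pivoting."""
    n = len(vector)
    aug = [row[:] + [vector[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - tail) / aug[r][r]
    return x


def joint_wald(beta, cov):
    """Wald statistic (Q b)^T [Q V Q^T]^-1 (Q b) with Q = [0 | I] (assumed)."""
    n = len(beta)
    q = [[1.0 if j == i + 1 else 0.0 for j in range(n)] for i in range(n - 1)]
    qb = [sum(q[i][j] * beta[j] for j in range(n)) for i in range(n - 1)]
    qv = [[sum(q[i][k] * cov[k][j] for k in range(n)) for j in range(n)]
          for i in range(n - 1)]
    qvq = [[sum(qv[i][k] * q[j][k] for k in range(n)) for j in range(n - 1)]
           for i in range(n - 1)]
    # Solving the linear system avoids forming the inverse explicitly.
    return sum(u * v for u, v in zip(qb, solve(qvq, qb)))
```

Under these assumptions the statistic would be compared against a chi-square distribution with n-1 degrees of freedom to obtain the single P value for the whole attribute item.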
Based on the same inventive concept, the embodiments of the present disclosure further provide an etiology analysis apparatus. As shown in FIG. 4, the apparatus 400 includes:
the obtaining module 410 is configured to obtain sample data of a control group and sample data of a case group, where the sample data includes multiple attribute items of a sample and value data of the sample under each attribute item, and conditions of each case in the case group are the same;
the determining module 420 is configured to determine a data type of each attribute item according to the value data under each attribute item;
the input module 430 is configured to input information of each attribute item of the control group and the case group into a data processing model, so as to obtain a target attribute item related to the disorder output by the data processing model;
the attribute item information comprises value data of an attribute item and a data type of the attribute item, and the data processing model is used for processing the value data of the attribute item according to a data processing algorithm corresponding to the data type of the attribute item.
By adopting the device, the sample data of the control group and the sample data of the case group are obtained, wherein the sample data comprises various attribute items of the sample and value data of the sample under each attribute item; determining the data type of each attribute item according to the value data under each attribute item; in this way, the data type of each attribute item is automatically determined without manually classifying and labeling each attribute item. And inputting the information of each attribute item after the data types of the control group and the case group are determined into a data processing model for processing, and obtaining a target attribute item which is output by the data processing model and is related to the symptoms of the case group. The etiology analysis mode does not need to manually participate in the analysis process, thereby realizing the etiology automatic analysis, and the etiology automatic analysis can avoid the problems caused by manually marking attribute items in the related technology.
Optionally, the determining module 420 includes:
the first determination submodule is used for determining that the data type of an attribute item whose value data have two value types is the qualitative comparable type;
the second determination submodule is used for determining that the data type of an attribute item whose value data have a number of value types other than two, are numerical data, and follow a normal distribution is the quantitative type;
the third determination submodule is used for determining that the data type of an attribute item whose value data have a number of value types other than two, are numerical data, and do not follow a normal distribution is the qualitative comparable type;
the fourth determination submodule is used for determining that the data type of an attribute item whose value data have a number of value types other than two, are non-numerical data, and do not exist in the knowledge base is the qualitative incomparable type;
and the fifth determination submodule is used for determining that the data type of an attribute item whose value data have a number of value types other than two, are non-numerical data, and exist in the knowledge base is the qualitative comparable type.
Optionally, the data processing model is configured to:
for attribute items whose data type is the quantitative type, performing at least one of a rank-sum test, a T test, and a T' test to obtain a first intermediate attribute item;
for attribute items whose data type is a qualitative type, performing a chi-square test to obtain a second intermediate attribute item, wherein the qualitative types comprise the qualitative comparable type and the qualitative incomparable type;
And carrying out single factor analysis on the first intermediate attribute item and the second intermediate attribute item to obtain a first target attribute item related to the disorder, wherein the target attribute item comprises the first target attribute item.
Optionally, the single factor analysis includes performing a segmentation discretization process on the value data of each attribute item in the first intermediate attribute item, where a segmentation process in the segmentation discretization process includes:
determining a numerical interval of the attribute item according to the maximum value and the minimum value of the attribute item;
segmenting the numerical interval according to each hyperparameter in a preset hyperparameter space to obtain the set of segmented interval sequences under all segmentation cases;
and calculating a P value representing the statistical significance of each segmented interval sequence in the segmented interval sequence set, and taking the segmented interval sequence with the minimum P value as a segmentation result.
Optionally, the data processing model is further configured to:
performing multi-factor analysis on the first target attribute item to obtain a second target attribute item, wherein the target attribute item comprises the second target attribute item;
wherein the multi-factor analysis comprises:
Generating a corresponding number of dummy variables according to the type of the value data of each attribute item of which the data type is the qualitative incomparable type in the first target attribute item;
and generating a comparability coefficient corresponding to each value data under the attribute item according to each dummy variable of the attribute item.
The specific manner in which the various modules perform the operations in the apparatus of the above embodiments have been described in detail in connection with the embodiments of the method, and will not be described in detail herein.
Fig. 5 is a block diagram of an electronic device 700, according to an example embodiment. As shown in fig. 5, the electronic device 700 may include: a processor 701, a memory 702. The electronic device 700 may also include one or more of a multimedia component 703, an input/output (I/O) interface 704, and a communication component 705.
Wherein the processor 701 is configured to control the overall operation of the electronic device 700 to perform all or part of the steps of the etiology analysis method described above. The memory 702 is used to store various types of data to support operation on the electronic device 700, which may include, for example, instructions for any application or method operating on the electronic device 700, as well as application-related data, such as contact data, messages sent and received, pictures, audio, video, and so forth. The Memory 702 may be implemented by any type or combination of volatile or non-volatile Memory devices, such as static random access Memory (Static Random Access Memory, SRAM for short), electrically erasable programmable Read-Only Memory (Electrically Erasable Programmable Read-Only Memory, EEPROM for short), erasable programmable Read-Only Memory (Erasable Programmable Read-Only Memory, EPROM for short), programmable Read-Only Memory (Programmable Read-Only Memory, PROM for short), read-Only Memory (ROM for short), magnetic Memory, flash Memory, magnetic disk, or optical disk. The multimedia component 703 can include a screen and an audio component. Wherein the screen may be, for example, a touch screen, the audio component being for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signals may be further stored in the memory 702 or transmitted through the communication component 705. The audio assembly further comprises at least one speaker for outputting audio signals. The I/O interface 704 provides an interface between the processor 701 and other interface modules, which may be a keyboard, mouse, buttons, etc. These buttons may be virtual buttons or physical buttons. The communication component 705 is for wired or wireless communication between the electronic device 700 and other devices. 
The wireless communication may be, for example, Wi-Fi, Bluetooth, near field communication (Near Field Communication, NFC for short), 2G, 3G, 4G, NB-IOT, eMTC, 5G, or the like, or a combination of one or more of them, which is not limited herein. The communication component 705 may accordingly comprise a Wi-Fi module, a Bluetooth module, an NFC module, and so on.
In an exemplary embodiment, the electronic device 700 may be implemented by one or more application specific integrated circuits (Application Specific Integrated Circuit, abbreviated ASIC), digital signal processor (Digital Signal Processor, abbreviated DSP), digital signal processing device (Digital Signal Processing Device, abbreviated DSPD), programmable logic device (Programmable Logic Device, abbreviated PLD), field programmable gate array (Field Programmable Gate Array, abbreviated FPGA), controller, microcontroller, microprocessor, or other electronic components for performing the above-described etiology analysis method.
In another exemplary embodiment, a computer readable storage medium comprising program instructions which, when executed by a processor, implement the steps of the etiology analysis method described above is also provided. For example, the computer readable storage medium may be the memory 702 including program instructions described above, which are executable by the processor 701 of the electronic device 700 to perform the etiology analysis method described above.
In another exemplary embodiment, a computer program product is also provided, comprising a computer program executable by a programmable apparatus, the computer program having code portions for performing the above-described etiology analysis method when executed by the programmable apparatus.
The preferred embodiments of the present disclosure have been described in detail above with reference to the accompanying drawings, but the present disclosure is not limited to the specific details of the above embodiments, and various simple modifications may be made to the technical solutions of the present disclosure within the scope of the technical concept of the present disclosure, and all the simple modifications belong to the protection scope of the present disclosure.
In addition, the specific features described in the above embodiments may be combined in any suitable manner without contradiction. The various possible combinations are not described further in this disclosure in order to avoid unnecessary repetition.
Moreover, any combination between the various embodiments of the present disclosure is possible as long as it does not depart from the spirit of the present disclosure, which should also be construed as the disclosure of the present disclosure.

Claims (8)

1. A method of etiology analysis, the method comprising:
acquiring sample data of a control group and sample data of a case group, wherein the two groups of sample data comprise various attribute items of a sample and value data of the sample under each attribute item, and the symptoms of each case in the case group are the same;
Determining the data type of each attribute item according to the value data under each attribute item;
inputting the information of each attribute item of the control group and the case group into a data processing model to obtain a target attribute item which is output by the data processing model and is related to the symptoms;
the attribute item information comprises value data of an attribute item and a data type of the attribute item, and the data processing model is used for processing the value data of the attribute item according to a data processing algorithm corresponding to the data type of the attribute item;
the processing of the data processing model to the value data of each attribute item comprises the following steps:
for attribute items whose data type is a quantitative type, performing at least one of a rank-sum test, a T test, and a T' test to obtain a first intermediate attribute item;
for attribute items whose data type is a qualitative type, performing a chi-square test to obtain a second intermediate attribute item, wherein the qualitative types comprise a qualitative comparable type and a qualitative incomparable type;
performing single factor analysis on the first intermediate attribute item and the second intermediate attribute item to obtain a first target attribute item related to the disorder, wherein the target attribute item comprises the first target attribute item;
The single factor analysis comprises the step of carrying out segmentation discretization processing on the value data of each attribute item in the first intermediate attribute item, wherein the segmentation process in the segmentation discretization processing comprises the following steps:
determining a numerical interval of the attribute item according to the maximum value and the minimum value of the attribute item;
segmenting the numerical interval according to each hyperparameter in a preset hyperparameter space to obtain the set of segmented interval sequences under all segmentation cases;
and calculating a P value representing the statistical significance of each segmented interval sequence in the segmented interval sequence set, and taking the segmented interval sequence with the minimum P value as a segmentation result.
2. The method of claim 1, wherein determining the data type of each of the attribute items based on the value data under each of the attribute items comprises:
determining that an attribute item whose value data has exactly two value types is of the qualitative comparable type;
determining that an attribute item whose value data does not have exactly two value types, is numerical data, and conforms to a normal distribution is of the quantitative type;
determining that an attribute item whose value data does not have exactly two value types, is numerical data, and does not conform to a normal distribution is of the qualitative comparable type;
determining that an attribute item whose value data does not have exactly two value types, is non-numerical data, and does not exist in the knowledge base is of the qualitative incomparable type;
and determining that an attribute item whose value data does not have exactly two value types, is non-numerical data, and exists in the knowledge base is of the qualitative comparable type.
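The decision rules of claim 2 can be sketched as a small classifier. The claim does not name a normality test or a knowledge base format, so the Jarque-Bera check below and the representation of the knowledge base as a collection of known values are assumptions made for illustration; `DataType`, `looks_normal` and `classify_attribute` are hypothetical names, and the value list is assumed to be non-empty.

```python
import math
from enum import Enum


class DataType(Enum):
    QUANTITATIVE = "quantitative"
    QUALITATIVE_COMPARABLE = "qualitative comparable"
    QUALITATIVE_INCOMPARABLE = "qualitative incomparable"


def looks_normal(values, alpha=0.05):
    """Jarque-Bera normality check (an assumed stand-in; it is asymptotic)."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        return False
    skew = sum((v - mean) ** 3 for v in values) / n / m2 ** 1.5
    kurt = sum((v - mean) ** 4 for v in values) / n / m2 ** 2
    jb = n / 6.0 * (skew ** 2 + (kurt - 3.0) ** 2 / 4.0)
    # JB is asymptotically chi-square with 2 degrees of freedom, so P = exp(-JB / 2).
    return math.exp(-jb / 2.0) >= alpha


def classify_attribute(values, knowledge_base, is_normal=looks_normal):
    """Apply the five rules in order; knowledge_base holds the known values."""
    distinct = set(values)
    if len(distinct) == 2:
        return DataType.QUALITATIVE_COMPARABLE
    numeric = all(
        isinstance(v, (int, float)) and not isinstance(v, bool) for v in values
    )
    if numeric:
        if is_normal(values):
            return DataType.QUANTITATIVE
        return DataType.QUALITATIVE_COMPARABLE
    if distinct <= set(knowledge_base):
        return DataType.QUALITATIVE_COMPARABLE
    return DataType.QUALITATIVE_INCOMPARABLE
```

Passing the normality check as a parameter keeps the rule order independent of whichever test an implementation actually uses.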
3. The method of claim 1, wherein the processing of the value data of each attribute item by the data processing model further comprises:
performing multi-factor analysis on the first target attribute item to obtain a second target attribute item, wherein the target attribute item comprises the second target attribute item;
wherein the multi-factor analysis comprises:
for each attribute item in the first target attribute item whose data type is the qualitative incomparable type, generating a number of dummy variables corresponding to the number of value types of the value data of that attribute item;
and generating a comparability coefficient corresponding to each value data under the attribute item according to each dummy variable of the attribute item.
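The dummy variable step of claim 3 can be illustrated as below. The claim only says "a corresponding number" of dummy variables, so the choice of one column per value type minus a reference level is an assumption (it is a common convention before regression), and `dummy_variables` is a hypothetical name. The comparability coefficient step is not sketched, since the claim text gives no formula for it.

```python
def dummy_variables(values, reference=None):
    """Encode a categorical attribute item as 0/1 dummy variables.

    With k distinct values, k - 1 columns are produced; the reference level
    (by default the first value in sorted order) is encoded as all zeros.
    """
    levels = sorted(set(values))
    if reference is None:
        reference = levels[0]
    columns = [level for level in levels if level != reference]
    rows = [[1 if v == level else 0 for level in columns] for v in values]
    return columns, rows
```

For the blood-type-like values `["A", "B", "O", "A"]`, this yields the columns `["B", "O"]`, with every "A" sample encoded as `[0, 0]`.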
4. An etiology analysis device, the device comprising:
the acquisition module is used for acquiring sample data of a control group and sample data of a case group, wherein the two groups of sample data comprise various attribute items of a sample and value data of the sample under each attribute item, and the symptoms of each case in the case group are the same;
the determining module is used for determining the data type of each attribute item according to the value data under each attribute item;
the input module is used for inputting the information of each attribute item of the control group and the case group into a data processing model to obtain a target attribute item which is output by the data processing model and is related to the symptoms;
the attribute item information comprises value data of an attribute item and a data type of the attribute item, and the data processing model is used for processing the value data of the attribute item according to a data processing algorithm corresponding to the data type of the attribute item;
the data processing model is used for:
for the attribute items whose data type is the quantitative type, testing by at least one of a rank-sum test, a t-test and a t'-test to obtain a first intermediate attribute item;
for the attribute items whose data type is a qualitative type, testing by a chi-square test to obtain a second intermediate attribute item, wherein the qualitative types comprise the qualitative comparable type and the qualitative incomparable type;
performing single factor analysis on the first intermediate attribute item and the second intermediate attribute item to obtain a first target attribute item related to the disorder, wherein the target attribute item comprises the first target attribute item;
the single factor analysis comprises performing segmentation discretization on the value data of each attribute item in the first intermediate attribute item, wherein the segmentation in the segmentation discretization comprises:
determining a numerical interval of the attribute item according to the maximum value and the minimum value of the attribute item;
segmenting the numerical interval according to each hyperparameter in a preset hyperparameter space to obtain a segmented interval sequence set covering all segmentation cases;
and calculating a P value representing the statistical significance of each segmented interval sequence in the segmented interval sequence set, and taking the segmented interval sequence with the minimum P value as a segmentation result.
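Of the tests named for quantitative attribute items, the rank-sum test is the simplest to show without external libraries. The sketch below is a two-sided Wilcoxon rank-sum test using the normal approximation with a tie correction; it is an illustration only, `rank_sum_test` is a hypothetical name, and both groups are assumed to be non-empty. How the resulting P value is thresholded to select intermediate attribute items is not specified in this passage.

```python
import math
from statistics import NormalDist


def rank_sum_test(case_values, control_values):
    """Return (rank sum of the case group, two-sided P value)."""
    n1, n2 = len(case_values), len(control_values)
    n = n1 + n2
    pooled = sorted(list(case_values) + list(control_values))
    ranks = {}
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and pooled[j] == pooled[i]:
            j += 1
        # Tied values share the average of ranks i + 1 .. j.
        ranks[pooled[i]] = (i + 1 + j) / 2.0
        t = j - i
        tie_term += t ** 3 - t
        i = j
    w = sum(ranks[v] for v in case_values)
    mean_w = n1 * (n + 1) / 2.0
    var_w = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_w == 0:
        return w, 1.0
    z = (w - mean_w) / math.sqrt(var_w)
    p = 2.0 * (1.0 - NormalDist().cdf(abs(z)))
    return w, p
```

The normal approximation is reasonable for moderately large groups; for very small samples an exact rank-sum distribution would be the safer choice.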
5. The apparatus of claim 4, wherein the determining module comprises:
the first determination submodule is used for determining that an attribute item whose value data has exactly two value types is of the qualitative comparable type;
the second determination submodule is used for determining that an attribute item whose value data does not have exactly two value types, is numerical data, and conforms to a normal distribution is of the quantitative type;
the third determination submodule is used for determining that an attribute item whose value data does not have exactly two value types, is numerical data, and does not conform to a normal distribution is of the qualitative comparable type;
the fourth determination submodule is used for determining that an attribute item whose value data does not have exactly two value types, is non-numerical data, and does not exist in the knowledge base is of the qualitative incomparable type;
and the fifth determination submodule is used for determining that an attribute item whose value data does not have exactly two value types, is non-numerical data, and exists in the knowledge base is of the qualitative comparable type.
6. The apparatus of claim 4, wherein the data processing model is further configured to:
performing multi-factor analysis on the first target attribute item to obtain a second target attribute item, wherein the target attribute item comprises the second target attribute item;
wherein the multi-factor analysis comprises:
for each attribute item in the first target attribute item whose data type is the qualitative incomparable type, generating a number of dummy variables corresponding to the number of value types of the value data of that attribute item;
and generating a comparability coefficient corresponding to each value data under the attribute item according to each dummy variable of the attribute item.
7. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the steps of the method according to any of claims 1-3.
8. An electronic device, comprising:
a memory having a computer program stored thereon;
a processor for executing the computer program in the memory to carry out the steps of the method of any one of claims 1-3.
CN201911396700.6A 2019-12-30 2019-12-30 Etiology analysis method, device, storage medium and electronic equipment Active CN111199782B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911396700.6A CN111199782B (en) 2019-12-30 2019-12-30 Etiology analysis method, device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911396700.6A CN111199782B (en) 2019-12-30 2019-12-30 Etiology analysis method, device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN111199782A CN111199782A (en) 2020-05-26
CN111199782B true CN111199782B (en) 2023-09-29

Family

ID=70746475

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911396700.6A Active CN111199782B (en) 2019-12-30 2019-12-30 Etiology analysis method, device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN111199782B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7752064B2 (en) * 1999-07-01 2010-07-06 Nutech Solutions, Inc. System and method for infrastructure design
CN105260863A (en) * 2015-11-26 2016-01-20 国家电网公司 Fault single influence factor analysis method based on power cable fault information
CN107977550A (en) * 2017-12-29 2018-05-01 天津科技大学 A kind of quick analysis Disease-causing gene algorithm based on compression
CN108154906A (en) * 2018-01-17 2018-06-12 林沛杰 Electronic Case report no table system and electronic Case report no token recording method
WO2019065854A1 (en) * 2017-09-27 2019-04-04 株式会社レナテック Cancer risk evaluation method and cancer risk evaluation system

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
AU2006302031A1 (en) * 2005-10-11 2007-04-19 Tethys Bioscience, Inc. Diabetes-associated markers and methods of use thereof
US8399206B2 (en) * 2008-07-10 2013-03-19 Nodality, Inc. Methods for diagnosis, prognosis and methods of treatment
EP3566233A1 (en) * 2017-01-08 2019-11-13 The Henry M. Jackson Foundation for the Advancement of Military Medicine, Inc. Systems and methods for using supervised learning to predict subject-specific pneumonia outcomes

Also Published As

Publication number Publication date
CN111199782A (en) 2020-05-26

Similar Documents

Publication Publication Date Title
Birnbaum et al. Model-assisted cohort selection with bias analysis for generating large-scale cohorts from the EHR for oncology research
CN109804362B (en) Determining primary key-foreign key relationships by machine learning
CN108959236B (en) Medical literature classification model training method, medical literature classification method and device thereof
CN111159770B (en) Text data desensitization method, device, medium and electronic equipment
US20160188701A1 (en) File recognition system and method
US11152087B2 (en) Ensuring quality in electronic health data
CN113593709B (en) Disease coding method, system, readable storage medium and device
CN110471941B (en) Method and device for automatically positioning judgment basis and electronic equipment
US20180196924A1 (en) Computer-implemented method and system for diagnosis of biological conditions of a patient
WO2021223449A1 (en) Method and apparatus for acquiring flora marker, terminal, and storage medium
CN111400126A (en) Network service abnormal data detection method, device, equipment and medium
Gruber et al. Introduction to dartR
CN111755090A (en) Medical record searching method, medical record searching device, storage medium and electronic equipment
CN112307337A (en) Association recommendation method and device based on label knowledge graph and computer equipment
CN107657991B (en) Patient data screening method and device, storage medium and electronic equipment
CN111199782B (en) Etiology analysis method, device, storage medium and electronic equipment
US11709877B2 (en) Systems and methods for targeted annotation of data
CN112507075A (en) Case data searching method, system, equipment and storage medium
US11152121B2 (en) Generating clinical summaries using machine learning
CN111383768B (en) Medical data regression analysis method, device, electronic equipment and computer readable medium
Jafari et al. Computational approach test for inference about several correlation coefficients: equality and common
CN111666754B (en) Entity identification method and system based on electronic disease text and computer equipment
US20190279749A1 (en) Patient healthcare record linking system
León Palacio SILE: a method for the efficient management of smart genomic information
US11442650B2 (en) Generating predicted usage of storage capacity

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant