CN108805178A - Cross-granularity intelligent disease screening method and system - Google Patents
Cross-granularity intelligent disease screening method and system
- Publication number
- CN108805178A CN201810495222.3A
- Authority
- CN
- China
- Prior art keywords
- model
- disease
- screening
- integrated
- classification
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/214—Generating training patterns; Bootstrap methods, e.g. bagging or boosting
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/24—Classification techniques
-
- G—PHYSICS
- G16—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
- G16H—HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
- G16H50/00—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
- G16H50/70—ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients
Landscapes
- Engineering & Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Theoretical Computer Science (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Medical Informatics (AREA)
- Health & Medical Sciences (AREA)
- Public Health (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Evolutionary Computation (AREA)
- Databases & Information Systems (AREA)
- Biomedical Technology (AREA)
- Primary Health Care (AREA)
- General Health & Medical Sciences (AREA)
- Epidemiology (AREA)
- Pathology (AREA)
- Investigating Or Analysing Biological Materials (AREA)
Abstract
The present invention provides a cross-granularity intelligent disease screening method and system for improving the comprehensiveness and accuracy of disease screening. The method includes: establishing multiple layers of disease screening models, where each layer classifies at a different granularity; and, according to a target examination report, calling each layer's disease screening model in turn to obtain the disease classification result output by each layer's disease screening model.
Description
Technical field
The present invention relates to the field of medical technology and, in particular, to a cross-granularity intelligent disease screening method and system.
Background
Traditionally, the diagnosis and screening of cancer (such as gastric cancer and breast cancer) rely on doctors analyzing medical records and examination reports. Increasingly heavy workloads and lengthy medical records and examination reports significantly reduce doctors' working efficiency. Together with the inherent difficulty of cancer diagnosis and screening and the limited professional expertise of primary-care doctors, this leads to high misdiagnosis and missed-diagnosis rates in cancer screening.
With the progress of artificial intelligence in recent years, data-driven analysis has increasingly become a strong support and supplement to cancer research in the clinical and biological fields, and disease screening is gradually becoming intelligent. For example, one related technique applies ensemble learning to data from more than 1.4 million diabetic patients; it detects susceptibility to retinopathy (DR) with high accuracy and also addresses the problem of low compliance with retinopathy screening. As another example, a related study of graft survival and its predictive variables in heart transplantation combines the predictions of multiple models by weighted averaging, which improves predictive performance and achieves good results.
However, the related techniques do not improve the ensemble method itself, so the gain in model prediction performance is uncontrollable, and those skilled in the art have not further considered how to raise the accuracy of disease screening.
Summary of the invention
Embodiments of the present invention provide a cross-granularity intelligent disease screening method and system to improve the comprehensiveness and accuracy of disease screening.
To achieve the above object, a first aspect of the present invention provides a cross-granularity intelligent disease screening method, the method including:
establishing multiple layers of disease screening models, where the classification granularity of each layer's disease screening model is different;
according to a target examination report, calling each layer's disease screening model in turn to obtain the disease classification result output by each layer's disease screening model.
Optionally, each layer's disease screening model is established as follows:
performing post-structuring on examination reports to obtain the sample data set corresponding to that layer's classification granularity;
training individual classification models on the sample data set to obtain multiple homogeneous classification models;
performing weighted integration on the multiple homogeneous classification models, at least by using the optimal-weighting weighted integration approach (OWIA), to obtain one disease screening model.
Optionally, performing post-structuring on the examination reports to obtain the sample data set corresponding to that layer's classification granularity includes:
integrating the data of the examination reports, with the pathology report among the examination reports taken as authoritative;
extracting features from the integrated examination reports to obtain target data items that are representative and discriminative;
representing the target data items numerically to obtain the sample data set.
Optionally, training individual classification models on the sample data set to obtain multiple homogeneous classification models includes:
dividing the sample data set into a training data set and a test data set according to a preset ratio;
sampling from the training data set to obtain k mutually overlapping sampled data sets, where k is a positive integer greater than 1; specifically, sampling with replacement may be used so that each sampled data set has the same size as the original data set;
using the same machine learning algorithm, training and validating a model on each of the k sampled data sets to obtain k homogeneous classification models.
Optionally, performing weighted integration on the multiple homogeneous classification models, at least by using the optimal-weighting weighted integration approach (OWIA), to obtain one disease screening model includes:
determining the set of weight combinations for the multiple homogeneous classification models;
computing, for each weight combination in the set, the performance evaluation value of the integrated model;
taking the weight combination that yields the best performance evaluation value of the integrated model as the optimal weight combination, and performing weighted integration on the multiple homogeneous classification models using the optimal weight combination.
Optionally, determining the set of weight combinations for the multiple homogeneous classification models includes:
traversing all weight combinations at a preset weight precision $\varepsilon$ to obtain the set of all weight combinations $W_{n,k}$, where the set contains $\binom{n-1}{k-1}$ weight combinations, $\varepsilon = 10^{-p}$, $n = 1/\varepsilon$, and $p$ is a positive integer;
The method further includes:
The integrated model $F(x)$ is expressed by the following first formula:
$$F(x) = \sum_{i=1}^{k} w_i f_i(x)$$
where $k$ is the number of homogeneous classification models, $f_i$ denotes the $i$-th classification model, $w_i$ denotes the weight of the $i$-th classification model, $w_i \in (0,1)$, and $\sum_{i=1}^{k} w_i = 1$. The output of $F(x)$ is the probability that a sample belongs to the positive class, that is, the confidence that canceration is judged to have occurred;
The performance of the integrated model is evaluated by the following second formula:
$$\max_{(w_1,\dots,w_k) \in W_{n,k}} Q(F(x)), \quad x \in \text{Test Data}$$
where Test Data is the test data set and $Q(F(x))$ denotes the AUC value of the integrated model $F(x)$ on the test data set.
Optionally, the method is applied to screening for gastric cancer, and the examination reports include pathology reports and gastroscopy reports.
A second aspect of the present invention provides a cross-granularity intelligent disease screening system, including:
a model building module, configured to establish multiple layers of disease screening models, where the classification granularity of each layer's disease screening model is different;
a model calling module, configured to call each layer's disease screening model in turn according to a target examination report and obtain the disease classification result output by each layer's disease screening model.
Optionally, the model building module includes:
a post-structuring submodule, configured to perform post-structuring on examination reports to obtain the sample data set corresponding to each layer's classification granularity;
a model training submodule, configured to train individual classification models on the sample data set to obtain multiple homogeneous classification models;
a model selection submodule, configured to perform weighted integration on the multiple homogeneous classification models, at least by using the optimal-weighting weighted integration approach (OWIA), to obtain each layer's disease screening model.
Optionally, the post-structuring submodule includes:
a data integration submodule, configured to integrate the data of the examination reports, with the pathology report among the examination reports taken as authoritative;
a feature extraction submodule, configured to extract features from the integrated examination reports to obtain target data items that are representative and discriminative;
a numerical processing submodule, configured to represent the target data items numerically to obtain the sample data set.
Optionally, the model training submodule includes:
a data division submodule, configured to divide the sample data set into a training data set and a test data set according to a preset ratio;
a sampling submodule, configured to sample from the training data set to obtain k mutually overlapping sampled data sets, where k is a positive integer greater than 1;
a training submodule, configured to use the same machine learning algorithm to train and validate a model on each of the k sampled data sets, obtaining k homogeneous classification models.
Optionally, the model selection submodule includes:
a weight combination determination submodule, configured to determine the set of weight combinations for the multiple homogeneous classification models;
a performance evaluation submodule, configured to compute, for each weight combination in the set, the performance evaluation value of the integrated model;
a weighted integration submodule, configured to take the weight combination that yields the best performance evaluation value of the integrated model as the optimal weight combination, and to perform weighted integration on the multiple homogeneous classification models using the optimal weight combination.
With the above technical solution, at least the following technical effects can be achieved:
When establishing disease screening models, the present invention establishes multiple layers of screening models for different disease classification granularities. For example, the first-layer granularity may be cancer versus no cancer, and the second-layer granularity subdivides cancer or no cancer: the cancer subdivision includes squamous carcinoma, adenocarcinoma, cell carcinoma and so on, and the no-cancer subdivision includes inflammation, tumor, polyp, ulcer and so on. In this way, for a new case, calling each layer's disease screening model in turn first determines whether canceration has occurred (the first-layer granularity) and then determines the next, finer category (the second-layer granularity); for a case judged cancerous, for example, it determines separately whether squamous carcinoma is present, whether adenocarcinoma is present, and so on. This improves the comprehensiveness and accuracy of screening.
Further, the present invention may use an optimal-weighting weighted integration approach (OWIA, Optimal Weighted Integrated Approach), which ensures that the weight combination used in weighted-average integration is optimal and thus maximizes the improvement in model performance. Whereas the improvement in model prediction performance is uncontrollable in the prior art, the technical solution provided by the present invention improves the performance of the disease screening model in a controllable way and further increases the accuracy of disease screening.
Other features and advantages of the present invention are described in detail in the detailed description below.
Description of the drawings
The accompanying drawings provide a further understanding of the present invention and form part of the specification. Together with the detailed description below, they serve to explain the present invention but do not limit it. In the drawings:
Fig. 1 is a flow diagram of a cross-granularity intelligent disease screening method provided by an embodiment of the present invention;
Fig. 2 is a flow diagram of a method for constructing a disease screening model provided by an embodiment of the present invention;
Fig. 3 is a flow diagram of another cross-granularity intelligent disease screening method provided by an embodiment of the present invention;
Fig. 4 is a structural diagram of a cross-granularity intelligent disease screening system provided by an embodiment of the present invention;
Fig. 5 is a structural diagram of another cross-granularity intelligent disease screening system provided by an embodiment of the present invention.
Detailed description
Specific embodiments of the present invention are described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described here only illustrate and explain the present invention and do not limit it.
An embodiment of the present invention provides a cross-granularity intelligent disease screening method. As shown in Fig. 1, the method includes:
S101: establish multiple layers of disease screening models, where the classification granularity of each layer's disease screening model is different.
The training data used to establish each layer's disease screening model may be different or the same, but the classification labels of the training data differ between layers. Specifically, the classification granularity of a lower layer may be a further subdivision of the classification granularity of the layer above it.
S102: according to a target examination report, call each layer's disease screening model in turn to obtain the disease classification result output by each layer's disease screening model.
The above method establishes multiple layers of screening models for different disease classification granularities. For example, the first-layer granularity may be cancer versus no cancer, and the second-layer granularity subdivides cancer or no cancer: the cancer subdivision includes squamous carcinoma, adenocarcinoma, cell carcinoma and so on, and the no-cancer subdivision includes inflammation, tumor, polyp, ulcer and so on. In this way, for a new case, calling each layer's disease screening model in turn first determines whether canceration has occurred (the first-layer granularity) and then determines the next, finer category (the second-layer granularity); for a case judged cancerous, for example, it determines separately whether squamous carcinoma is present, whether adenocarcinoma is present, and so on. This improves the comprehensiveness and accuracy of screening.
The method for establishing the models is described in detail below. Optionally, an embodiment of the present invention may establish each layer's disease screening model using the method steps shown in Fig. 2, including:
S1011: perform post-structuring on examination reports to obtain the sample data set corresponding to that layer's classification granularity.
It is worth noting that the technical solution provided by the embodiments of the present invention can be used to build screening models for various diseases, for example gastric cancer, breast cancer and diabetes.
Taking gastric cancer as an example, the post-structuring above means integrating gastroscopy reports and pathology reports to construct a data set of the form <gastroscopy data, pathology result>, and then performing feature extraction and numerical representation on the integrated report data to obtain the sample data set used for modeling.
S1012: train individual classification models on the sample data set to obtain multiple homogeneous classification models.
S1013: perform weighted integration on the multiple homogeneous classification models, at least by using the optimal-weighting weighted integration approach (OWIA), to obtain one disease screening model.
Steps S1012 and S1013 above are the training and integration of the models. Homogeneous classification models are classification models learned by the same learning algorithm. The learning algorithm used to build the models may be, for example, a support vector machine (SVM, Support Vector Machine), a multi-layer perceptron (MLP, Multi-layer Perceptron), extreme gradient boosting (XGBoost) or a neural network; the embodiments of the present invention do not limit this.
Those of ordinary skill in the art will know that ensemble learning aims to improve prediction accuracy by building multiple machine learning models and combining their predictions, and existing research shows that ensemble learning can markedly improve model performance. The commonly used integration strategies at present are averaging and voting, and among averaging methods weighted averaging is the more widely used. Researchers usually assign weights to the individual classification models according to indicators such as AUC value and accuracy and then integrate them by weighted averaging. However, this choice of weights has no reliable theoretical basis; in some cases simple averaging even outperforms weighted-average integration, that is, the improvement that weighted-average integration brings to model performance is uncontrollable. The present disclosure provides an optimal-weighting weighted integration approach (OWIA) that ensures the weight combination used in weighted-average integration is optimal, maximizes the improvement in model performance, and thereby improves the accuracy of disease screening.
To enable those of ordinary skill in the art to understand the technical solution provided by the embodiments of the present invention more clearly, the above steps are described in detail below.
Gastric cancer screening is again used as the example. In this case, the examination reports in step S1011 include pathology reports and gastroscopy reports. Post-structuring specifically includes the following steps:
(1) Integration of report data: the gastroscopy report and the pathology report of the same patient are integrated. Clinically, the pathology result is regarded as the gold standard, that is, the currently accepted most reliable standard method for diagnosing disease, which can correctly distinguish "diseased" from "disease-free". Therefore, the pathology result is taken as authoritative when the report data are integrated.
(2) Feature extraction from the examination reports: representative and discriminative data items are extracted from the report data, such as positive descriptive words or phrases for a particular condition, the report time, and patient information.
(3) Numerical representation of the report data: the report data are converted into numeric data according to the feature items.
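Steps (1) to (3) can be illustrated with a small Python sketch. Everything specific in it is an assumption made for illustration: the term list `POSITIVE_TERMS`, the patient identifiers, the report texts and the naive substring matching are hypothetical, and a real system would need proper clinical text processing (for example, handling negated findings).

```python
from typing import Dict, List, Tuple

# Hypothetical vocabulary of positive descriptors used as feature items.
POSITIVE_TERMS = ["ulcer", "mass", "bleeding", "irregular margin"]


def integrate(gastroscopy: Dict[str, str],
              pathology: Dict[str, str]) -> List[Tuple[str, str]]:
    """Step (1): pair each patient's gastroscopy text with the pathology result.

    The pathology result is the label source; patients without one are skipped.
    """
    return [(gastroscopy[pid], pathology[pid])
            for pid in sorted(gastroscopy) if pid in pathology]


def to_features(report_text: str) -> List[int]:
    """Steps (2) and (3): one 0/1 value per feature item (naive substring match)."""
    lowered = report_text.lower()
    return [1 if term in lowered else 0 for term in POSITIVE_TERMS]


def build_dataset(gastroscopy: Dict[str, str],
                  pathology: Dict[str, str]) -> List[Tuple[List[int], int]]:
    """Build <gastroscopy features, pathology label> samples for the first layer."""
    return [(to_features(text), 1 if result == "cancer" else 0)
            for text, result in integrate(gastroscopy, pathology)]


gastroscopy_reports = {
    "p1": "Ulcer with irregular margin in the gastric antrum",
    "p2": "Smooth mucosa, small polyp",
    "p3": "Mass in the gastric body",  # no pathology report, so it is dropped
}
pathology_results = {"p1": "cancer", "p2": "no cancer"}

dataset = build_dataset(gastroscopy_reports, pathology_results)
print(dataset)
# prints [([1, 0, 0, 1], 1), ([0, 0, 0, 0], 0)]
```

For a finer-granularity layer, the same features could be reused with a different label mapping, which matches the earlier remark that layers may share training data but differ in classification labels.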
Further, step S1012 above includes: dividing the sample data set into a training data set and a test data set according to a preset ratio; sampling from the training data set to obtain k mutually overlapping sampled data sets, where k is a positive integer greater than 1 (specifically, sampling with replacement may be used so that each sampled data set has the same size as the original data set); and, using the same machine learning algorithm, training and validating a model on each of the k sampled data sets to obtain k homogeneous classification models.
For example, the post-structured report data are divided into a training data set and a test data set at a ratio of 4:1; bootstrap sampling is performed on the training data set to obtain k mutually overlapping sampled data sets; one machine learning algorithm, such as logistic regression or a support vector machine, is selected, and a model is trained and validated on each of the k sampled data sets to obtain k homogeneous classification models, which serve as the individual classification models for the model integration in the next step.
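The split, bootstrap sampling and per-sample training described above might look like the following Python sketch. It uses only the standard library; the function names and the toy learner `fit_midpoint` (a one-feature logistic curve centred between the class means) are hypothetical placeholders for whichever learning algorithm is actually chosen.

```python
import math
import random
from typing import Callable, List, Sequence, Tuple

Sample = Tuple[List[float], int]          # (feature vector, 0/1 label)
Model = Callable[[List[float]], float]    # returns P(positive class)


def split(samples: Sequence[Sample], train_ratio: float = 0.8,
          seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle and divide into training and test sets (0.8 gives the 4:1 ratio)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def bootstrap_sets(train: Sequence[Sample], k: int,
                   seed: int = 0) -> List[List[Sample]]:
    """Draw k samples with replacement, each the same size as the training set."""
    rng = random.Random(seed)
    n = len(train)
    return [[train[rng.randrange(n)] for _ in range(n)] for _ in range(k)]


def fit_midpoint(data: Sequence[Sample]) -> Model:
    """Toy learner: logistic curve on feature 0, centred between the class means."""
    pos = [x[0] for x, y in data if y == 1]
    neg = [x[0] for x, y in data if y == 0]
    if pos and neg:
        mid = (sum(pos) / len(pos) + sum(neg) / len(neg)) / 2
    else:  # a resample may miss a class; fall back to the overall mean
        mid = sum(x[0] for x, _ in data) / len(data)
    return lambda x: 1.0 / (1.0 + math.exp(-(x[0] - mid)))


def train_homogeneous(train: Sequence[Sample], k: int,
                      fit: Callable[[Sequence[Sample]], Model],
                      seed: int = 0) -> List[Model]:
    """Train the same algorithm on each bootstrap sample to get k homogeneous models."""
    return [fit(sampled) for sampled in bootstrap_sets(train, k, seed)]


samples = [([i / 20], int(i >= 10)) for i in range(20)]
train_set, test_set = split(samples)
models = train_homogeneous(train_set, k=4, fit=fit_midpoint)
print(len(train_set), len(test_set), len(models))
# prints 16 4 4
```

Because every model comes from the same `fit` function and differs only in its bootstrap sample, the k models are homogeneous in the sense used above.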
Further, step S1013 above includes: determining the set of weight combinations for the multiple homogeneous classification models; computing, for each weight combination in the set, the performance evaluation value of the integrated model; and taking the weight combination that yields the best performance evaluation value of the integrated model as the optimal weight combination and performing weighted integration on the multiple homogeneous classification models using the optimal weight combination.
It is worth noting that in binary classification problems, the area under the ROC curve (AUC) is widely used to assess model performance: the AUC value of a classification model on the test set is used to measure the model's expected generalization performance. Because the task of the intelligent gastric cancer screening model is to judge whether canceration has occurred, which is a typical binary classification problem, a preferred implementation of the present invention for gastric cancer screening uses AUC as the indicator of model generalization performance.
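For readers unfamiliar with AUC, the following Python sketch computes it directly from its pairwise definition: the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score, with ties counted as one half. The function name `auc` and the sample scores are illustrative assumptions; the quadratic pairwise loop is chosen for clarity, not efficiency.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Area under the ROC curve via pairwise comparison of positives and negatives."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC is undefined unless both classes are present")
    # A correctly ordered pair counts 1, a tie counts 0.5, a misordered pair 0.
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


print(auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
# prints 0.75
```

In the example, three of the four positive-negative pairs are ordered correctly, which gives 0.75.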
In addition, determining the set of weight combinations for the multiple homogeneous classification models may include: traversing all weight combinations at a preset weight precision $\varepsilon$ to obtain the set of all weight combinations $W_{n,k}$, where the set contains $\binom{n-1}{k-1}$ weight combinations, $\varepsilon = 10^{-p}$, $n = 1/\varepsilon$, and $p$ is a positive integer. The integrated model $F(x)$ is then expressed by the following first formula:
$$F(x) = \sum_{i=1}^{k} w_i f_i(x)$$
where $k$ is the number of homogeneous classification models, $f_i$ denotes the $i$-th classification model, $w_i$ denotes the weight of the $i$-th classification model, $w_i \in (0,1)$, and $\sum_{i=1}^{k} w_i = 1$. The output of $F(x)$ is the probability that a sample belongs to the positive class, that is, the confidence that canceration is judged to have occurred.
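As a minimal illustration of the integrated model, the Python sketch below builds F(x) as a weighted sum of the individual models' positive-class probabilities, checking that each weight lies in (0, 1) and that the weights sum to 1. The helper name `integrate_models` and the constant-output toy models are assumptions made for the example.

```python
from typing import Callable, List, Sequence

Model = Callable[[List[float]], float]  # returns P(positive class)


def integrate_models(models: Sequence[Model], weights: Sequence[float]) -> Model:
    """Return F with F(x) = sum_i w_i * f_i(x), after validating the weights."""
    if len(models) != len(weights):
        raise ValueError("one weight per model is required")
    if any(not 0.0 < w < 1.0 for w in weights):
        raise ValueError("each weight must lie strictly between 0 and 1")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")

    def integrated(x: List[float]) -> float:
        return sum(w * f(x) for w, f in zip(weights, models))

    return integrated


# Three hypothetical models that output fixed probabilities for any input.
toy_models = [lambda x: 0.2, lambda x: 0.6, lambda x: 0.9]
F = integrate_models(toy_models, [0.5, 0.3, 0.2])
print(round(F([0.0]), 2))
# prints 0.46
```

Here 0.5 × 0.2 + 0.3 × 0.6 + 0.2 × 0.9 = 0.46; because the weights are positive and sum to 1, the result always stays between the smallest and largest individual probability.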
Further, the performance of the integrated model $F(x)$ may be evaluated using the following second formula:
$$\max_{(w_1,\dots,w_k) \in W_{n,k}} Q(F(x)), \quad x \in \text{Test Data}$$
where Test Data is the test data set and $Q(F(x))$ denotes the AUC value of the integrated model $F(x)$ on the test data set.
The following table gives a pseudocode description of the optimal-weighting weighted integration approach:
As shown in the table above, lines 1)-2) initialize the parameters; lines 3)-5) traverse all weight combinations at the given weight precision; lines 6)-7) give the performance evaluation function of the integrated model and compute the evaluation value; lines 8)-11) find the weight combination that makes model performance optimal, and the returned optimal weight combination is used to perform weighted integration on the multiple homogeneous classification models.
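The search described above can be sketched in Python as an exhaustive traversal of the weight grid. This is a hedged illustration, not the patent's own pseudocode: it assumes every weight is a positive multiple of 1/n (so that each weight stays inside (0, 1) and the weights sum to 1), it takes the individual models' test-set probabilities as a precomputed matrix, and the names `weight_grid`, `auc` and `owia` are hypothetical.

```python
from itertools import combinations
from typing import Iterator, List, Sequence, Tuple


def weight_grid(k: int, n: int) -> Iterator[Tuple[float, ...]]:
    """Yield every (w_1..w_k) with w_i = m_i / n, m_i >= 1 and sum(m_i) = n."""
    # Choosing k-1 cut points among 1..n-1 splits n into k positive parts.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        yield tuple((bounds[i + 1] - bounds[i]) / n for i in range(k))


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC: share of (positive, negative) pairs ranked correctly."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def owia(preds: Sequence[Sequence[float]], labels: Sequence[int],
         n: int = 10) -> Tuple[Tuple[float, ...], float]:
    """Return the weight combination with the highest test-set AUC, and that AUC.

    preds[i][j] is model i's positive-class probability for test sample j.
    """
    k = len(preds)
    best_weights: Tuple[float, ...] = ()
    best_auc = -1.0
    for weights in weight_grid(k, n):
        fused = [sum(weights[i] * preds[i][j] for i in range(k))
                 for j in range(len(labels))]
        value = auc(labels, fused)
        if value > best_auc:  # keep the first combination reaching the best value
            best_weights, best_auc = weights, value
    return best_weights, best_auc


test_labels = [0, 0, 1, 1]
test_preds = [[0.1, 0.6, 0.5, 0.9],   # model A, AUC 0.75 on its own
              [0.4, 0.2, 0.7, 0.3]]   # model B, AUC 0.75 on its own
best_weights, best_auc = owia(test_preds, test_labels, n=10)
print(best_weights, best_auc)
```

In this toy case each model alone reaches an AUC of 0.75, while the search finds a combination whose AUC is 1.0. The number of grid points grows combinatorially with k and n, so the traversal is practical only for a small number of models and a coarse precision.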
Fig. 3 shows the flow of establishing a gastric cancer screening model using the technical solution provided by the embodiments of the present invention. As shown in Fig. 3, building the screening model for gastric cancer includes post-structuring of pathology reports and gastroscopy reports; for details, refer to the description of post-structuring above, which is not repeated here. Further, data sampling is performed on the processed examination reports, and the samples are used to train classification models, yielding k homogeneous classification models. For each weight combination, the performance of the weighted integrated model is evaluated; AUC value may specifically be used for the evaluation. The optimal weight combination is determined from the performance evaluation and is used to weight the models, yielding the final integrated model.
To verify the effectiveness of the optimal-weighting weighted integration approach (OWIA) proposed by the embodiments of the present invention and to quantify precisely how much the method improves model performance, the following comparative experiment was carried out.
Bootstrap sampling of the training data set produced 4 data sets. Models were trained using the LR, SVM, MLP and XGB algorithms respectively, and the effect of model integration was verified on the test set, with AUC value as the evaluation index for both the individual classification models and the integrated model.
With AUC value as the evaluation index of model performance, the experiment shows the improvement in model performance brought by the proposed optimal-weighting weighted integration approach, as detailed in the following table:
As the table above shows, with LR as the learning algorithm, model performance improves by 4.2% to 9.9%; with SVM, by 3.8% to 8.5%; with MLP, by 2.1% to 3.7%; and with XGB, by 1.1% to 2.4%.
In addition, the experiment shows that the proposed optimal-weighting weighted integration approach outperforms weighted-average integration for every learning algorithm. For the gastric cancer screening problem, it maximizes the improvement in predictive performance and thereby improves the accuracy of disease screening.
The above describes in detail only the optimal-weighting weighted integration approach (OWIA) used in constructing the disease screening model. In a specific implementation, other related operations may also be performed according to actual needs, such as selecting the model threshold according to misclassification cost; the present invention does not limit this.
Based on the same inventive concept, an embodiment of the present invention also provides a cross-granularity intelligent disease screening system 30. As shown in Fig. 4, the system 30 includes:
a model building module 301, configured to establish multiple layers of disease screening models, where the classification granularity of each layer's disease screening model is different;
a model calling module 302, configured to call each layer's disease screening model in turn according to a target examination report and obtain the disease classification result output by each layer's disease screening model.
The system establishes multiple layers of screening models for different disease classification granularities. For example, the first-layer granularity may be cancer versus no cancer, and the second-layer granularity subdivides cancer or no cancer: the cancer subdivision includes squamous carcinoma, adenocarcinoma, cell carcinoma and so on, and the no-cancer subdivision includes inflammation, tumor, polyp, ulcer and so on. In this way, for a new case, calling each layer's disease screening model in turn first determines whether canceration has occurred (the first-layer granularity) and then determines the next, finer category (the second-layer granularity); for a case judged cancerous, for example, it determines separately whether squamous carcinoma is present, whether adenocarcinoma is present, and so on. This improves the comprehensiveness and accuracy of screening.
Optionally, as shown in Fig. 4, the model building module 301 may include:
a post-structuring submodule 3011, configured to perform post-structuring on examination reports to obtain the sample data set corresponding to each layer's classification granularity;
a model training submodule 3012, configured to train individual classification models on the sample data set to obtain multiple homogeneous classification models;
a model selection submodule 3013, configured to perform weighted integration on the multiple homogeneous classification models, at least by using the optimal-weighting weighted integration approach (OWIA), to obtain each layer's disease screening model.
In this way, the system uses the optimal-weighting weighted integration approach (OWIA, Optimal Weighted Integrated Approach), which ensures that the weight combination used in weighted-average integration is optimal and thus maximizes the improvement in model performance. Whereas the improvement in model prediction performance is uncontrollable in the prior art, the system improves the performance of the disease screening model in a controllable way and thereby improves the accuracy of disease screening.
Optionally, the post-structuring processing submodule 3011 includes:
a data integration submodule, configured to integrate the data of the examination reports, taking the pathology examination report among them as authoritative;
a feature extraction submodule, configured to extract features from the integrated examination reports to obtain representative and discriminative target data items;
a numeric encoding submodule, configured to represent the target data items numerically to obtain the sample data set.
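The three processing steps listed above (integration, feature extraction, numeric representation) might look like the sketch below. The data item names, their vocabularies, and the use of one-hot encoding are all hypothetical choices made for illustration; the document does not specify which items are extracted or how they are encoded.

```python
from typing import Dict, List

# Hypothetical target data items and the values each one may take.
VOCAB: Dict[str, List[str]] = {
    "lesion_type": ["none", "polyp", "ulcer", "mass"],
    "biopsy_finding": ["normal", "inflammation", "atypia"],
}


def integrate_reports(pathology: Dict[str, str],
                      gastroscopy: Dict[str, str]) -> Dict[str, str]:
    """Merge two reports; on conflict the pathology report takes precedence."""
    merged = dict(gastroscopy)
    merged.update(pathology)
    return merged


def extract_items(record: Dict[str, str]) -> Dict[str, str]:
    """Keep only the target data items listed in VOCAB."""
    return {item: record[item] for item in VOCAB if item in record}


def encode(items: Dict[str, str]) -> List[int]:
    """One-hot encode each target data item in VOCAB order."""
    vector: List[int] = []
    for item, values in VOCAB.items():
        value = items.get(item)
        vector.extend(1 if value == v else 0 for v in values)
    return vector
```

A missing item encodes as all zeros for its slot, which keeps every sample vector the same length.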
Optionally, the model training submodule 3012 includes:
a data division submodule, configured to divide the sample data set into a training data set and a test data set according to a preset ratio;
a sampling submodule, configured to sample from the training data set to obtain k mutually intersecting sampled data sets, where k is a positive integer greater than 1;
a training submodule, configured to train and validate a model on each of the k sampled data sets using the same machine learning algorithm, to obtain k homogeneous classification models.
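The division, sampling, and training steps above can be sketched as follows. Two things here are assumptions rather than details from this document: the sampling scheme is taken to be bootstrap sampling (drawing with replacement, so the k sets overlap), and the shared learning algorithm is a deliberately trivial one-feature threshold classifier standing in for whatever algorithm is actually used.

```python
import random
from typing import Callable, List, Tuple

Sample = Tuple[float, int]          # (single numeric feature, binary label)
Model = Callable[[float], float]    # returns a score for the positive class


def split(data: List[Sample], train_ratio: float,
          seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
    """Shuffle and divide the sample set by a preset ratio."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def sample_k_sets(train: List[Sample], k: int,
                  seed: int = 0) -> List[List[Sample]]:
    """Assumed scheme: k bootstrap samples, each the size of the training set."""
    rng = random.Random(seed)
    return [[rng.choice(train) for _ in train] for _ in range(k)]


def fit_threshold(samples: List[Sample]) -> Model:
    """Hypothetical learner: threshold midway between the two class means."""
    pos = [x for x, y in samples if y == 1]
    neg = [x for x, y in samples if y == 0]
    mean_pos = sum(pos) / len(pos) if pos else 1.0
    mean_neg = sum(neg) / len(neg) if neg else 0.0
    threshold = (mean_pos + mean_neg) / 2
    return lambda x: 1.0 if x >= threshold else 0.0


def train_homogeneous_models(train: List[Sample], k: int) -> List[Model]:
    """Apply the same algorithm to each sampled set, giving k models."""
    return [fit_threshold(s) for s in sample_k_sets(train, k)]
```

Because every model comes from the same algorithm and differs only in its sampled data, the resulting models are homogeneous in the sense used here.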
Optionally, the model selection submodule 3013 includes:
a weight combination determination submodule, configured to determine the set of weight combinations for the multiple homogeneous classification models;
a performance evaluation submodule, configured to compute the performance evaluation value of the integrated model under each weight combination in the set;
a weighted integration submodule, configured to take the weight combination corresponding to the best performance evaluation value of the integrated model as the optimal weight combination, and to integrate the multiple homogeneous classification models by weighting with the optimal weight combination.
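The selection procedure above amounts to an exhaustive search over candidate weight combinations. The sketch below shows that search in a generic form; the function names and the `evaluate` callback are assumptions introduced for the example, and any scalar score where larger means better could be supplied.

```python
from typing import Callable, List, Optional, Sequence, Tuple

Model = Callable[[float], float]
Weights = Tuple[float, ...]


def weighted_ensemble(models: Sequence[Model], weights: Weights) -> Model:
    """Build the integrated model as a weighted sum of the base models."""
    def ensemble(x: float) -> float:
        return sum(w * m(x) for w, m in zip(weights, models))
    return ensemble


def select_weights(models: Sequence[Model],
                   candidates: List[Weights],
                   evaluate: Callable[[Model], float]
                   ) -> Tuple[Optional[Weights], float]:
    """Score the ensemble under every candidate and keep the best one."""
    best_weights: Optional[Weights] = None
    best_score = float("-inf")
    for weights in candidates:
        score = evaluate(weighted_ensemble(models, weights))
        if score > best_score:
            best_weights, best_score = weights, score
    return best_weights, best_score
```

Since every candidate is evaluated, the returned combination is the best within the candidate set; its quality still depends on how finely that set covers the weight space.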
Those skilled in the art will clearly understand that, for convenience and brevity of description, the above division into functional modules is only an example. In practical applications, the functions may be assigned to different functional modules as needed; that is, the internal structure of the system may be divided into different functional modules to perform all or part of the functions described above. For the specific working process of the functional modules described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
An embodiment of the present invention further provides another cross-granularity intelligent disease screening system 40. As shown in Fig. 5, the system 40 includes:
a processor 41, a communications interface 42, a memory 43, and a communication bus 44, wherein the processor 41, the communications interface 42, and the memory 43 communicate with one another through the communication bus 44.
The processor 41 may be a multi-core central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
The memory 43 is used to store program code, which includes computer operation instructions and a network flow graph. The memory 43 may include high-speed RAM and may further include non-volatile memory, for example at least one magnetic disk storage device. The memory 43 may also be a memory array. The memory 43 may also be divided into blocks, and the blocks may be combined into virtual volumes according to certain rules.
The communications interface 42 is used to implement connection and communication between these devices.
The processor 41 is configured to execute the program code in the memory 43 to perform the following operations:
establishing multi-layer disease screening models, the classification granularity of each layer's disease screening model being different;
calling each layer's disease screening model in turn according to a target examination report, to obtain the disease classification result output by each layer's disease screening model.
Optionally, each layer's disease screening model is established as follows:
performing post-structuring processing on examination reports to obtain the sample data set corresponding to the classification granularity of that layer;
training individual classification models on the sample data set to obtain multiple homogeneous classification models;
integrating the multiple homogeneous classification models by weighting, at least using the Optimal Weighted Integrated Approach (OWIA), to obtain a disease screening model.
Optionally, performing post-structuring processing on examination reports to obtain the sample data set corresponding to the classification granularity of that layer includes:
integrating the data of the examination reports, taking the pathology examination report among them as authoritative;
extracting features from the integrated examination reports to obtain representative and discriminative target data items;
representing the target data items numerically to obtain the sample data set.
Optionally, training individual classification models on the sample data set to obtain multiple homogeneous classification models includes:
dividing the sample data set into a training data set and a test data set according to a preset ratio;
sampling from the training data set to obtain k mutually intersecting sampled data sets, where k is a positive integer greater than 1;
training and validating a model on each of the k sampled data sets using the same machine learning algorithm, to obtain k homogeneous classification models.
Optionally, integrating the multiple homogeneous classification models by weighting, at least using the Optimal Weighted Integrated Approach (OWIA), to obtain the disease screening model includes:
determining the set of weight combinations for the multiple homogeneous classification models;
computing the performance evaluation value of the integrated model under each weight combination in the set;
taking the weight combination corresponding to the best performance evaluation value of the integrated model as the optimal weight combination, and integrating the multiple homogeneous classification models by weighting with the optimal weight combination.
Optionally, determining the set of weight combinations for the multiple homogeneous classification models includes:
traversing all weight combinations at a preset weight precision ε to obtain the set $W_{n,k}$ of all weight combinations, where the set contains $\binom{n-1}{k-1}$ weight combinations (the number of ways to choose k weights that are positive multiples of ε and sum to 1), $n = 1/\varepsilon = 10^{p}$, and p is a positive integer.
The method further includes:
representing the integrated model F(x) by the following first formula:
$$F(x) = \sum_{i=1}^{k} w_i f_i(x)$$
where k is the number of homogeneous classification models, $f_i$ denotes the i-th classification model, $w_i$ denotes the weight of the i-th classification model, $w_i \in (0,1)$, and $\sum_{i=1}^{k} w_i = 1$; the output of F(x) is the probability that a sample belongs to the positive class, that is, the confidence that canceration is judged to have occurred;
evaluating the performance of the integrated model by the following second formula:
$$Q(F(x)) = \mathrm{AUC}(F(x)), \quad x \in \text{Test Data}$$
where Test Data is the test data set and Q(F(x)) denotes the AUC value of the integrated model F(x) on the test data set.
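To make the traversal and the evaluation measure concrete, the sketch below enumerates weight combinations at a given precision and computes an AUC value from scores and labels. It assumes that each weight is a positive multiple of 1/n and that the weights sum to 1, in which case the enumeration is a standard stars-and-bars construction with C(n-1, k-1) results. The function names are illustrative, and the AUC routine is a plain pairwise implementation rather than a library call.

```python
from itertools import combinations
from math import comb
from typing import Iterator, Sequence, Tuple


def weight_combinations(n: int, k: int) -> Iterator[Tuple[float, ...]]:
    """Yield every k-tuple of positive multiples of 1/n that sums to 1."""
    # Choosing k-1 cut points in 1..n-1 splits n units into k positive parts.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0, *cuts, n)
        yield tuple((bounds[i + 1] - bounds[i]) / n for i in range(k))


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Fraction of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


# With precision 0.1 (n = 10) and k = 3 models there are C(9, 2) combinations.
assert len(list(weight_combinations(10, 3))) == comb(9, 2)
```

The number of combinations grows quickly with n and k, so the precision chosen directly controls the cost of the exhaustive search.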
Optionally, the method is applied to screening for gastric cancer, and the examination reports include pathology examination reports and gastroscopy reports.
The preferred embodiments of the present invention have been described in detail above with reference to the accompanying drawings. However, the present invention is not limited to the specific details of the above embodiments. Within the scope of the technical concept of the present invention, various simple variations may be made to the technical solution of the present invention, and these simple variations all fall within the protection scope of the present invention.
It should further be noted that the specific technical features described in the above embodiments may be combined in any suitable manner, provided there is no contradiction. To avoid unnecessary repetition, the present invention does not separately describe the various possible combinations.
In addition, the various embodiments of the present invention may also be combined arbitrarily; as long as such a combination does not depart from the idea of the present invention, it should likewise be regarded as content disclosed by the present invention.
Claims (10)
1. A cross-granularity intelligent disease screening method, characterized in that the method comprises:
establishing multi-layer disease screening models, the classification granularity of each layer's disease screening model being different;
calling each layer's disease screening model in turn according to a target examination report, to obtain the disease classification result output by each layer's disease screening model.
2. The method according to claim 1, characterized in that each layer's disease screening model is established as follows:
performing post-structuring processing on examination reports to obtain the sample data set corresponding to the classification granularity of that layer;
training individual classification models on the sample data set to obtain multiple homogeneous classification models;
integrating the multiple homogeneous classification models by weighting, at least using the Optimal Weighted Integrated Approach (OWIA), to obtain a disease screening model.
3. The method according to claim 2, characterized in that performing post-structuring processing on examination reports to obtain the sample data set corresponding to the classification granularity of that layer comprises:
integrating the data of the examination reports, taking the pathology examination report among them as authoritative;
extracting features from the integrated examination reports to obtain representative and discriminative target data items;
representing the target data items numerically to obtain the sample data set.
4. The method according to claim 2, characterized in that training individual classification models on the sample data set to obtain multiple homogeneous classification models comprises:
dividing the sample data set into a training data set and a test data set according to a preset ratio;
sampling from the training data set to obtain k mutually intersecting sampled data sets, where k is a positive integer greater than 1;
training and validating a model on each of the k sampled data sets using the same machine learning algorithm, to obtain k homogeneous classification models.
5. The method according to any one of claims 2 to 4, characterized in that integrating the multiple homogeneous classification models by weighting, at least using the Optimal Weighted Integrated Approach (OWIA), to obtain a disease screening model comprises:
determining the set of weight combinations for the multiple homogeneous classification models;
computing the performance evaluation value of the integrated model under each weight combination in the set;
taking the weight combination corresponding to the best performance evaluation value of the integrated model as the optimal weight combination, and integrating the multiple homogeneous classification models by weighting with the optimal weight combination.
6. The method according to claim 5, characterized in that determining the set of weight combinations for the multiple homogeneous classification models comprises:
traversing all weight combinations at a preset weight precision ε to obtain the set $W_{n,k}$ of all weight combinations, where the set contains $\binom{n-1}{k-1}$ weight combinations (the number of ways to choose k weights that are positive multiples of ε and sum to 1), $n = 1/\varepsilon = 10^{p}$, and p is a positive integer;
the method further comprises:
representing the integrated model F(x) by the following first formula:
$$F(x) = \sum_{i=1}^{k} w_i f_i(x)$$
where k is the number of homogeneous classification models, $f_i$ denotes the i-th classification model, $w_i$ denotes the weight of the i-th classification model, $w_i \in (0,1)$, and $\sum_{i=1}^{k} w_i = 1$; the output of F(x) is the probability that a sample belongs to the positive class, that is, the confidence that canceration is judged to have occurred;
evaluating the performance of the integrated model by the following second formula:
$$Q(F(x)) = \mathrm{AUC}(F(x)), \quad x \in \text{Test Data}$$
where Test Data is the test data set and Q(F(x)) denotes the AUC value of the integrated model F(x) on the test data set.
7. A cross-granularity intelligent disease screening system, characterized by comprising:
a model building module, configured to establish multi-layer disease screening models, the classification granularity of each layer's disease screening model being different;
a model calling module, configured to call each layer's disease screening model in turn according to a target examination report, to obtain the disease classification result output by each layer's disease screening model.
8. The system according to claim 7, characterized in that the model building module comprises:
a post-structuring processing submodule, configured to perform post-structuring processing on examination reports to obtain the sample data set corresponding to the classification granularity of each layer;
a model training submodule, configured to train individual classification models on the sample data set to obtain multiple homogeneous classification models;
a model selection submodule, configured to integrate the multiple homogeneous classification models by weighting, at least using the Optimal Weighted Integrated Approach (OWIA), to obtain the disease screening model of each layer.
9. The system according to claim 8, characterized in that the model training submodule comprises:
a data division submodule, configured to divide the sample data set into a training data set and a test data set according to a preset ratio;
a sampling submodule, configured to sample from the training data set to obtain k mutually intersecting sampled data sets, where k is a positive integer greater than 1;
a training submodule, configured to train and validate a model on each of the k sampled data sets using the same machine learning algorithm, to obtain k homogeneous classification models.
10. The system according to claim 8 or 9, characterized in that the model selection submodule comprises:
a weight combination determination submodule, configured to determine the set of weight combinations for the multiple homogeneous classification models;
a performance evaluation submodule, configured to compute the performance evaluation value of the integrated model under each weight combination in the set;
a weighted integration submodule, configured to take the weight combination corresponding to the best performance evaluation value of the integrated model as the optimal weight combination, and to integrate the multiple homogeneous classification models by weighting with the optimal weight combination.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810495222.3A CN108805178B (en) | 2018-05-22 | 2018-05-22 | Cross-granularity intelligent disease screening system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN108805178A (en) | 2018-11-13 |
CN108805178B (en) | 2020-12-15 |
Family
ID=64092777
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810495222.3A Active CN108805178B (en) | 2018-05-22 | 2018-05-22 | Cross-granularity intelligent disease screening system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108805178B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112598084A (en) * | 2021-03-02 | 2021-04-02 | 深圳金三立视频科技股份有限公司 | Vehicle type identification method and terminal based on image processing |
CN112633601A (en) * | 2020-12-31 | 2021-04-09 | 天津开心生活科技有限公司 | Method, device, equipment and computer medium for predicting disease event occurrence probability |
Citations (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103632168A (en) * | 2013-12-09 | 2014-03-12 | 天津工业大学 | Classifier integration method for machine learning |
CN107180155A (en) * | 2017-04-17 | 2017-09-19 | 中国科学院计算技术研究所 | A kind of disease forecasting method and system based on Manufacturing resource model |
-
2018
- 2018-05-22 CN CN201810495222.3A patent/CN108805178B/en active Active
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN112633601A (en) * | 2020-12-31 | 2021-04-09 | 天津开心生活科技有限公司 | Method, device, equipment and computer medium for predicting disease event occurrence probability |
CN112598084A (en) * | 2021-03-02 | 2021-04-02 | 深圳金三立视频科技股份有限公司 | Vehicle type identification method and terminal based on image processing |
CN112598084B (en) * | 2021-03-02 | 2021-06-29 | 深圳金三立视频科技股份有限公司 | Vehicle type identification method and terminal based on image processing |
Also Published As
Publication number | Publication date |
---|---|
CN108805178B (en) | 2020-12-15 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||