CN108805178A - Cross-granularity intelligent disease screening method and system - Google Patents


Publication number
CN108805178A
Authority
CN
China
Prior art keywords
model
disorder
screening
integrated
classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201810495222.3A
Other languages
Chinese (zh)
Other versions
CN108805178B (en)
Inventor
丁帅
胡世康
杨善林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology
Priority to CN201810495222.3A
Publication of CN108805178A
Application granted
Publication of CN108805178B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G16 INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR SPECIFIC APPLICATION FIELDS
    • G16H HEALTHCARE INFORMATICS, i.e. INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR THE HANDLING OR PROCESSING OF MEDICAL OR HEALTHCARE DATA
    • G16H50/00 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics
    • G16H50/70 ICT specially adapted for medical diagnosis, medical simulation or medical data mining; ICT specially adapted for detecting, monitoring or modelling epidemics or pandemics for mining of medical data, e.g. analysing previous cases of other patients

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Medical Informatics (AREA)
  • Health & Medical Sciences (AREA)
  • Public Health (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Computation (AREA)
  • Databases & Information Systems (AREA)
  • Biomedical Technology (AREA)
  • Primary Health Care (AREA)
  • General Health & Medical Sciences (AREA)
  • Epidemiology (AREA)
  • Pathology (AREA)
  • Investigating Or Analysing Biological Materials (AREA)

Abstract

The present invention provides a cross-granularity intelligent disease screening method and system to improve the comprehensiveness and accuracy of disease screening. The method includes: establishing multi-layer disease screening models, where each layer's disease screening model has a different classification granularity; and calling each layer's disease screening model in turn according to a target examination report to obtain the disease classification result output by each layer's disease screening model.

Description

Cross-granularity intelligent disease screening method and system
Technical field
The present invention relates to the field of medical technology and, in particular, to a cross-granularity intelligent disease screening method and system.
Background
Traditionally, the diagnosis and screening of cancer (such as gastric cancer and breast cancer) rely on doctors' analysis of medical records and examination reports. Increasingly heavy workloads and lengthy medical records and examination reports considerably reduce doctors' working efficiency. Together with the inherent difficulty of cancer diagnosis and screening and the limited professional level of primary-care doctors, this leads to high misdiagnosis and missed-diagnosis rates in cancer screening.
With the progress of artificial intelligence in recent years, data-driven analysis has increasingly become a strong support and supplement to cancer-related research in the clinical and biological fields, so that disease screening is gradually becoming intelligent. For example, one related technique applies an ensemble learning method to data from more than 1.4 million diabetic patients; it detects susceptibility to retinopathy (DR) with high accuracy and also addresses the problem of low compliance with retinopathy screening. As another example, a related study of graft survival and predictive variables in heart transplantation combines the predictions of multiple models by weighted averaging, which improves the predictive performance of the model and achieves good results.
However, the related techniques do not improve the ensemble method itself, so the gain in predictive performance is uncontrollable, and those skilled in the art have not further considered how to raise the accuracy of disease screening.
Summary of the invention
Embodiments of the present invention provide a cross-granularity intelligent disease screening method and system to improve the comprehensiveness and accuracy of disease screening.
To achieve the above object, a first aspect of the present invention provides a cross-granularity intelligent disease screening method, the method comprising:
establishing multi-layer disease screening models, where each layer's disease screening model has a different classification granularity;
calling each layer's disease screening model in turn according to a target examination report, and obtaining the disease classification result output by each layer's disease screening model.
Optionally, each layer's disease screening model is established as follows:
performing post-structuring on the examination reports to obtain the sample data set corresponding to that layer's classification granularity;
training individual classification models on the sample data set to obtain multiple homogeneous classification models;
performing weighted integration of the multiple homogeneous classification models using at least the optimal weighted integrated approach (OWIA) to obtain one disease screening model.
Optionally, performing post-structuring on the examination reports to obtain the sample data set corresponding to that layer's classification granularity comprises:
integrating the data of the examination reports, taking the pathology report among them as authoritative;
extracting features from the integrated examination reports to obtain representative and discriminative target data items;
representing the target data items numerically to obtain the sample data set.
Optionally, training individual classification models on the sample data set to obtain multiple homogeneous classification models comprises:
dividing the sample data set into a training data set and a test data set according to a preset ratio;
sampling from the training data set to obtain k mutually overlapping sampled data sets, where k is a positive integer greater than 1; specifically, sampling with replacement may be used so that each sampled data set has the same size as the original data set;
using the same machine learning algorithm, training and validating a model on each of the k sampled data sets to obtain k homogeneous classification models.
Optionally, performing weighted integration of the multiple homogeneous classification models using at least the optimal weighted integrated approach (OWIA) to obtain one disease screening model comprises:
determining the set of weight combinations for the multiple homogeneous classification models;
calculating the performance evaluation value of the integrated model under each weight combination in the set;
taking the weight combination that yields the best performance evaluation value of the integrated model as the optimal weight combination, and using the optimal weight combination to perform weighted integration of the multiple homogeneous classification models.
Optionally, determining the set of weight combinations for the multiple homogeneous classification models comprises:
traversing all weight combinations at a preset weight precision $\varepsilon$ to obtain the set of all weight combinations, denoted $W_{n,k}$, where $\varepsilon = 1/n$, $n = 10^{p}$, and $p$ is a positive integer; the set contains [expression missing in source] weight combinations;
The method further includes:
The integrated model $F(x)$ is expressed by the following first formula:
$$F(x) = \sum_{i=1}^{k} w_i f_i(x)$$
where $k$ is the number of homogeneous classification models, $f_i$ denotes the $i$-th classification model, and $w_i$ denotes the weight of the $i$-th classification model, with $w_i \in (0,1)$ and $\sum_{i=1}^{k} w_i = 1$. The output of $F(x)$ is the probability that the sample belongs to the positive class, that is, the confidence that canceration has occurred;
The performance of the integrated model is evaluated by the following second formula:
$$Q(F(x)) = \mathrm{AUC}\big(F(x) \mid \text{Test Data}\big)$$
where Test Data is the test data set and $Q(F(x))$ denotes the AUC value of the integrated model $F(x)$ on the test data set.
Optionally, the method is applied to gastric cancer screening, and the examination reports include a pathology report and a gastroscopy report.
A second aspect of the present invention provides a cross-granularity intelligent disease screening system, comprising:
a model building module, configured to establish multi-layer disease screening models, where each layer's disease screening model has a different classification granularity;
a model calling module, configured to call each layer's disease screening model in turn according to a target examination report and obtain the disease classification result output by each layer's disease screening model.
Optionally, the model building module includes:
a post-structuring submodule, configured to perform post-structuring on the examination reports to obtain the sample data set corresponding to each layer's classification granularity;
a model training submodule, configured to train individual classification models on the sample data set to obtain multiple homogeneous classification models;
a model selection submodule, configured to perform weighted integration of the multiple homogeneous classification models using at least the optimal weighted integrated approach (OWIA) to obtain each layer's disease screening model.
Optionally, the post-structuring submodule includes:
a data integration submodule, configured to integrate the data of the examination reports, taking the pathology report among them as authoritative;
a feature extraction submodule, configured to extract features from the integrated examination reports to obtain representative and discriminative target data items;
a numerical processing submodule, configured to represent the target data items numerically to obtain the sample data set.
Optionally, the model training submodule includes:
a data division submodule, configured to divide the sample data set into a training data set and a test data set according to a preset ratio;
a sampling submodule, configured to sample from the training data set to obtain k mutually overlapping sampled data sets, where k is a positive integer greater than 1;
a training submodule, configured to use the same machine learning algorithm to train and validate a model on each of the k sampled data sets, obtaining k homogeneous classification models.
Optionally, the model selection submodule includes:
a weight combination determination submodule, configured to determine the set of weight combinations for the multiple homogeneous classification models;
a performance evaluation submodule, configured to calculate the performance evaluation value of the integrated model under each weight combination in the set;
a weighted integration submodule, configured to take the weight combination that yields the best performance evaluation value of the integrated model as the optimal weight combination and to use the optimal weight combination to perform weighted integration of the multiple homogeneous classification models.
With the above technical solution, at least the following technical effects can be achieved:
When establishing disease screening models, the present invention builds multi-layer screening models for different disease classification granularities. For example, the first-layer granularity may be cancer versus no cancer, and the second-layer granularity subdivides cancer or no cancer: the subdivision of cancer includes squamous carcinoma, adenocarcinoma, cell carcinoma and so on, and the subdivision of no cancer includes inflammation, tumor, polyp, ulcer and so on. For a new case, calling each layer's disease screening model in turn makes it possible to first determine whether canceration has occurred (the first-layer granularity) and then determine the next, finer category (the second-layer granularity); if canceration has occurred, the models determine respectively whether squamous carcinoma is present, whether adenocarcinoma is present, and so on, which improves the comprehensiveness and accuracy of screening.
Further, the present invention may use an optimal weighted integrated approach (OWIA, Optimal Weighted Integrated Approach), which ensures that the weight combination used in weighted-average integration is optimal and therefore maximizes the improvement in model performance. Whereas in the prior art the improvement in predictive performance is uncontrollable, the technical solution provided by the present invention improves the performance of the disease screening model in a controllable way and further raises the accuracy of disease screening.
Other features and advantages of the present invention are described in detail in the detailed description that follows.
Description of the drawings
The drawings provide a further understanding of the present invention and constitute a part of the specification. Together with the following detailed description they serve to explain the present invention, but they do not limit it. In the drawings:
Fig. 1 is a flow diagram of a cross-granularity intelligent disease screening method provided by an embodiment of the present invention;
Fig. 2 is a flow diagram of a method for constructing a disease screening model provided by an embodiment of the present invention;
Fig. 3 is a flow diagram of another cross-granularity intelligent disease screening method provided by an embodiment of the present invention;
Fig. 4 is a structural diagram of a cross-granularity intelligent disease screening system provided by an embodiment of the present invention;
Fig. 5 is a structural diagram of another cross-granularity intelligent disease screening system provided by an embodiment of the present invention.
Detailed description
Specific embodiments of the present invention are described in detail below with reference to the drawings. It should be understood that the embodiments described here only illustrate and explain the present invention and do not limit it.
An embodiment of the present invention provides a cross-granularity intelligent disease screening method. As shown in Fig. 1, the method includes:
S101: establishing multi-layer disease screening models, where each layer's disease screening model has a different classification granularity.
The training data used to establish each layer's disease screening model may be different or the same, but the classification labels of each layer's training data differ. Specifically, the classification granularity of a lower layer may be a further subdivision of the classification granularity of the layer above it.
S102: calling each layer's disease screening model in turn according to a target examination report, and obtaining the disease classification result output by each layer's disease screening model.
The above method builds multi-layer screening models for different disease classification granularities. For example, the first-layer granularity may be cancer versus no cancer, and the second-layer granularity subdivides cancer or no cancer: the subdivision of cancer includes squamous carcinoma, adenocarcinoma, cell carcinoma and so on, and the subdivision of no cancer includes inflammation, tumor, polyp, ulcer and so on. For a new case, calling each layer's disease screening model in turn makes it possible to first determine whether canceration has occurred (the first-layer granularity) and then determine the next, finer category (the second-layer granularity); if canceration has occurred, the models determine respectively whether squamous carcinoma is present, whether adenocarcinoma is present, and so on, which improves the comprehensiveness and accuracy of screening.
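The layered call sequence of S101 and S102 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the `screen` function, the `Model` type, and the toy rule-based models are all hypothetical stand-ins for trained screening models, and the second layer is simplified to a single label per branch.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for a trained screening model: it maps a numeric
# feature vector to a label at its own classification granularity.
Model = Callable[[List[float]], str]


def screen(features: List[float], coarse: Model, fine: Dict[str, Model]) -> List[str]:
    """Call the layers in order, coarse to fine, and collect every layer's label."""
    results = [coarse(features)]
    refine = fine.get(results[0])  # second-layer model chosen by the first-layer label
    if refine is not None:
        results.append(refine(features))
    return results


# Toy rule-based models, for illustration only (assumed, not from the patent).
coarse_model: Model = lambda x: "cancer" if x[0] > 0.5 else "no cancer"
fine_models: Dict[str, Model] = {
    "cancer": lambda x: "adenocarcinoma" if x[1] > 0.5 else "squamous carcinoma",
    "no cancer": lambda x: "ulcer" if x[1] > 0.5 else "inflammation",
}

print(screen([0.9, 0.2], coarse_model, fine_models))
```

Keeping the per-layer results in one list mirrors the text's requirement that the classification result of every layer is returned, not only the finest one.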
The method for establishing the models is described in detail below. Optionally, an embodiment of the present invention may use the method steps shown in Fig. 2 to establish each layer's disease screening model, including:
S1011: performing post-structuring on the examination reports to obtain the sample data set corresponding to that layer's classification granularity.
It is worth noting that the technical solution provided by the embodiments of the present invention can be used to build screening models for various diseases, for example gastric cancer, breast cancer and diabetes.
Taking gastric cancer as an example, the post-structuring above means integrating the gastroscopy report and the pathology report to construct a data set of the form <gastroscopy data, pathology result>, then extracting features from the integrated report data and representing the reports numerically to obtain the sample data set used for modeling.
S1012: training individual classification models on the sample data set to obtain multiple homogeneous classification models.
S1013: performing weighted integration of the multiple homogeneous classification models using at least the optimal weighted integrated approach (OWIA) to obtain one disease screening model.
Steps S1012 and S1013 are the training and integration of the models, where homogeneous classification models are classification models learned by the same learning algorithm. The learning algorithm used to build the models may be, for example, a support vector machine (SVM, Support Vector Machine), a multi-layer perceptron (MLP, Multi-layer Perceptron), extreme gradient boosting (XGBoost) or a neural network; the embodiments of the present invention do not limit this.
Those of ordinary skill in the art will know that ensemble learning aims to improve prediction accuracy by building multiple machine learning models and merging their predictions, and existing research shows that ensemble learning can significantly improve model performance. The commonly used integration strategies are averaging and voting, and among the averaging methods the weighted average is the most widely used. Researchers usually assign weights to the individual classification models according to metrics such as AUC or accuracy and then integrate them by weighted averaging, but this choice of weights has no reliable theoretical basis; in some cases simple averaging even outperforms weighted averaging, which means that the improvement weighted-average integration brings to model performance is uncontrollable. The present disclosure provides an optimal weighted integrated approach (OWIA), which ensures that the weight combination used in weighted-average integration is optimal, maximizes the improvement in model performance, and thereby raises the accuracy of disease screening.
To enable those of ordinary skill in the art to understand the technical solution provided by the embodiments of the present invention more clearly, the above steps are described in detail below.
Gastric cancer screening is again used as the example. In this case, the examination reports described in step S1011 include a pathology report and a gastroscopy report. Post-structuring specifically includes the following steps:
(1) Integration of report data: the gastroscopy report and the pathology report of the same patient are integrated. Clinically, the pathology result is regarded as the gold standard, that is, the most reliable method currently recognized for diagnosing the disease, which can correctly distinguish "diseased" from "disease-free". The pathology result is therefore taken as authoritative when the report data are integrated.
(2) Feature extraction from the examination reports: representative and discriminative data items are extracted from the report data, such as positive descriptive words or phrases for specific conditions, report time, and patient information.
(3) Numerical representation of report data: the report data are converted into numeric data according to the feature items.
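Steps (1) to (3) can be sketched as below. This is only an illustrative sketch under stated assumptions: the `KEYWORDS` vocabulary, the dictionary-based report format, and the helper names `integrate`, `encode` and `build_dataset` are hypothetical, and the simple substring match does not handle negation ("no bleeding"), which a real pipeline would need to address.

```python
from typing import Dict, List, Tuple

# Hypothetical positive descriptors; a real vocabulary would come from clinicians.
KEYWORDS = ["ulcer", "mass", "bleeding", "erosion"]


def integrate(gastroscopy: Dict[str, str], pathology: Dict[str, str]) -> List[Tuple[str, str]]:
    """Step (1): pair each patient's gastroscopy text with the pathology result.

    The pathology result is treated as the label; patients without one are dropped.
    """
    return [(text, pathology[pid]) for pid, text in sorted(gastroscopy.items()) if pid in pathology]


def encode(text: str) -> List[int]:
    """Steps (2) and (3): one binary feature per descriptor found in the report text."""
    lowered = text.lower()
    return [int(word in lowered) for word in KEYWORDS]


def build_dataset(gastroscopy: Dict[str, str], pathology: Dict[str, str]):
    """Return feature vectors X and first-layer labels y (1 = cancer, 0 = no cancer)."""
    pairs = integrate(gastroscopy, pathology)
    X = [encode(text) for text, _ in pairs]
    y = [int(result == "cancer") for _, result in pairs]
    return X, y


reports = {"p1": "Irregular mass with bleeding", "p2": "Mild erosion of antrum", "p3": "Normal mucosa"}
results = {"p1": "cancer", "p2": "no cancer"}
X, y = build_dataset(reports, results)
print(X, y)
```

For a finer layer, the same features could be reused with a different labeling rule in `build_dataset`, which matches the statement that the layers differ in their classification labels.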
Further, step S1012 includes: dividing the sample data set into a training data set and a test data set according to a preset ratio; sampling from the training data set to obtain k mutually overlapping sampled data sets, where k is a positive integer greater than 1 (specifically, sampling with replacement may be used so that each sampled data set has the same size as the original data set); and, using the same machine learning algorithm, training and validating a model on each of the k sampled data sets to obtain k homogeneous classification models.
For example, the post-structured report data are divided into a training data set and a test data set at a ratio of 4:1, and the training data set is bootstrap-sampled to obtain k mutually overlapping sampled data sets. One machine learning algorithm, such as logistic regression or a support vector machine, is then selected and used to train and validate a model on each of the k sampled data sets, yielding k homogeneous classification models that serve as the individual classification models for the subsequent model integration.
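The data division and resampling described above can be sketched as follows. This is a minimal sketch using only the Python standard library; the function names `split` and `bootstrap`, the fixed seeds, and the integer stand-in samples are assumptions for illustration, and model training itself is left out.

```python
import random
from typing import List, Sequence, Tuple


def split(samples: Sequence, train_ratio: float = 0.8, seed: int = 0) -> Tuple[list, list]:
    """Shuffle and split into training and test sets (0.8 gives the 4:1 ratio)."""
    rng = random.Random(seed)
    shuffled = list(samples)
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]


def bootstrap(train: Sequence, k: int, seed: int = 0) -> List[list]:
    """Draw k resamples with replacement, each as large as the training set."""
    rng = random.Random(seed)
    return [[rng.choice(train) for _ in train] for _ in range(k)]


samples = list(range(100))          # stand-in for 100 structured report records
train, test = split(samples)
resamples = bootstrap(train, k=4)   # one resample per homogeneous model
print(len(train), len(test), [len(r) for r in resamples])
```

Because each resample is drawn with replacement from the same training set, the k resamples overlap with one another while still differing, which is what lets one learning algorithm produce k distinct homogeneous models.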
Further, step S1013 includes: determining the set of weight combinations for the multiple homogeneous classification models; calculating the performance evaluation value of the integrated model under each weight combination in the set; and taking the weight combination that yields the best performance evaluation value of the integrated model as the optimal weight combination, then using the optimal weight combination to perform weighted integration of the multiple homogeneous classification models.
It is worth noting that in binary classification problems the area under the ROC curve (AUC) is widely used to assess model performance, and the AUC value of a classification model on the test set is used to measure the model's expected generalization performance. Since the task of the intelligent gastric cancer screening model is to judge whether canceration has occurred, which is a typical binary classification problem, a preferred implementation of the present invention for gastric cancer screening uses AUC as the metric for assessing the generalization performance of the model.
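For readers unfamiliar with the metric, AUC can be computed directly from its pairwise interpretation: the fraction of (positive, negative) sample pairs in which the positive sample receives the higher score. The sketch below is one standard way to compute it and is not taken from the patent; the function name `auc` and the tie-handling convention (ties count as half) are assumptions of this sketch.

```python
from typing import Sequence


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """AUC as the share of (positive, negative) pairs ranked correctly; ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative sample")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Three of the four positive/negative pairs are ordered correctly here.
print(auc([1, 1, 0, 0], [0.9, 0.3, 0.4, 0.1]))
```

The pairwise form is quadratic in the number of samples, which is acceptable for a sketch; a library routine would normally be used on large test sets.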
In addition, determining the set of weight combinations for the multiple homogeneous classification models may include: traversing all weight combinations at a preset weight precision $\varepsilon$ to obtain the set of all weight combinations, denoted $W_{n,k}$, where $\varepsilon = 1/n$, $n = 10^{p}$, and $p$ is a positive integer; the set contains [expression missing in source] weight combinations. The integrated model $F(x)$ is then expressed by the following first formula:
$$F(x) = \sum_{i=1}^{k} w_i f_i(x)$$
where $k$ is the number of homogeneous classification models, $f_i$ denotes the $i$-th classification model, and $w_i$ denotes the weight of the $i$-th classification model, with $w_i \in (0,1)$ and $\sum_{i=1}^{k} w_i = 1$. The output of $F(x)$ is the probability that the sample belongs to the positive class, that is, the confidence that canceration has occurred;
Further, the performance of the integrated model $F(x)$ may be evaluated with the following second formula:
$$Q(F(x)) = \mathrm{AUC}\big(F(x) \mid \text{Test Data}\big)$$
where Test Data is the test data set and $Q(F(x))$ denotes the AUC value of the integrated model $F(x)$ on the test data set.
The following table gives a pseudocode description of the optimal weighted integrated approach:
In the table above, rows 1)-2) initialize the parameters; rows 3)-5) traverse all weight combinations at the given weight precision; rows 6)-7) define the performance evaluation function of the integrated model and calculate the evaluation value; and rows 8)-11) find the weight combination that makes the model performance optimal. The returned optimal weight combination is then used to perform weighted integration of the multiple homogeneous classification models.
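The steps described above can be sketched in Python as an exhaustive grid search. This is a hedged reconstruction from the prose, not the patent's own pseudocode: the names `weight_grid`, `auc` and `owia` are hypothetical, the grid assumes every weight is a positive multiple of 1/n (consistent with the stated constraint that each weight lies in (0,1) and the weights sum to 1), and under that assumption the grid has C(n-1, k-1) members, a count derived here rather than taken from the source.

```python
from itertools import product
from typing import Iterator, Sequence, Tuple


def weight_grid(k: int, n: int) -> Iterator[Tuple[float, ...]]:
    """All k-tuples of positive multiples of 1/n that sum to 1 (assumed grid)."""
    for head in product(range(1, n), repeat=k - 1):
        last = n - sum(head)
        if last >= 1:
            yield tuple(m / n for m in head + (last,))


def auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """Pairwise AUC; ties between a positive and a negative score count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > q) + 0.5 * (p == q) for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def owia(probs: Sequence[Sequence[float]], labels: Sequence[int], n: int = 10):
    """Return the weight combination with the highest AUC and that AUC.

    probs[i][j] is model i's positive-class probability for test sample j.
    """
    best_w, best_q = None, -1.0
    for w in weight_grid(len(probs), n):
        fused = [sum(wi * p[j] for wi, p in zip(w, probs)) for j in range(len(labels))]
        q = auc(labels, fused)
        if q > best_q:  # strict '>' keeps the first of several tied optima
            best_w, best_q = w, q
    return best_w, best_q


labels = [1, 1, 0, 0]
probs = [[0.9, 0.8, 0.2, 0.1],   # a model that ranks the samples well
         [0.1, 0.9, 0.8, 0.2]]   # a weaker model
best_w, best_q = owia(probs, labels)
print(best_w, best_q)
```

Because the search includes weightings close to each individual model, the selected combination scores at least as well on the evaluation data as the best near-single-model weighting on the grid. The cost grows quickly with k and n, so a coarse precision such as n = 10 is a natural starting point. Following the text, the weights are scored on the test data set; a separate validation split would be needed for an unbiased estimate of the final model's performance.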
Fig. 3 shows the flow of establishing a gastric cancer screening model using the technical solution provided by the embodiments of the present invention. As shown in Fig. 3, building the screening model for gastric cancer includes post-structuring of the pathology report and the gastroscopy report; see the description of post-structuring above, which is not repeated here. The processed examination reports are then sampled and used to train classification models, yielding k homogeneous classification models. For each weight combination, the performance of the weighted integrated model is evaluated, for which the AUC value may be used. The optimal weight combination is determined from the performance evaluation and used to assign weights to the models, giving the final integrated model.
To verify the validity of the optimal weighted integrated approach (OWIA) proposed by the embodiments of the present invention and to quantify precisely how much the method improves model performance, the following comparison experiment was carried out.
Four data sets were obtained by bootstrap sampling of the training data set. Models were trained with the LR, SVM, MLP and XGB algorithms respectively, and the effect of model integration was verified on the test set, with the AUC value as the evaluation metric for both the individual classification models and the integrated model.
With the AUC value as the evaluation metric of model performance, the experiment shows the improvement that the proposed optimal weighted integrated approach brings to model performance, as given in the following table:
As the table above shows, with LR as the learning algorithm, model performance improves by 4.2% to 9.9%; with SVM, by 3.8% to 8.5%; with MLP, by 2.1% to 3.7%; and with XGB, by 1.1% to 2.4%.
In addition, the experiment shows that the proposed optimal weighted integrated approach outperforms average-weighted integration on every learning algorithm. For the gastric cancer screening problem, it maximizes the improvement in the predictive performance of the model and thereby raises the accuracy of disease screening.
The above describes in detail only the optimal weighted integrated approach (OWIA) used in constructing the disease screening model. In a specific implementation, other related operations may also be applied to the model according to actual needs, such as selecting the model threshold according to misclassification cost; the present invention does not limit this.
Based on the same inventive concept, an embodiment of the present invention also provides a cross-granularity intelligent disease screening system 30. As shown in Fig. 4, the system 30 includes:
a model building module 301, configured to establish multi-layer disease screening models, where each layer's disease screening model has a different classification granularity;
a model calling module 302, configured to call each layer's disease screening model in turn according to a target examination report and obtain the disease classification result output by each layer's disease screening model.
The system builds multi-layer screening models for different disease classification granularities. For example, the first-layer granularity may be cancer versus no cancer, and the second-layer granularity subdivides cancer or no cancer: the subdivision of cancer includes squamous carcinoma, adenocarcinoma, cell carcinoma and so on, and the subdivision of no cancer includes inflammation, tumor, polyp, ulcer and so on. For a new case, calling each layer's disease screening model in turn makes it possible to first determine whether canceration has occurred (the first-layer granularity) and then determine the next, finer category (the second-layer granularity); if canceration has occurred, the models determine respectively whether squamous carcinoma is present, whether adenocarcinoma is present, and so on, which improves the comprehensiveness and accuracy of screening.
Optionally, as shown in Fig. 4, the model building module 301 may include:
a post-structuring submodule 3011, configured to perform post-structuring on the examination reports to obtain the sample data set corresponding to each layer's classification granularity;
a model training submodule 3012, configured to train individual classification models on the sample data set to obtain multiple homogeneous classification models;
a model selection submodule 3013, configured to perform weighted integration of the multiple homogeneous classification models using at least the optimal weighted integrated approach (OWIA) to obtain each layer's disease screening model.
In this way, the system uses the optimal weighted integrated approach (OWIA, Optimal Weighted Integrated Approach), which ensures that the weight combination used in weighted-average integration is optimal and maximizes the improvement in model performance. Whereas in the prior art the improvement in predictive performance is uncontrollable, the system improves the performance of the disease screening model in a controllable way and thereby raises the accuracy of disease screening.
Optionally, the post-structuring processing submodule 3011 includes:
A data integration submodule, configured to integrate the data of the examination reports, taking the pathological examination report among the examination reports as the reference;
A feature extraction submodule, configured to perform feature extraction on the integrated examination reports to obtain target data items that are representative and discriminative;
A numeric encoding submodule, configured to express the target data items numerically to obtain the sample data set.
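The three post-structuring steps can be illustrated with the following sketch. All record contents, field names (`diagnosis`, `lesion_site`, `ulcer`) and the integer coding scheme are assumptions made for the example; the description does not specify a concrete data layout.

```python
# Hypothetical records keyed by patient ID; all values are invented.
pathology = {"P001": {"diagnosis": "adenocarcinoma"},
             "P002": {"diagnosis": "inflammation"}}
gastroscopy = {"P001": {"lesion_site": "antrum", "ulcer": "yes"},
               "P002": {"lesion_site": "body", "ulcer": "no"},
               "P003": {"lesion_site": "antrum", "ulcer": "no"}}

def integrate(pathology, gastroscopy):
    """Use the pathology report as the reference: keep only patients that
    have one, and merge their gastroscopy fields into the same record."""
    return {pid: {**gastroscopy.get(pid, {}), **path}
            for pid, path in pathology.items()}

def encode(records, feature_items, positive_labels):
    """Turn the selected data items into integer codes plus a binary label."""
    codebooks = {item: {} for item in feature_items}
    rows = []
    for pid in sorted(records):
        rec = records[pid]
        row = []
        for item in feature_items:
            book = codebooks[item]
            # Assign the next unused integer to each new category value.
            row.append(book.setdefault(rec.get(item, "missing"), len(book)))
        label = 1 if rec["diagnosis"] in positive_labels else 0
        rows.append((pid, row, label))
    return rows, codebooks

merged = integrate(pathology, gastroscopy)
# Feature extraction is reduced here to choosing two data items by hand.
rows, codebooks = encode(merged, ["lesion_site", "ulcer"], {"adenocarcinoma"})
```

In this sketch patient `P003` is dropped during integration because no pathology report exists for that patient, so no reference label is available.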
Optionally, the model training submodule 3012 includes:
A data division submodule, configured to divide the sample data set into a training data set and a test data set according to a preset ratio;
A sampling submodule, configured to sample from the training data set to obtain k mutually overlapping sampled data sets, where k is a positive integer greater than 1;
A training submodule, configured to train and validate a model on each of the k sampled training data sets using the same machine learning algorithm, to obtain k homogeneous classification models.
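The division, sampling and training steps can be sketched as below. The sketch assumes sampling with replacement as one way of obtaining overlapping sampled data sets, and uses a toy nearest-centroid learner in place of a real machine learning algorithm; the function names, the 0.8 ratio, k = 5 and the synthetic data are all hypothetical.

```python
import random

def split(data, train_ratio=0.8, seed=0):
    """Shuffle and divide the sample data set by a preset ratio."""
    rng = random.Random(seed)
    shuffled = data[:]
    rng.shuffle(shuffled)
    cut = int(len(shuffled) * train_ratio)
    return shuffled[:cut], shuffled[cut:]

def overlapping_samples(train, k, seed=0):
    """Draw k samples with replacement, so the k sets may share records."""
    rng = random.Random(seed)
    return [[rng.choice(train) for _ in train] for _ in range(k)]

def fit_centroid_model(sample):
    """Toy base learner: score by relative closeness to the positive mean."""
    pos = [x for x, y in sample if y == 1]
    neg = [x for x, y in sample if y == 0]
    mu_pos = sum(pos) / len(pos) if pos else 1.0
    mu_neg = sum(neg) / len(neg) if neg else 0.0

    def predict_proba(x):
        d_pos, d_neg = abs(x - mu_pos), abs(x - mu_neg)
        total = d_pos + d_neg
        return 0.5 if total == 0 else d_neg / total

    return predict_proba

# Synthetic one-feature data: positives cluster near 1.0, negatives near 0.0.
rng = random.Random(42)
data = [(rng.gauss(1.0, 0.3), 1) for _ in range(50)] + \
       [(rng.gauss(0.0, 0.3), 0) for _ in range(50)]

train, test = split(data)
samples = overlapping_samples(train, k=5)
# The same learner is applied to every sample, giving k homogeneous models.
models = [fit_centroid_model(s) for s in samples]
```

Because every model comes from the same learner but a different sample, the models are homogeneous in type while still differing in their fitted parameters, which is what makes weighting them worthwhile.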
Optionally, the model selection submodule 3013 includes:
A weight combination determination submodule, configured to determine the weight combination set of the multiple homogeneous classification models;
A performance evaluation submodule, configured to calculate the performance evaluation value of the integrated model under each weight combination in the weight combination set;
A weighted integration submodule, configured to take the weight combination corresponding to the best performance evaluation value of the integrated model as the optimal weight combination, and to perform weighted integration on the multiple homogeneous classification models using the optimal weight combination.
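The evaluation and selection of weight combinations can be sketched as follows, using AUC on a test set as the performance evaluation value. The helper names, the three sets of base-model scores and the short list of candidate weight combinations are assumptions for illustration; a full implementation would evaluate every combination in the weight combination set.

```python
def auc(scores, labels):
    """AUC as the fraction of positive/negative pairs ranked correctly
    (ties count as half)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def ensemble_scores(weights, model_scores):
    """Weighted average of the base models' scores, sample by sample."""
    return [sum(w * s for w, s in zip(weights, per_sample))
            for per_sample in zip(*model_scores)]

def select_weights(candidates, model_scores, labels):
    """Evaluate every candidate weight combination and keep the best AUC."""
    best = max(candidates,
               key=lambda w: auc(ensemble_scores(w, model_scores), labels))
    return best, auc(ensemble_scores(best, model_scores), labels)

# Scores of three hypothetical base models on six test samples (invented).
labels = [1, 1, 1, 0, 0, 0]
model_scores = [
    [0.9, 0.4, 0.7, 0.5, 0.2, 0.1],
    [0.6, 0.8, 0.3, 0.4, 0.5, 0.2],
    [0.7, 0.6, 0.2, 0.3, 0.1, 0.6],
]
candidates = [(0.8, 0.1, 0.1), (0.1, 0.8, 0.1), (0.1, 0.1, 0.8), (0.4, 0.3, 0.3)]
best_w, best_auc = select_weights(candidates, model_scores, labels)
```

Since the selection is an exhaustive comparison over a finite candidate set, the chosen combination is the best among the candidates evaluated, not necessarily over all real-valued weights.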
Those skilled in the art will clearly understand that, for convenience and brevity of description, only the above division of functional modules is used as an example. In practical applications, the above functions may be assigned to different functional modules as needed; that is, the internal structure of the system may be divided into different functional modules to complete all or part of the functions described above. For the specific working process of the functional modules described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
An embodiment of the present invention further provides another cross-granularity intelligent disease screening system 40. As shown in Fig. 5, the system 40 includes:
A processor 41, a communications interface 42, a memory 43 and a communication bus 44, where the processor 41, the communications interface 42 and the memory 43 communicate with one another through the communication bus 44.
The processor 41 may be a multi-core central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement the embodiments of the present invention.
The memory 43 is configured to store program code, which includes computer operation instructions and a network flow graph. The memory 43 may include high-speed RAM and may further include non-volatile memory, for example at least one magnetic disk storage. The memory 43 may also be a memory array. The memory 43 may also be divided into blocks, and the blocks may be combined into virtual volumes according to certain rules.
The communications interface 42 is configured to implement connection and communication between these devices.
The processor 41 is configured to execute the program code in the memory 43 to implement the following operations:
Establishing multi-layer disease screening models, where the classification granularity of each layer's disease screening model is different;
Calling each layer's disease screening model in turn according to a target examination report, and obtaining the disease classification result output by each layer's disease screening model.
Optionally, each layer's disease screening model is established as follows:
Performing post-structuring processing on examination reports to obtain the sample data set corresponding to the classification granularity of that layer;
Training individual classification models on the sample data set to obtain multiple homogeneous classification models;
Performing weighted integration on the multiple homogeneous classification models using at least the optimal weighted integrated approach OWIA, to obtain a disease screening model.
Optionally, performing post-structuring processing on examination reports to obtain the sample data set corresponding to the classification granularity of that layer includes:
Integrating the data of the examination reports, taking the pathological examination report among the examination reports as the reference;
Performing feature extraction on the integrated examination reports to obtain target data items that are representative and discriminative;
Expressing the target data items numerically to obtain the sample data set.
Optionally, training individual classification models on the sample data set to obtain multiple homogeneous classification models includes:
Dividing the sample data set into a training data set and a test data set according to a preset ratio;
Sampling from the training data set to obtain k mutually overlapping sampled data sets, where k is a positive integer greater than 1;
Training and validating a model on each of the k sampled training data sets using the same machine learning algorithm, to obtain k homogeneous classification models.
Optionally, performing weighted integration on the multiple homogeneous classification models using at least the optimal weighted integrated approach OWIA to obtain the disease screening model includes:
Determining the weight combination set of the multiple homogeneous classification models;
Calculating the performance evaluation value of the integrated model under each weight combination in the weight combination set;
Taking the weight combination corresponding to the best performance evaluation value of the integrated model as the optimal weight combination, and performing weighted integration on the multiple homogeneous classification models using the optimal weight combination.
Optionally, determining the weight combination set of the multiple homogeneous classification models includes:
Traversing all weight combinations at a preset weight precision $\varepsilon$ to obtain the set of all weight combinations $W_{n,k}$, where the weight combination set contains $\binom{n-1}{k-1}$ weight combinations, $n = 1/\varepsilon = 10^{p}$, and $p$ is a positive integer;
The method further includes:
The integrated model $F(x)$ is expressed by the following first formula:
$$F(x) = \sum_{i=1}^{k} w_i f_i(x)$$
where $k$ is the number of homogeneous classification models, $f_i$ denotes the i-th classification model, $w_i$ denotes the weight of the i-th classification model, $w_i \in (0,1)$, and $\sum_{i=1}^{k} w_i = 1$. The output of $F(x)$ is the probability that a sample belongs to the positive class, that is, the confidence that canceration has occurred;
The performance of the integrated model is evaluated by the following second formula:
$$\max_{W \in W_{n,k}} Q\big(F(x)\big), \quad x \in \text{Test Data}$$
where Test Data is the test data set and $Q(F(x))$ denotes the AUC value of the integrated model $F(x)$ on the test data set.
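The traversal of weight combinations at a given precision can be illustrated with the sketch below. It assumes that every weight is a positive multiple of $\varepsilon = 10^{-p}$ and that the weights sum to 1; under that assumption, choosing k − 1 cut points among the n − 1 interior positions of n unit steps yields each combination exactly once, so there are $\binom{n-1}{k-1}$ of them. The function name `weight_grid` is hypothetical.

```python
from itertools import combinations
from math import comb

def weight_grid(k, p):
    """Yield every k-tuple of positive multiples of 10**-p that sums to 1."""
    n = 10 ** p
    # Choosing k-1 cut points among positions 1..n-1 splits n unit steps
    # into k positive integer parts m_1..m_k with m_1 + ... + m_k = n.
    for cuts in combinations(range(1, n), k - 1):
        bounds = (0,) + cuts + (n,)
        yield tuple((bounds[i + 1] - bounds[i]) / n for i in range(k))

grid = list(weight_grid(k=3, p=1))
print(len(grid), comb(9, 2))
print(grid[0])
```

For k = 3 and p = 1 the grid has 36 combinations. The count grows quickly with both k and p, so the precision and the number of base models together determine the cost of an exhaustive search.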
Optionally, the method is applied to gastric cancer screening, and the examination reports include pathological examination reports and gastroscopy reports.
The preferred embodiments of the present invention have been described in detail above with reference to the accompanying drawings. However, the present invention is not limited to the specific details of the above embodiments. Within the scope of the technical concept of the present invention, various simple modifications may be made to the technical solutions of the present invention, and all of these simple modifications fall within the protection scope of the present invention.
It should further be noted that the specific technical features described in the above embodiments may be combined in any suitable manner provided there is no contradiction. To avoid unnecessary repetition, the present invention does not separately describe the various possible combinations.
In addition, the various embodiments of the present invention may also be combined arbitrarily; as long as such a combination does not depart from the idea of the present invention, it should likewise be regarded as content disclosed by the present invention.

Claims (10)

1. A cross-granularity intelligent disease screening method, characterized in that the method includes:
Establishing multi-layer disease screening models, where the classification granularity of each layer's disease screening model is different;
Calling each layer's disease screening model in turn according to a target examination report, and obtaining the disease classification result output by each layer's disease screening model.
2. The method according to claim 1, characterized in that each layer's disease screening model is established as follows:
Performing post-structuring processing on examination reports to obtain the sample data set corresponding to the classification granularity of that layer;
Training individual classification models on the sample data set to obtain multiple homogeneous classification models;
Performing weighted integration on the multiple homogeneous classification models using at least the optimal weighted integrated approach OWIA, to obtain a disease screening model.
3. The method according to claim 2, characterized in that performing post-structuring processing on examination reports to obtain the sample data set corresponding to the classification granularity of that layer includes:
Integrating the data of the examination reports, taking the pathological examination report among the examination reports as the reference;
Performing feature extraction on the integrated examination reports to obtain target data items that are representative and discriminative;
Expressing the target data items numerically to obtain the sample data set.
4. The method according to claim 2, characterized in that training individual classification models on the sample data set to obtain multiple homogeneous classification models includes:
Dividing the sample data set into a training data set and a test data set according to a preset ratio;
Sampling from the training data set to obtain k mutually overlapping sampled data sets, where k is a positive integer greater than 1;
Training and validating a model on each of the k sampled training data sets using the same machine learning algorithm, to obtain k homogeneous classification models.
5. The method according to any one of claims 2 to 4, characterized in that performing weighted integration on the multiple homogeneous classification models using at least the optimal weighted integrated approach OWIA to obtain a disease screening model includes:
Determining the weight combination set of the multiple homogeneous classification models;
Calculating the performance evaluation value of the integrated model under each weight combination in the weight combination set;
Taking the weight combination corresponding to the best performance evaluation value of the integrated model as the optimal weight combination, and performing weighted integration on the multiple homogeneous classification models using the optimal weight combination.
6. The method according to claim 5, characterized in that determining the weight combination set of the multiple homogeneous classification models includes:
Traversing all weight combinations at a preset weight precision $\varepsilon$ to obtain the set of all weight combinations $W_{n,k}$, where the weight combination set contains $\binom{n-1}{k-1}$ weight combinations, $n = 1/\varepsilon = 10^{p}$, and $p$ is a positive integer;
The method further includes:
The integrated model $F(x)$ is expressed by the following first formula:
$$F(x) = \sum_{i=1}^{k} w_i f_i(x)$$
where $k$ is the number of homogeneous classification models, $f_i$ denotes the i-th classification model, $w_i$ denotes the weight of the i-th classification model, $w_i \in (0,1)$, and $\sum_{i=1}^{k} w_i = 1$. The output of $F(x)$ is the probability that a sample belongs to the positive class, that is, the confidence that canceration has occurred;
The performance of the integrated model is evaluated by the following second formula:
$$\max_{W \in W_{n,k}} Q\big(F(x)\big), \quad x \in \text{Test Data}$$
where Test Data is the test data set and $Q(F(x))$ denotes the AUC value of the integrated model $F(x)$ on the test data set.
7. A cross-granularity intelligent disease screening system, characterized by including:
A model building module, configured to establish multi-layer disease screening models, where the classification granularity of each layer's disease screening model is different;
A model calling module, configured to call each layer's disease screening model in turn according to a target examination report and obtain the disease classification result output by each layer's disease screening model.
8. The system according to claim 7, characterized in that the model building module includes:
A post-structuring processing submodule, configured to perform post-structuring processing on examination reports to obtain the sample data set corresponding to each layer's classification granularity;
A model training submodule, configured to train individual classification models on the sample data set to obtain multiple homogeneous classification models;
A model selection submodule, configured to perform weighted integration on the multiple homogeneous classification models using at least the optimal weighted integrated approach OWIA, to obtain each layer's disease screening model.
9. The system according to claim 8, characterized in that the model training submodule includes:
A data division submodule, configured to divide the sample data set into a training data set and a test data set according to a preset ratio;
A sampling submodule, configured to sample from the training data set to obtain k mutually overlapping sampled data sets, where k is a positive integer greater than 1;
A training submodule, configured to train and validate a model on each of the k sampled training data sets using the same machine learning algorithm, to obtain k homogeneous classification models.
10. The system according to claim 8 or 9, characterized in that the model selection submodule includes:
A weight combination determination submodule, configured to determine the weight combination set of the multiple homogeneous classification models;
A performance evaluation submodule, configured to calculate the performance evaluation value of the integrated model under each weight combination in the weight combination set;
A weighted integration submodule, configured to take the weight combination corresponding to the best performance evaluation value of the integrated model as the optimal weight combination, and to perform weighted integration on the multiple homogeneous classification models using the optimal weight combination.
CN201810495222.3A 2018-05-22 2018-05-22 Cross-granularity intelligent disease screening system Active CN108805178B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810495222.3A CN108805178B (en) 2018-05-22 2018-05-22 Cross-granularity intelligent disease screening system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810495222.3A CN108805178B (en) 2018-05-22 2018-05-22 Cross-granularity intelligent disease screening system

Publications (2)

Publication Number Publication Date
CN108805178A true CN108805178A (en) 2018-11-13
CN108805178B CN108805178B (en) 2020-12-15

Family

ID=64092777

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810495222.3A Active CN108805178B (en) 2018-05-22 2018-05-22 Cross-granularity intelligent disease screening system

Country Status (1)

Country Link
CN (1) CN108805178B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103632168A (en) * 2013-12-09 2014-03-12 天津工业大学 Classifier integration method for machine learning
CN107180155A (en) * 2017-04-17 2017-09-19 中国科学院计算技术研究所 A kind of disease forecasting method and system based on Manufacturing resource model

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112633601A (en) * 2020-12-31 2021-04-09 天津开心生活科技有限公司 Method, device, equipment and computer medium for predicting disease event occurrence probability
CN112598084A (en) * 2021-03-02 2021-04-02 深圳金三立视频科技股份有限公司 Vehicle type identification method and terminal based on image processing
CN112598084B (en) * 2021-03-02 2021-06-29 深圳金三立视频科技股份有限公司 Vehicle type identification method and terminal based on image processing

Also Published As

Publication number Publication date
CN108805178B (en) 2020-12-15

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant