WO2021120587A1 - OCT-based retinal classification method, device, computer equipment, and storage medium - Google Patents


Info

Publication number
WO2021120587A1
WO2021120587A1 PCT/CN2020/099518 CN2020099518W WO2021120587A1 WO 2021120587 A1 WO2021120587 A1 WO 2021120587A1 CN 2020099518 W CN2020099518 W CN 2020099518W WO 2021120587 A1 WO2021120587 A1 WO 2021120587A1
Authority
WO
WIPO (PCT)
Prior art keywords
classification
split
classification feature
node
feature
Prior art date
Application number
PCT/CN2020/099518
Other languages
English (en)
French (fr)
Inventor
王关政
王立龙
王瑞
范栋轶
吕传峰
Original Assignee
平安科技(深圳)有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 平安科技(深圳)有限公司 filed Critical 平安科技(深圳)有限公司
Publication of WO2021120587A1 publication Critical patent/WO2021120587A1/zh

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/24 - Classification techniques
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00 - Arrangements for image or video recognition or understanding
    • G06V10/40 - Extraction of image or video features
    • G06V10/46 - Descriptors for shape, contour or point-related descriptors, e.g. scale invariant feature transform [SIFT] or bags of words [BoW]; Salient regional features
    • G06V10/462 - Salient features, e.g. scale invariant feature transforms [SIFT]
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V2201/00 - Indexing scheme relating to image or video recognition or understanding
    • G06V2201/03 - Recognition of patterns in medical or anatomical images

Definitions

  • This application relates to artificial intelligence, and in particular to an OCT-based retinal classification method, device, computer equipment, and storage medium.
  • At present, the routine examination of patients with ophthalmic diseases is mainly based on optical coherence tomography (OCT), a device that can safely and without contact acquire the parameter values of the ganglion cell complex (GCC) in the macular region of the examinee's fundus retina.
  • Recognizing and classifying the GCC parameter values helps doctors diagnose the retina in combination with the classification and improves the efficiency and accuracy of diagnosis.
  • However, the inventor realizes that the traditional way of identifying and classifying GCC parameter values mainly relies on a deep neural network model, and there are certain differences between the features extracted by such a model and the doctor's diagnostic logic. This leads to low recognition and classification accuracy of the deep neural network model, which affects the accuracy of the doctor's diagnosis and reduces the doctor's work efficiency.
  • Accordingly, the embodiments of the present application provide an OCT-based retinal classification method, device, computer equipment, and storage medium, to solve the problem that the traditional way of identifying and classifying GCC parameters is not accurate enough, which affects the accuracy of the target user's diagnosis and reduces work efficiency.
  • An OCT-based retinal classification method includes:
  • obtaining a sample data set from a preset database, where the sample data set is composed of q training samples, the training samples are GCC parameters, and q is a positive integer greater than 1;
  • constructing a decision tree, using a random forest algorithm, from the training samples in the sample data set to obtain a retinal classification model;
  • obtaining, from a preset user library, the to-be-identified GCC parameters obtained by the user through an OCT scan;
  • performing feature extraction on the to-be-identified GCC parameters to obtain y data features, where y is a positive integer greater than 1; and
  • importing the y data features into the retinal classification model for classification, and outputting the classification result corresponding to the to-be-identified GCC parameters.
  • An OCT-based retinal classification device includes:
  • the first acquisition module is configured to acquire a sample data set from a preset database, wherein the sample data set is composed of q training samples, the training samples are GCC parameters, and q is a positive integer greater than 1;
  • the construction module is used to construct a decision tree based on the training samples in the sample data set using a random forest algorithm to obtain a retinal classification model;
  • the second acquisition module is used to acquire the GCC parameters to be identified which are obtained by the user through OCT scanning from the preset user library;
  • the feature extraction module is configured to perform feature extraction on the GCC parameters to be identified to obtain y data features, where y is a positive integer greater than 1;
  • the classification module is configured to import the y data features into the retinal classification model for classification, and output the classification result corresponding to the GCC parameter to be identified.
  • A computer device includes a memory, a processor, and computer-readable instructions that are stored in the memory and executable on the processor.
  • When the processor executes the computer-readable instructions, the following steps of the OCT-based retinal classification method are implemented:
  • obtaining a sample data set from a preset database, where the sample data set is composed of q training samples, the training samples are GCC parameters, and q is a positive integer greater than 1;
  • constructing a decision tree, using a random forest algorithm, from the training samples in the sample data set to obtain a retinal classification model, followed by the acquisition, feature-extraction, and classification steps described above.
  • A non-volatile computer-readable storage medium stores computer-readable instructions, and the steps of the above-mentioned OCT-based retinal classification method are implemented when the computer-readable instructions are executed by a processor:
  • obtaining a sample data set from a preset database, where the sample data set is composed of q training samples, the training samples are GCC parameters, and q is a positive integer greater than 1;
  • constructing a decision tree, using a random forest algorithm, from the training samples in the sample data set to obtain a retinal classification model, followed by the acquisition, feature-extraction, and classification steps described above.
  • The above-mentioned OCT-based retinal classification method, device, computer equipment, and storage medium use the acquired sample data set to construct decision trees and obtain a retinal classification model, then obtain the to-be-identified GCC parameters produced by the user's OCT scan, perform feature extraction on those parameters to obtain data features, and finally import the data features into the retinal classification model for classification, yielding the classification result corresponding to the to-be-identified GCC parameters.
  • the retinal classification model can be trained with data features similar to the target user's diagnostic logic, which improves the accuracy of the recognition and classification of the retinal classification model and ensures the effectiveness of the classification results. Therefore, it is beneficial to improve the accuracy of the target user's diagnosis based on the classification result, and further improve the target user's work efficiency.
  • FIG. 1 is a flowchart of retinal classification based on OCT images provided by an embodiment of the present application
  • FIG. 2 is a flowchart of step S2 in the OCT image-based retinal classification provided by an embodiment of the present application;
  • FIG. 3 is a flowchart of step S25 in the OCT image-based retinal classification provided by an embodiment of the present application;
  • FIG. 4 is a flowchart of step S253 in the OCT image-based retinal classification provided by an embodiment of the present application;
  • Fig. 5 is a flow chart of calculating the target Gini index and performing splitting in the retinal classification based on OCT images provided by an embodiment of the present application;
  • FIG. 6 is a schematic diagram of an OCT image-based retinal classification device provided by an embodiment of the present application.
  • Fig. 7 is a basic structural block diagram of a computer device provided by an embodiment of the present application.
  • the OCT-based retinal classification method provided in this application is applied to the server, and the server can be implemented by an independent server or a server cluster composed of multiple servers.
  • a method for retinal classification based on OCT is provided, which includes the following steps:
  • S1 Obtain a sample data set from a preset database, where the sample data set consists of q training samples, the training samples are GCC parameters, and q is a positive integer greater than 1.
  • the sample data set is directly obtained from a preset database, where the preset database refers to a database dedicated to storing sample data sets.
  • the sample data set contains q training samples
  • the training samples are GCC parameters
  • each training sample has its corresponding classification feature
  • the classification feature is mainly the disease category set by the user.
  • the training samples are mainly GCC parameters scanned by the OCT device.
  • the GCC parameters are composed of five data features corresponding to the GCC thickness, which are: All Avg, Sup Avg, Inf Avg, FLV, and GLV.
  • multiple training samples are randomly selected from the sample data set.
  • random sampling may be adopted.
  • the random sampling is random sampling with replacement.
  • K rounds of sampling are repeated in the sample data set.
  • the extracted result is used as a sub-training set, and K sub-training sets are obtained.
  • the K sub-training sets are independent of each other, and there may be repeated training samples in the sub-training sets.
  • The number of training samples to draw can be determined from historical experience, or an appropriate number of training samples can be drawn according to specific business needs and used as a sub-training set for machine model training. More training data generally gives a more accurate model, but also raises the training cost and makes implementation harder, so the specific number can be chosen according to the needs of the actual application and is not limited here.
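  • As a concrete illustration of this sampling step, the following minimal Python sketch draws K sub-training sets from the sample data set with replacement, so that each training sample has the same probability of being drawn in every round and duplicates may appear within a sub-training set. The function and variable names are illustrative, not taken from the application.

```python
import random

def draw_sub_training_sets(sample_data_set, k, subset_size=None, seed=0):
    """Draw K sub-training sets by K rounds of sampling with replacement (bootstrap)."""
    rng = random.Random(seed)
    # At most as many samples per sub-training set as in the full sample data set.
    subset_size = subset_size or len(sample_data_set)
    sub_training_sets = []
    for _ in range(k):
        # Every training sample has the same probability of being drawn each time,
        # and repeated training samples may appear within one sub-training set.
        sub_training_sets.append([rng.choice(sample_data_set) for _ in range(subset_size)])
    return sub_training_sets

# Example with q = 6 training samples (labels standing in for GCC parameter records).
samples = ["s1", "s2", "s3", "s4", "s5", "s6"]
for i, sub in enumerate(draw_sub_training_sets(samples, k=3, subset_size=4)):
    print(f"sub-training set {i}: {sub}")
```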
  • the random forest algorithm is used to construct a decision tree, a decision tree is constructed for each sub-training set, and K decision trees are obtained, and then a random forest is constructed according to the generated K decision trees to obtain a retinal classification model.
  • the preset user database refers to a database dedicated to storing GCC parameters to be identified.
  • the GCC parameters to be identified include different parameters and identification information corresponding to the parameters, and the identification information is mainly GCC thickness and non-GCC thickness.
  • S4 Perform feature extraction on the GCC parameters to be identified to obtain y data features, where y is a positive integer greater than 1.
  • Specifically, the identification information corresponding to each parameter in the to-be-identified GCC parameters is examined. If the identification information indicates a GCC thickness, the parameter corresponding to that identification information is extracted and each extracted parameter is taken as a data feature, so that y data features are finally extracted; if the identification information indicates a non-GCC thickness, no processing is done.
  • the GCC parameters to be identified may specifically include 9 non-GCC thicknesses and 5 GCC thicknesses, where the parameters corresponding to the 5 GCC thicknesses are All Avg, Sup Avg, Inf Avg, FLV, and GLV, respectively.
  • Further, the 5 GCC thicknesses are used to determine the type of retina corresponding to the to-be-identified GCC parameters; the results of the 9 non-GCC thicknesses and the 5 GCC thicknesses can also be combined to determine the type of retina corresponding to the to-be-identified GCC parameters.
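  • A minimal sketch of this extraction step is shown below. It assumes the to-be-identified GCC parameters arrive as (parameter name, identification information, value) triples; that structure and the non-GCC parameter name are illustrative assumptions, while the five GCC-thickness names (All Avg, Sup Avg, Inf Avg, FLV, GLV) come from the application.

```python
def extract_gcc_features(parameters):
    """Keep only the parameters whose identification information marks a GCC thickness."""
    features = {}
    for name, identification, value in parameters:
        if identification == "GCC thickness":
            features[name] = value          # each extracted parameter becomes a data feature
        # identification == "non-GCC thickness": no processing is done
    return features

# Illustrative input: the 5 GCC thicknesses plus one (hypothetical) non-GCC thickness.
raw_parameters = [
    ("All Avg", "GCC thickness", 94.1),
    ("Sup Avg", "GCC thickness", 95.3),
    ("Inf Avg", "GCC thickness", 92.8),
    ("FLV", "GCC thickness", 1.2),
    ("GLV", "GCC thickness", 4.5),
    ("RNFL Avg", "non-GCC thickness", 101.0),
]
print(extract_gcc_features(raw_parameters))   # y = 5 data features
```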
  • the above data features can also be stored in a node of a blockchain.
  • the y data features are imported into the retina classification model, and the retina classification model will classify the data features after receiving the data features, and output the classification features corresponding to the data features as the classification results corresponding to the GCC parameters to be identified.
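  • As an illustration of this classification step, the sketch below combines per-tree outputs into one classification result. The application does not spell out how the K decision trees are combined at prediction time, so the majority vote is an assumption; the two stand-in trees and their thresholds are hypothetical, and the class labels U and R echo the illustrative flv < 5 -> U split used in the description.

```python
from collections import Counter

def forest_predict(trees, data_features):
    """Combine the decision trees' outputs into one classification result (assumed majority vote)."""
    votes = [tree(data_features) for tree in trees]
    return Counter(votes).most_common(1)[0][0]

# Usage sketch with two hypothetical stand-in "trees" over the five GCC data features.
trees = [
    lambda f: "U" if f["FLV"] < 5 else "R",    # echoes the flv < 5 -> class U example
    lambda f: "U" if f["GLV"] < 10 else "R",   # hypothetical threshold
]
data_features = {"All Avg": 94.1, "Sup Avg": 95.3, "Inf Avg": 92.8, "FLV": 1.2, "GLV": 4.5}
print(forest_predict(trees, data_features))    # -> "U"
```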
  • the obtained sample data set is used to construct a decision tree to obtain a retinal classification model, and then obtain the GCC parameters to be recognized obtained by the user through OCT scanning, and perform feature extraction on the GCC parameters to be recognized to obtain the data features, and finally The data features are imported into the retina classification model for classification, and the classification results corresponding to the GCC parameters to be identified are obtained.
  • the retinal classification model can be trained with data features similar to the target user's diagnostic logic, which improves the accuracy of the recognition and classification of the retinal classification model and ensures the effectiveness of the classification results. Therefore, it is beneficial to improve the accuracy of the target user's diagnosis based on the classification result, and further improve the target user's work efficiency.
  • the training samples include classification features.
  • a random forest algorithm is used to construct a decision tree to obtain a retinal classification model including the following steps:
  • S21 Use random sampling to extract training samples from the sample data set to construct K sub-training sets, where K is a positive integer greater than 1.
  • random sampling is used to extract training samples from the sample data set
  • the random sampling method can use resampling technology to extract training samples from the sample data set.
  • The resampling technique performs sampling with replacement on the sample data set: each training sample in the sample data set has the same probability of being drawn each time, K rounds of extraction are repeated on the sample data set, and the result of each round is used as a sub-training set, giving K sub-training sets. The number of training samples in a sub-training set is less than or equal to the number of training samples in the sample data set.
  • S22: For each sub-training set, the information entropy of each classification feature is calculated according to formula (1): H(X) = -∑ p(x_i)·log₂ p(x_i), where X is the classification feature, H(X) is the information entropy of the classification feature, i = 1, 2, ..., n, x_i is the i-th classification feature, and p(x_i) is the probability of the feature value of the i-th classification feature.
  • S23: From the information entropy, the information gain of each classification feature is calculated according to formula (2): gain = H(c) - H(c|X), where gain is the information gain of the classification feature, H(c) is the information entropy before splitting on the classification feature X, and H(c|X) is the information entropy after splitting on the classification feature X.
  • S24: From the information gain, the information gain ratio of each classification feature is calculated according to formulas (3) and (4), where IntI is the penalty factor of the classification feature, D is the total number of training samples in the sample data set, W_X is the number of training samples of the classification feature, and gr is the information gain ratio of the classification feature.
  • Specifically, the penalty factor of the classification feature is first calculated with formula (4), and the information gain ratio is then obtained with formula (3); that is, the information gain ratio of the classification feature = the information gain of the classification feature / the penalty factor of the classification feature (gr = gain / IntI).
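  • To make formulas (1) to (3) concrete, the sketch below computes the information entropy, information gain, and information gain ratio of candidate classification features over a toy sub-training set. Because formula (4) is only reproduced as an image in the publication, the penalty factor IntI is implemented here as the usual C4.5 split information computed from the counts D and W_X; treat that as an assumed reconstruction rather than the application's exact definition.

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Formula (1): H = -sum p(x_i) * log2 p(x_i) over the label proportions."""
    total = len(labels)
    return -sum((c / total) * log2(c / total) for c in Counter(labels).values())

def information_gain(samples, labels, feature):
    """Formula (2): gain = H(c) - H(c|X), with H(c|X) the weighted entropy after splitting on X."""
    total = len(samples)
    groups = {}
    for sample, label in zip(samples, labels):
        groups.setdefault(sample[feature], []).append(label)
    h_after = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - h_after, groups

def gain_ratio(samples, labels, feature):
    """Formula (3): gr = gain / IntI, with IntI assumed to be the C4.5 split information."""
    gain, groups = information_gain(samples, labels, feature)
    total = len(samples)
    inti = -sum((len(g) / total) * log2(len(g) / total) for g in groups.values())
    return gain / inti if inti > 0 else 0.0

# Toy sub-training set: discretised GCC features with disease-category labels.
samples = [{"FLV": "low", "GLV": "low"}, {"FLV": "low", "GLV": "high"},
           {"FLV": "high", "GLV": "high"}, {"FLV": "high", "GLV": "high"}]
labels = ["U", "U", "R", "R"]
ratios = {f: round(gain_ratio(samples, labels, f), 3) for f in samples[0]}
print(ratios, "-> split node:", max(ratios, key=ratios.get))
```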
  • S25 Select the classification feature corresponding to the largest information gain ratio as the split node, use the classification features corresponding to other information gain ratios as the node to be split, and use the split node for splitting.
  • In the embodiment of the present application, the C4.5 algorithm is used to construct the decision tree: the penalty factor of each classification feature is calculated according to formula (4), the information gain ratio of each classification feature is calculated using formula (3), the classification feature corresponding to the maximum information gain ratio is used as the split node, the classification features corresponding to the other information gain ratios are used as nodes to be split, and the split node is used for splitting.
  • If splitting were based on information gain alone, decision-tree construction would tend to select the classification feature with the larger information gain as the split node, and that feature's information gain would be relatively large; however, when the training set contains multiple classification features with many possible values, the prediction accuracy of the resulting decision tree is low.
  • Calculating the information gain ratio with the penalty factor of the classification feature, and splitting on the classification feature corresponding to the maximum information gain ratio, can effectively avoid the adverse effect of evenly distributed attributes on decision-tree splitting and improve the quality of decision-tree construction.
  • step S26 For the classification features corresponding to the node to be split, return to step S22 to continue execution, until all the classification features are used as split nodes to complete the split, and K decision trees are obtained.
  • That is, for the classification feature corresponding to a node to be split, the process returns to step S22 and, for each sub-training set, continues to calculate the information entropy of the classification feature, until all classification features have been used as split nodes and the splitting is complete; the splits form the branches of the decision tree, and K decision trees are built recursively.
  • S27 Construct a random forest based on K decision trees to obtain a retinal classification model.
  • the K decision trees are combined into a random forest to obtain a retina classification model, which is used to evaluate the type of retina corresponding to the GCC parameter.
  • Using the maximum information gain ratio to choose the split node can effectively avoid the adverse effect of uniformly distributed classification features on decision-tree splitting and improve the quality of decision-tree construction; constructing a random forest from multiple decision trees strengthens the classification and prediction capability of the machine model and improves the accuracy of the retinal classification model, which helps improve the accuracy of the target user's diagnosis based on the classification results obtained from the retinal classification model and further improves the target user's work efficiency.
  • In an embodiment, selecting the classification feature corresponding to the largest information gain ratio as the split node, using the classification features corresponding to the other information gain ratios as nodes to be split, and splitting using the split node includes the following steps:
  • S251 Select the classification feature corresponding to the largest information gain ratio as the split node, and use the classification feature corresponding to other information gain ratios as the node to be split.
  • the classification feature corresponding to the largest information gain ratio is selected as the split node, and the classification feature corresponding to other information gain ratios is selected as the node to be split.
  • S252: Calculate the Gini index of the split node using the Gini index formula, formula (5), where G(p) is the Gini index, e is the preset classification condition corresponding to the split node, and p_k is the proportion of the same input category in a specific group.
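  • Formula (5) is likewise only reproduced as an image in the publication, so the sketch below uses the standard Gini impurity, G = 1 - ∑ p_k², computed over the group of samples selected by a preset classification condition e. This matches the behaviour of the worked example in the description (identical labels give 0, a 50/50 mix gives a larger value), but it should be read as an assumed reconstruction.

```python
def gini_index(labels):
    """Assumed formula (5): G = 1 - sum_k p_k^2 over the label proportions p_k in a group."""
    total = len(labels)
    if total == 0:
        return 0.0
    proportions = [labels.count(lbl) / total for lbl in set(labels)]
    return 1.0 - sum(p * p for p in proportions)

def split_node_gini(samples, labels, condition):
    """Gini index of the group of samples that satisfy the preset classification condition e."""
    selected = [lbl for sample, lbl in zip(samples, labels) if condition(sample)]
    return gini_index(selected)

# Worked example from the description: the split "flv < 5 -> class U".
samples = [{"FLV": 1.0}] * 100
print(split_node_gini(samples, ["U"] * 100, lambda s: s["FLV"] < 5))               # 0.0, good split
print(split_node_gini(samples, ["U"] * 50 + ["R"] * 50, lambda s: s["FLV"] < 5))   # 0.5, poor split
```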
  • S253 Compare the Gini index with the preset index, and split according to the comparison result.
  • the Gini index is compared with a preset index, the comparison result is compared with the description information in the preset rule library, and the setting rule matching the description information is selected for splitting.
  • the preset rule library refers to a database specifically used to store different description information and setting rules corresponding to the description information.
  • For example, suppose the comparison result is that the Gini index is less than or equal to the preset index. The preset rule library contains description information stating that the Gini index is less than or equal to the preset index, whose corresponding setting rule is rule A, and description information stating that the Gini index is greater than the preset index, whose corresponding setting rule is rule B. By matching the comparison result against the description information, rule A is selected for splitting.
  • In this embodiment, the classification feature corresponding to the largest information gain ratio is selected as the split node, the classification features corresponding to the other information gain ratios are used as nodes to be split, formula (5) is used to calculate the Gini index corresponding to the split node, and finally the Gini index is compared with the preset index and splitting is carried out according to the comparison result.
  • the Gini index can be used to further split some of the decision trees, improve the accuracy of the decision tree, and thereby improve the accuracy of the subsequent retinal classification model training.
  • step S253 comparing the Gini index with a preset index, and determining the decision tree according to the comparison result includes the following steps:
  • the Gini index is compared with a preset index.
  • According to the comparison in step S2531, if the Gini index is less than or equal to the preset index, it means that the split node corresponding to the Gini index has a good classification effect, and no further splitting is performed.
  • According to the comparison in step S2531, if the Gini index is greater than the preset index, it means that the split node corresponding to the Gini index has a poor classification effect, and the preset classification condition is used to split the split node until, after each split, the Gini index corresponding to each node is less than or equal to the preset index or the preset number of splits is reached, at which point the splitting ends.
  • the preset classification condition refers to setting the condition for classifying the sample data set according to the actual needs of the user.
  • the preset index can be specifically 0.2, or it can be set according to the actual needs of the user, and there is no restriction here.
  • the preset number of splits refers to the number of times the user sets to stop the splitting node from splitting.
  • In this embodiment, the Gini index is compared with the preset index: when the Gini index is less than or equal to the preset index, no further splitting is performed; when the Gini index is greater than the preset index, the split node is split using the preset classification conditions until the preset cut-off condition is reached, and the splitting then stops.
  • Determining whether the split node needs further splitting under the different comparison results can effectively avoid inaccurate splits caused by calculation errors, thereby ensuring the accuracy of the splitting process and, in turn, the accuracy of subsequent retinal classification model training.
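  • A sketch of this comparison logic is given below. The preset index of 0.2 and the idea of a preset number of splits come from the text; the loop structure and the split_once callback are illustrative assumptions about how the splitting itself is carried out.

```python
def refine_split_node(node_gini, split_once, preset_index=0.2, max_splits=5):
    """Keep splitting while the Gini index exceeds the preset index; stop as soon as the
    Gini index is <= the preset index or the preset number of splits has been reached."""
    splits = 0
    while node_gini > preset_index and splits < max_splits:
        node_gini = split_once()     # split again using the preset classification condition
        splits += 1
    return node_gini, splits

# Usage sketch: each (hypothetical) split lowers the node's Gini index a little.
ginis = iter([0.45, 0.31, 0.18])
print(refine_split_node(0.50, split_once=lambda: next(ginis)))   # (0.18, 3)
```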
  • the OCT-based retinal classification further includes the following steps:
  • S6 Sort the Gini indices corresponding to all decision trees in ascending order to obtain the sorting result.
  • the Gini indices corresponding to all decision trees are sorted in ascending order, that is, the smallest Gini index is taken as the first place, and the largest Gini index is taken as the last place, and the corresponding ranking result is obtained.
  • The top a positions refer to the first through a-th positions in the sorting result obtained in step S6, and the bottom b positions refer to the last position through the b-th position from the end in the sorting result obtained in step S6.
  • Specifically, based on the sorting result obtained in step S6, the Gini indices in the top a positions are selected as first Gini indices, and the Gini indices in the bottom b positions are selected as second Gini indices.
  • Each first Gini index is doubled according to a preset first weight, and the doubled result is used as a target Gini index; each second Gini index is halved according to a preset second weight, and the halved result is used as a target Gini index.
  • Doubling the Gini indices in the top a positions and halving those in the bottom b positions can improve the classification accuracy. Here a and b are both positive integers greater than 1, a and b may be equal, and their specific values can be set according to the user's actual needs without limitation.
  • When a Gini index belongs to a top-a feature, the calculated Gini index is doubled, that is, the Gini index value is multiplied by the preset first weight; for example, if the original Gini index value is 1 and the preset first weight is 2, the doubled Gini index value is 2. In other words, the cost of misclassifying an important feature is greater, and misclassification of important features needs to be reduced.
  • When a Gini index belongs to a bottom-b feature, the calculated Gini index is halved, that is, the Gini index value is multiplied by the preset second weight; for example, if the original Gini index value is 1 and the preset second weight is 0.5, the halved Gini index value is 0.5. In other words, the cost of misclassifying an unimportant feature is small, and such misclassification does not need special attention.
  • The target Gini index is compared with the preset index. If the target Gini index is less than or equal to the preset index, the decision tree corresponding to that target Gini index is obtained as is; if the target Gini index is greater than the preset index, the decision tree corresponding to that target Gini index is split using the preset classification conditions until the target Gini index corresponding to each decision tree is less than or equal to the preset index or the preset number of target splits is reached, at which point the splitting ends and the split decision trees are obtained.
  • In this embodiment, the Gini indices corresponding to all decision trees are sorted in ascending order to obtain the sorting result, the Gini indices in the top a and bottom b positions are selected for weight calculation to obtain the target Gini indices, and the decision trees corresponding to the Gini indices in the top a and bottom b positions are then split according to the target Gini indices to obtain the split decision trees.
  • By calculating the target Gini indices and using them to split the decision trees, the classification features are further optimized: the analysis and calculation of important classification features is increased and that of unimportant classification features is reduced, which improves the accuracy of the decision trees and further ensures the accuracy of subsequent retinal classification model training.
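  • The re-weighting described above can be sketched as follows. The weights 2 and 0.5 are the example values given in the text, and the preset index of 0.2 is reused from the earlier comparison; which trees are then split again follows the rule that only target Gini indices above the preset index trigger further splitting.

```python
def target_gini_indices(tree_ginis, a, b, first_weight=2.0, second_weight=0.5):
    """Sort the trees' Gini indices in ascending order, double the top a, halve the bottom b."""
    order = sorted(range(len(tree_ginis)), key=lambda i: tree_ginis[i])
    targets = {}
    for rank, tree_idx in enumerate(order):
        g = tree_ginis[tree_idx]
        if rank < a:                          # important features: misclassification costs more
            targets[tree_idx] = g * first_weight
        elif rank >= len(order) - b:          # unimportant features: the cost is discounted
            targets[tree_idx] = g * second_weight
        else:
            targets[tree_idx] = g
    return targets

tree_ginis = [0.10, 0.35, 0.05, 0.50, 0.22]
targets = target_gini_indices(tree_ginis, a=2, b=2)
to_split_again = [i for i, g in targets.items() if g > 0.2]      # preset index 0.2
print(targets, "-> trees to split again:", to_split_again)
```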
  • a retina classification device based on OCT images corresponds to the retina classification method based on OCT images in the above-mentioned embodiment one-to-one.
  • the apparatus for retinal classification based on OCT images includes a first acquisition module 61, a construction module 62, a second acquisition module 63, a feature extraction module 64 and a classification module 65.
  • the detailed description of each functional module is as follows:
  • the first acquisition module 61 is configured to acquire a sample data set from a preset database, where the sample data set consists of q training samples, the training samples are GCC parameters, and q is a positive integer greater than 1;
  • the construction module 62 is used to construct a decision tree based on the training samples in the sample data set using the random forest algorithm to obtain a retinal classification model;
  • the second obtaining module 63 is configured to obtain the GCC parameters to be recognized that the user obtains through OCT scanning from the preset user library;
  • the feature extraction module 64 is used to perform feature extraction on the to-be-identified GCC parameters to obtain y data features, where y is a positive integer greater than 1. It should be emphasized that, to further ensure the privacy and security of the data features, the data features can also be stored in a node of a blockchain;
  • the classification module 65 is configured to import the y data features into the retinal classification model for classification, and output the classification results corresponding to the GCC parameters to be identified.
  • the building module 62 includes:
  • the sub-training set construction sub-module is used to extract training samples from the sample data set by random sampling to construct K sub-training sets, where K is a positive integer greater than 1;
  • the information entropy calculation sub-module is used to calculate, for each sub-training set, the information entropy of each classification feature according to formula (1): H(X) = -∑ p(x_i)·log₂ p(x_i), where X is the classification feature, H(X) is the information entropy of the classification feature, i = 1, 2, ..., n, x_i is the i-th classification feature, and p(x_i) is the probability of the feature value of the i-th classification feature;
  • the information gain calculation sub-module is used to calculate the information gain of each classification feature from the information entropy according to formula (2): gain = H(c) - H(c|X), where gain is the information gain of the classification feature, H(c) is the information entropy before splitting on the classification feature X, and H(c|X) is the information entropy after splitting on the classification feature X;
  • the information gain ratio calculation sub-module is used to calculate the information gain ratio of each classification feature from the information gain according to formulas (3) and (4), where IntI is the penalty factor of the classification feature, D is the total number of training samples in the sample data set, W_X is the number of training samples of the classification feature, and gr is the information gain ratio of the classification feature;
  • the split node selection sub-module is used to select the classification feature corresponding to the maximum information gain ratio as the split node, and the classification feature corresponding to the other information gain ratio as the node to be split, and the split node is used for splitting;
  • the decision tree generation sub-module is used to return to step S22 for the classification feature corresponding to the node to be split and continue execution until all the classification features are used as split nodes to complete the splitting, and K decision trees are obtained;
  • the retina classification model construction sub-module is used to construct a random forest based on K decision trees to obtain the retina classification model.
  • split node selection sub-module includes:
  • the split node determination unit is used to select the classification feature corresponding to the largest information gain ratio as the split node, and use the classification features corresponding to other information gain ratios as the node to be split;
  • the Gini index calculation unit is used to calculate the Gini index of the split node by using the Gini index formula
  • the splitting unit is used to compare the Gini index with the preset index, and split according to the comparison result.
  • the splitting unit includes:
  • the comparison sub-unit is used to compare the Gini index with the preset index
  • the first comparison subunit is used for if the Gini index is less than or equal to the preset index, then no splitting is performed;
  • the second comparison subunit is used to if the Gini index is greater than the preset index, split the split node using the preset classification condition, and stop the split until the preset cut-off condition is reached.
  • the retina classification device based on OCT images further includes:
  • the sorting module is used to sort the Gini indices corresponding to all decision trees in ascending order to obtain the sorting result;
  • the weight calculation module is used to select the Gini index of the a-position before the sorting and the b-position after the sorting from the sorting results to perform weight calculations to obtain the target Gini index, where a and b are both positive integers greater than 1;
  • the second splitting module is used to split the decision tree corresponding to the Gini index of the a-position before the sorting and the b-position after the sorting according to the target Gini index to obtain the split decision tree.
  • FIG. 7 is a block diagram of the basic structure of the computer device 90 in an embodiment of the present application.
  • the computer device 90 includes a memory 91, a processor 92, and a network interface 93 that are communicatively connected to each other through a system bus. It should be pointed out that FIG. 7 only shows a computer device 90 with components 91-93, but it should be understood that it is not required to implement all of the illustrated components, and more or fewer components may be implemented instead. Among them, those skilled in the art can understand that the computer device here is a device that can automatically perform numerical calculation and/or information processing in accordance with pre-set or stored instructions.
  • Its hardware includes, but is not limited to, a microprocessor, an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA), a digital signal processor (DSP), embedded devices, and so on.
  • the computer device may be a computing device such as a desktop computer, a notebook, a palmtop computer, and a cloud server.
  • the computer device can interact with the user through a keyboard, a mouse, a remote control, a touch panel, or a voice control device.
  • The memory 91 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, a hard disk, a multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, a magnetic disk, an optical disk, and the like.
  • the memory 91 may be an internal storage unit of the computer device 90, such as a hard disk or memory of the computer device 90.
  • the memory 91 may also be an external storage device of the computer device 90, for example, a plug-in hard disk equipped on the computer device 90, a smart memory card (Smart Media Card, SMC), and a secure digital (Secure Digital, SD) card, Flash Card, etc.
  • the memory 91 may also include both an internal storage unit of the computer device 90 and an external storage device thereof.
  • the memory 91 is generally used to store the operating system and various application software installed in the computer device 90, such as the program code of the OCT image-based retinal classification method.
  • the memory 91 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 92 may be a central processing unit (Central Processing Unit, CPU), a controller, a microcontroller, a microprocessor, or other data processing chips in some embodiments.
  • the processor 92 is generally used to control the overall operation of the computer device 90.
  • the processor 92 is configured to run the program code or process data stored in the memory 91, for example, run the program code of the OCT image-based retinal classification method.
  • the network interface 93 may include a wireless network interface or a wired network interface, and the network interface 93 is generally used to establish a communication connection between the computer device 90 and other electronic devices.
  • This application also provides another implementation, namely a non-volatile computer-readable storage medium that stores a data feature information entry process, where the data feature information entry process can be executed by at least one processor, so that the at least one processor executes the steps of any of the above-mentioned OCT image-based retinal classification methods.
  • the above data features can also be stored in a node of a blockchain
  • Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus the necessary general hardware platform; of course, they can also be implemented by hardware, but in many cases the former is the better implementation.
  • Based on this understanding, the technical solution of this application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes a number of instructions for enabling a computer device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) to execute the methods described in the various embodiments of the present application.
  • the blockchain referred to in this application is a new application mode of computer technology such as distributed data storage, point-to-point transmission, consensus mechanism, and encryption algorithm.
  • A blockchain is essentially a decentralized database: a chain of data blocks generated in association with one another using cryptographic methods, where each data block contains a batch of network transaction information used to verify the validity of the information (anti-counterfeiting) and to generate the next block.
  • the blockchain can include the underlying platform of the blockchain, the platform product service layer, and the application service layer.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Physics & Mathematics (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Multimedia (AREA)
  • Eye Examination Apparatus (AREA)

Abstract

An OCT-based retinal classification method and apparatus, a computer device, and a storage medium, relating to artificial intelligence. The OCT-based retinal classification method includes: obtaining a sample data set from a preset database, where the sample data set is composed of q training samples, the training samples are GCC parameters, and q is a positive integer greater than 1 (S1); constructing a decision tree, using a random forest algorithm, from the training samples in the sample data set to obtain a retinal classification model (S2); obtaining, from a preset user library, the to-be-identified GCC parameters obtained by a user through an OCT scan (S3); performing feature extraction on the to-be-identified GCC parameters to obtain y data features, where y is a positive integer greater than 1 (S4); and importing the y data features into the retinal classification model for classification, and outputting the classification result corresponding to the to-be-identified GCC parameters (S5). The method also relates to blockchain technology, and the data features can be stored in a blockchain. The method can improve the accuracy of GCC parameter classification and recognition.

Description

基于OCT的视网膜分类方法、装置、计算机设备及存储介质
本申请要求于2020年5月29日提交中国专利局、申请号为202010475698.8,发明名称为“基于OCT的视网膜分类方法、装置、计算机设备及存储介质”的中国专利申请的优先权,其全部内容通过引用结合在本申请中。
技术领域
本申请涉及人工智能,尤其涉及一种基于OCT的视网膜分类方法、装置、计算机设备及存储介质。
背景技术
目前,眼科疾病患者的常规检查主要是基于光学相干断层扫描仪(OCT),该设备可以安全的、非接触的获得检查者的眼底视网膜黄斑区GCC的参数值,
通过对GCC的参数值进行识别归类,有助于医生结合归类对视网膜进行诊断,提高诊断效率及准确性,但发明人意识到传统针对GCC的参数值进行识别归类的方式,主要是通过深度神经网络模型进行识别归类,而深度神经网络模型所提取的特征与医生诊断逻辑之间存在一定差异,导致深度神经网络模型识别归类的准确性不高,从而影响医生诊断的准确性,降低医生的工作效率。
发明内容
本申请实施例提供一种基于OCT的视网膜分类方法、装置、计算机设备及存储介质,以解决传统针对GCC参数进行识别归类的方法准确性不高,影响目标用户诊断的准确性以及降低工作效率的问题。
一种基于OCT的视网膜分类方法,包括:
从预设数据库中获取样本数据集,其中,所述样本数据集由q个训练样本构成,所述训练样本为GCC参数,q为大于1的正整数;
针对所述样本数据集中的训练样本,使用随机森林算法进行决策树构建,得到视网膜分类模型;
从预设用户库中获取用户经过OCT扫描得到的待识别的GCC参数;
对所述待识别的GCC参数进行特征提取,得到y个数据特征,其中,y为大于1的正整数;
将y个所述数据特征导入到所述视网膜分类模型中进行分类,输出所述待识别的GCC参数对应的分类结果。
一种基于OCT的视网膜分类装置,包括:
第一获取模块,用于从预设数据库中获取样本数据集,其中,所述样本数据集由q个训练样本构成,所述训练样本为GCC参数,q为大于1的正整数;
构建模块,用于针对所述样本数据集中的训练样本,使用随机森林算法进行决策树构建,得到视网膜分类模型;
第二获取模块,用于从预设用户库中获取用户经过OCT扫描得到的待识别的GCC参数;
特征提取模块,用于对所述待识别的GCC参数进行特征提取,得到y个数据特征,其中,y为大于1的正整数;
分类模块,用于将y个所述数据特征导入到所述视网膜分类模型中进行分类,输出所述待识别的GCC参数对应的分类结果。
一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机可读指令,所述处理器执行所述计算机可读指令时实现下述基于OCT的视网膜分类方法的步骤:
从预设数据库中获取样本数据集,其中,所述样本数据集由q个训练样本构成,所述训练样本为GCC参数,q为大于1的正整数;
针对所述样本数据集中的训练样本,使用随机森林算法进行决策树构建,得到视网膜分类模型;
从预设用户库中获取用户经过OCT扫描得到的待识别的GCC参数;
对所述待识别的GCC参数进行特征提取,得到y个数据特征,其中,y为大于1的正整数;
将y个所述数据特征导入到所述视网膜分类模型中进行分类,输出所述待识别的GCC参数对应的分类结果。
一种非易失性的计算机可读存储介质,所述非易失性的计算机可读存储介质存储有计算机可读指令,所述计算机可读指令被处理器执行时实现上述基于OCT的视网膜分类方法的步骤:
从预设数据库中获取样本数据集,其中,所述样本数据集由q个训练样本构成,所述训练样本为GCC参数,q为大于1的正整数;
针对所述样本数据集中的训练样本,使用随机森林算法进行决策树构建,得到视网膜分类模型;
从预设用户库中获取用户经过OCT扫描得到的待识别的GCC参数;
对所述待识别的GCC参数进行特征提取,得到y个数据特征,其中,y为大于1的正整数;
将y个所述数据特征导入到所述视网膜分类模型中进行分类,输出所述待识别的GCC参数对应的分类结果。
上述基于OCT的视网膜分类方法、装置、计算机设备及存储介质,利用获取到的样本数据集进行决策树构建,得到视网膜分类模型,再获取用户经过OCT扫描得到的待识别的GCC参数,对待识别的GCC参数进行特征提取得到数据特征,最后将数据特征导入到视网膜分类模型中进行分类,得到待识别的GCC参数对应的分类结果。通过利用样本数据集进行决策树构建以得到视网膜分类模型的方式,能够利用与目标用户诊断逻辑相似的数据特征对视网膜分类模型进行训练,提高视网膜分类模型识别分类的准确性,保证分类结果的有效性,从而有利于提高目标用户根据分类结果进行诊断的准确性,进一步提高目标用户的工作效率。
附图说明
为了更清楚地说明本申请实施例的技术方案,下面将对本申请实施例的描述中所需要使用的附图作简单地介绍,显而易见地,下面描述中的附图仅仅是本申请的一些实施例,对于本领域普通技术人员来讲,在不付出创造性劳动性的前提下,还可以根据这些附图获得其他的附图。
图1是本申请实施例提供的基于OCT影像的视网膜分类的流程图;
图2是本申请实施例提供的基于OCT影像的视网膜分类中步骤S2的流程图;
图3是本申请实施例提供的基于OCT影像的视网膜分类中步骤S25的流程图;
图4是本申请实施例提供的基于OCT影像的视网膜分类中步骤S253的流程图;
图5是本申请实施例提供的基于OCT影像的视网膜分类中计算目标基尼指数并进行分 裂的流程图;
图6是本申请实施例提供的基于OCT影像的视网膜分类装置的示意图;
图7是本申请实施例提供的计算机设备的基本机构框图。
具体实施方式
下面将结合本申请实施例中的附图,对本申请实施例中的技术方案进行清楚、完整地描述,显然,所描述的实施例是本申请一部分实施例,而不是全部的实施例。基于本申请中的实施例,本领域普通技术人员在没有作出创造性劳动前提下所获得的所有其他实施例,都属于本申请保护的范围。
本申请提供的基于OCT的视网膜分类方法应用于服务端,服务端具体可以用独立的服务器或者多个服务器组成的服务器集群实现。在一实施例中,如图1所示,提供一种基于OCT的视网膜分类方法,包括如下步骤:
S1:从预设数据库中获取样本数据集,其中,样本数据集由q个训练样本构成,训练样本为GCC参数,q为大于1的正整数。
在本申请实施例中,通过直接从预设数据库中获取样本数据集,其中,预设数据库是指专门用于存储样本数据集的数据库。
需要说明的是,样本数据集包含q个训练样本,训练样本为GCC参数,每个训练样本有其对应的分类特征,且分类特征主要为用户设定的病种类别。
进一步地,训练样本主要是由OCT设备扫描得到的GCC参数,GCC参数由5个GCC厚度对应的数据特征构成,分别为:All Avg、Sup Avg、Inf Avg、FLV、GLV。
S2:针对样本数据集中的训练样本,使用随机森林算法进行决策树构建,得到视网膜分类模型。
在本申请实施例中,从样本数据集中随机抽取多个训练样本,具体可以采取随机采样的方式,该随机抽样为有放回的随机抽样,重复在样本数据集中进行K轮抽取,每一轮抽取的结果作为一个子训练集,得到K个子训练集,其中,K个子训练集之间相互独立,子训练集中可以存在重复的训练样本。
需要说明的是,抽取训练样本的数量具体可以根据历史经验进行获取,或者根据具体的业务需要进行抽取合适的训练样本,作为子训练集进行机器模型训练,虽然训练的样本数据越多越准确,但是训练成本也越高而且实现方式越难,其具体数量可以根据实际应用的需要进行抽取,此处不作限制。
进一步地,使用随机森林算法进行决策树构建,针对每一个子训练集构建一棵决策树,得到K棵决策树,再根据生成的K棵决策树构造随机森林,得到视网膜分类模型。
S3:从预设用户库中获取用户经过OCT扫描得到的待识别的GCC参数。
具体地,通过直接从预设用户库中获取用户经过OCT设备扫描得到的待识别的GCC参数,且在获取到待识别的GCC参数后,将待识别的GCC参数从预设用户库中进行删除处理。其中,预设用户库是指专门用于存储待识别的GCC参数的数据库。
需要说明的是,待识别的GCC参数包含不同的参数及参数对应的标识信息,标识信息主要为GCC厚度和非GCC厚度。
S4:对待识别的GCC参数进行特征提取,得到y个数据特征,其中,y为大于1的正整数。
在本申请实施例中,通过对待识别的GCC参数中参数对应的标识信息进行识别,若识别到标识信息为GCC厚度,则对该标识信息对应的参数进行提取,并将提取到的每个参数作为数据特征,最终提取y个数据特征;若识别到标识信息为非GCC厚度,则不做处理。
需要说明的是,待识别的GCC参数中具体可以包含9个非GCC厚度和5个GCC厚度,其中,5个GCC厚度对应的参数分别为All Avg、Sup Avg、Inf Avg、FLV、GLV。
进一步地,通过5个GCC厚度判断待识别的GCC参数对应的视网膜属于何种类型,也可以结合9个非GCC厚度及5个GCC厚度的结果来判断待识别的GCC对应的视网膜属于何种类型。
需要强调的是,为进一步保证上述数据特征的私密和安全性,上述数据特征还可以存储于一区块链的节点中。
S5:将y个数据特征导入到视网膜分类模型中进行分类,输出待识别的GCC参数对应的分类结果。
具体地,将y个数据特征导入到视网膜分类模型中,视网膜分类模型在接收到数据特征后将对数据特征进行分类,并输出数据特征对应的分类特征作为待识别的GCC参数对应的分类结果。
本实施例中,利用获取到的样本数据集进行决策树构建,得到视网膜分类模型,再获取用户经过OCT扫描得到的待识别的GCC参数,对待识别的GCC参数进行特征提取得到数据特征,最后将数据特征导入到视网膜分类模型中进行分类,得到待识别的GCC参数对应的分类结果。通过利用样本数据集进行决策树构建以得到视网膜分类模型的方式,能够利用与目标用户诊断逻辑相似的数据特征对视网膜分类模型进行训练,提高视网膜分类模型识别分类的准确性,保证分类结果的有效性,从而有利于提高目标用户根据分类结果进行诊断的准确性,进一步提高目标用户的工作效率。
在一实施例中,训练样本包含分类特征,如图2所示,步骤S2中,即针对样本数据集中的训练样本,使用随机森林算法进行决策树构建,得到视网膜分类模型包括如下步骤:
S21:使用随机抽样的方式从样本数据集中抽取训练样本,构建K个子训练集,其中K为大于1的正整数。
在本申请实施例中,使用随机抽样的方式从样本数据集中抽取训练样本,随机采样的方式可以使用重采样技术从样本数据集中抽取训练样本,重采样技术是在样本数据集中进行有放回的抽样,样本数据集中每个训练样本每次被抽到的概率相等,重复在样本数据集中进行K轮抽取,每一轮抽取的结果作为一个子训练集,得到K个子训练集,其中,子训练集中的训练样本数量小于或等于样本数据集中的训练样本数量。
S22:针对每个子训练集,按照公式(1)计算每个分类特征的信息熵:
H(X)=-∑p(x i)log(2,p(x i))  公式(1)
其中,X为分类特征,H(X)为分类特征的信息熵,i=1,2,...,n,x i为第i个分类特征,p(x i)为第i个分类特征的特征值概率。
S23:根据信息熵,按照公式(2)计算每个分类特征的信息增益:
gain=H(c)-H(c|X)   公式(2)
其中,gain为分类特征的信息增益,H(c)为按照分类特征X进行分裂之前的信息熵,H(c|X)为按照分类特征X分裂之后的信息熵。
S24:根据信息增益,按照公式(3)和公式(4)计算每个分类特征的信息增益比:
gr=gain/IntI  公式(3)
公式(4)：IntI（分类特征的惩罚因子，由D与W X计算得到；公式图片见原文）
其中,IntI为分类特征的惩罚因子,D为样本数据集中训练样本的总量,W X为分类特征的训练样本数量,gr为分类特征的信息增益比。
具体地,通过先利用公式(4)计算出分类特征对应的惩罚因子,再采用公式(3)计 算分类特征的信息增益比,即分类特征的信息增益比=分类特征的信息增益/分类特征的惩罚因子。
S25:选取最大的信息增益比对应的分类特征作为分裂节点,将其他信息增益比对应的分类特征作为待分裂节点,采用分裂节点进行分裂。
在本申请实施例中,使用C4.5算法进行构建决策树,根据公式(4)计算得到分类特征的惩罚因子,使用公式(3)计算每个分类特征的信息增益比,并按照最大的信息增益比对应的分类特征作为分裂节点,将其他信息增益比对应的分类特征作为待分裂节点,采用分裂节点进行分裂。
需要说明的是,若按照信息增益作为分裂节点进行分裂,决策树的构建倾向于选择信息增益较大的分类特征作为分裂节点,分类特征的信息增益会比较大,但是对于训练集中存在多个分类特征并且有多种取值的情况下,训练得到的决策树的预测准确率较低,而根据分类特征的惩罚因子计算信息增益比,按照最大的信息增益比对应的分类特征作为分裂节点进行分裂,能够有效的规避分布均匀的属性对决策树分裂产生的不利影响,提高决策树构建的质量。
S26:针对待分裂节点对应的分类特征,返回步骤S22继续执行,直到所有分类特征均作为分裂节点完成分裂为止,得到K棵决策树。
在本申请实施例中,针对待分裂节点对应的分类特征,返回步骤S22提及的针对每个子训练集,计算分类特征的信息熵处继续执行,直到所有分类特征均作为分裂节点完成分裂为止,分裂成决策树的多个分支,以递归方式建立K棵决策树。
S27:根据K棵决策树构造随机森林,得到视网膜分类模型。
具体地,根据步骤S22至步骤S26生成的K棵决策树,将K棵决策树组合成为随机森林,得到视网膜分类模型,用于评估GCC参数对应的视网膜属于何种类型。
本实施例中,通过使用有放回的随机抽样的方式从样本数据集集中抽取训练样本,构建多个子训练集,用于进行机器模型训练,增强用于模型训练的数据的不确定性,提高数据特征分类质量;针对每个子训练集,计算分类特征的信息增益比,每次选取最大的信息增益比对应的分类特征作为分裂节点进行分裂,直到所有分类特征均作为分裂节点完成分裂为止,得到K棵决策树,根据生成的多棵决策树构造随机森林,得到视网膜分类模型,使用最大的信息增益比作为分裂节点,能够有效的规避分布均匀的分类特征对决策树分裂产生的不利影响,提高决策树构建的质量,并且由多棵决策树构造随机森林,使得机器模型的分类预测能力增强,提高视网膜分类模型的准确性,从而有利于提高目标用户根据视网膜分类模型获取分类结果进行诊断的准确性,进一步提高目标用户的工作效率。
在一实施例中,如图3所示,S25中,选取最大的信息增益比对应的分类特征作为分裂节点,将其他信息增益比对应的分类特征作为待分裂节点,采用分裂节点进行分裂包括如下步骤:
S251:选取最大的信息增益比对应的分类特征作为分裂节点,将其他信息增益比对应的分类特征作为待分裂节点。
具体地,选取最大的信息增益比对应的分类特征作为分裂节点,将其他信息增益比对应的分类特征作为待分裂节点。
S252:利用基尼指数公式计算分裂节点的基尼指数。
具体地,利用公式(5)计算分裂节点的基尼指数:
公式(5)：G(p)（分裂节点的基尼指数，由预设分类条件e与比例pk计算得到；公式图片见原文）
其中,G(p)为基尼指数,e为分裂节点对应的预设分类条件,pk为特定分组中相同输入类别所占的比例。
S253:将基尼指数与预设指数进行比较,并根据比较结果进行分裂。
具体地,将基尼指数与预设指数进行比较,并将比较结果与预设规则库中的描述信息进行比较,选取与描述信息相匹配的设定规则进行分裂。其中,预设规则库是指专门用于存储不同的描述信息及描述信息对应的设定规则的数据库。
例如,存在比较结果为基尼指数小于等于预设指数,预设规则库中存在描述信息为基尼指数小于等于预设指数,其对应的设定规则为A规则;存在描述信息为基尼指数大于预设指数,其对应的设定规则为B规则;通过将比较结果与描述信息进行比较,选取A规则进行分裂。
本实施例中,通过选取最大的信息增益比对应的分类特征作为分裂节点,将其他信息增益比对应的分类特征作为待分裂节点,再利用公式(5)计算分裂节点对应的基尼指数,最后将基尼指数与预设指数进行比较,并根据比较结果进行分裂。通过结合基尼指数计算的方式,能够在得到所有决策树的情况下,再一次利用基尼指数进一步对部分决策树进行分裂,提高决策树的精确度,进而提高保证后续视网膜分类模型训练的准确性。
在一实施例中,如图4所示,步骤S253中,即将基尼指数与预设指数进行比较,并根据比较结果确定决策树包括如下步骤:
S2531:将基尼指数与预设指数进行比较。
具体地,将基尼指数与预设指数进行比较。
S2532:若基尼指数小于等于预设指数,则不在进行分裂。
在本申请实施例中,根据步骤S2531的比较方式,若基尼指数小于等于预设指数,则表示该基尼指数对应的分裂节点的分类效果好,不在进行分裂。
S2533:若基尼指数大于预设指数,则利用预设分类条件对分裂节点进行分裂,直到达到预设截止条件为止,停止分裂。
在本申请实施例中,根据步骤S2531的比较方式,若基尼指数大于预设指数,表示该基尼指数对应的分裂节点分类效果差,则利用预设分类条件对该分裂节点进行分裂,直到每个分裂后每个节点对应的基尼指数小于等于预设指数或达到预设分裂次数时,分裂结束。
其中,预设分类条件是指根据用户实际需求设定对样本数据集进行分类的条件。
预设指数具体可以是0.2,也可以根据用户实际需求进行设置,此处不做限制。
预设分裂次数是指用户设定停止分裂节点进行分裂的次数。
例如,存在决策树的某个分裂节点为flv<5则分类为U,否则分类为R。如果有100个训练样本根据该分裂节点被分为U,这100个训练样本的标签都是U,那么pk=1,基尼指数就为0,说明该节点分类效果很好,那么就确定了flv<5为该决策树的分裂节点。如果有100个样本根据该分裂节点被分为U,但这100个训练样本中只有50个的标签是U,那么pk=0.5,基尼指数就比较大,说明该分裂节点分类效果很差,则利用预设分类条件对该分裂节点进行分裂。
本实施例中,通过将基尼指数与预设指数进行比较,在基尼指数小于等于预设指数的情况下,不在进行分裂;在基尼指数大于预设指数的情况下,利用预设分类条件对分裂节点进行分类,直到达到预设截止条件为止,停止分裂。在不同的比较结果下确定分裂节点是否需要进一步分裂,能够有效避免存在计算失误导致分裂不准确的情况,从而分裂过程的准确性,进一步保证后续视网膜分类模型训练的准确性。
在一实施例中,如图5所示,S26之后,该基于OCT的视网膜分类还包括如下步骤:
S6:将所有决策树对应的基尼指数按照从小到大的顺序进行排序,得到排序结果。
在本申请实施例中,将所有决策树对应的基尼指数按照从小到大的顺序进行排序,即将最小的基尼指数作为第一位,将最大的基尼指数作为最后一位,得到对应的排序结果。
S7:从排序结果中选取排序前a位和排序后b位的基尼指数分别进行权重计算,得到目标基尼指数,其中a和b均为大于1的正整数。
在本申请实施例中,排序前a位是指在步骤S6得到的排序结果中排序第一至第a,排 序后b位是指在步骤S6得到的排序结果中排序最后至倒数第b。
具体地,根据步骤S6得到的排序结果,选取排序前a位的基尼指数作为第一基尼指数,选取排序前a位对应的基尼指数作为第一基尼指数,选取排序后b位对应的基尼指数作为第二基尼指数,根据预设第一权重,对每个第一基尼指数进行加倍计算,并将加倍计算后的结果作为目标基尼指数;根据预设第二权重,对每个第二基尼指数进行减半计算,并将减半计算后的结果作为目标基尼指数。
需要说明的是,将排名前a的基尼指数加倍,排名后b的基尼指数减半,能够提高分类的精度,其中,a和b均为大于1的正整数,且存在a与b相同的情况,其具体取值可根据用户实际需求进行设置,此处不做限制。
当一个基尼指数为排名前a的特征时,计算基尼指数时将计算的基尼指数加倍,也就是基尼指数值乘以预设第一权重,例如,原基尼指数值为1,预设第一权重为2,加倍后基尼指数值为2。也就是说重要特征分类错误的代价更大,需要减少重要特征分类错误的情况。
当一个基尼指数为排名后b的特征时,计算基尼指数时将计算的基尼指数减半,也就是基尼指数值乘以预设第二权重,例如,原基尼指数值为1,预设第二权重为0.5,减半后基尼指数值为0.5。也就是说不重要特征分类错误的代价较小,不需要关注不重要的特征分类错误的情况。
S8:根据目标基尼指数对排序前a位和排序后b位的基尼指数对应的决策树进行分裂,得到分裂后的决策树。
具体地,根据步骤S7得到的目标基尼指数,将目标基尼指数与预设指数进行比较,若目标基尼指数小于等于预设指数,则获取该目标基尼指数对应的决策树;若目标基尼指数大于预设指数,则利用预设分类条件对该目标基尼指数对应的决策树进行分裂,直到每个决策树对应的目标基尼指数小于等于预设指数或达到预设目标分裂次数时,分裂结束,并获取分裂结束后的决策树。
本申请实施例中,将所有决策树对应的基尼指数按照从小到大的顺序进行排序以得到排序结果,并从中选取排序前a位和排序后b位的基尼指数分别进行权重计算,得到目标基尼指数,最后根据目标基尼指数对排序前a位和排序后b位的基尼指数对应的决策树进行分裂,得到分裂后的决策树。通过计算目标基尼指数并利用目标基尼指数对决策树进行分裂的方式,分类特征做进一步优化,增加重要分类特征的分析计算,减少不重要分类特征的分析计算,从而提高决策树的准确率,进一步保证后续视网膜分类模型训练的准确性。
应理解,上述实施例中各步骤的序号的大小并不意味着执行顺序的先后,各过程的执行顺序应以其功能和内在逻辑确定,而不应对本申请实施例的实施过程构成任何限定。
在一实施例中,提供一种基于OCT影像的视网膜分类装置,该基于OCT影像的视网膜分类装置与上述实施例中基于OCT影像的视网膜分类方法一一对应。如图6所示,该基于OCT影像的视网膜分类装置包括第一获取模块61,构建模块62,第二获取模块63,特征提取模块64和分类模块65。各功能模块详细说明如下:
第一获取模块61,用于从预设数据库中获取样本数据集,其中,样本数据集由q个训练样本构成,训练样本为GCC参数,q为大于1的正整数;
构建模块62,用于针对样本数据集中的训练样本,使用随机森林算法进行决策树构建,得到视网膜分类模型;
第二获取模块63,用于从预设用户库中获取用户经过OCT扫描得到的待识别的GCC参数;
特征提取模块64,用于对待识别的GCC参数进行特征提取,得到y个数据特征,其中,y为大于1的正整数,需要强调的是,为进一步保证上述数据特征的私密和安全性,上述数据特征还可以存储于一区块链的节点中;
分类模块65,用于将y个数据特征导入到视网膜分类模型中进行分类,输出待识别的GCC参数对应的分类结果。
进一步地,构建模块62包括:
子训练集构建子模块,用于使用随机抽样的方式从样本数据集中抽取训练样本,构建K个子训练集,其中K为大于1的正整数;
信息熵计算子模块,用于针对每个子训练集,按照公式(1)计算每个分类特征的信息熵:
H(X)=-∑p(x i)log(2,p(x i))   公式(1)
其中,X为分类特征,H(X)为分类特征的信息熵,i=1,2,...,n,x i为第i个分类特征,p(x i)为第i个分类特征的特征值概率;
信息增益计算子模块,用于根据信息熵,按照公式(2)计算每个分类特征的信息增益:
gain=H(c)-H(c|X)   公式(2)
其中,gain为分类特征的信息增益,H(c)为按照分类特征X进行分裂之前的信息熵,H(c|X)为按照分类特征X分裂之后的信息熵;
信息增益比计算子模块,用于根据信息增益,按照公式(3)和公式(4)计算每个分类特征的信息增益比:
gr=gain/IntI  公式(3)
公式(4)：IntI（分类特征的惩罚因子，由D与W X计算得到；公式图片见原文）
其中,IntI为分类特征的惩罚因子,D为样本数据集中训练样本的总量,W X为分类特征的训练样本数量,gr为分类特征的信息增益比;
分裂节点选取子模块,用于选取最大的信息增益比对应的分类特征作为分裂节点,将其他信息增益比对应的分类特征作为待分裂节点,采用分裂节点进行分裂;
决策树生成子模块,用于针对待分裂节点对应的分类特征,返回步骤S22继续执行,直到所有分类特征均作为分裂节点完成分裂为止,得到K棵决策树;
视网膜分类模型构建子模块,用于根据K棵决策树构造随机森林,得到视网膜分类模型。
进一步地,分裂节点选取子模块包括:
分裂节点确定单元,用于选取最大的信息增益比对应的分类特征作为分裂节点,将其他信息增益比对应的分类特征作为待分裂节点;
基尼指数计算单元,用于利用基尼指数公式计算分裂节点的基尼指数;
分裂单元,用于将基尼指数与预设指数进行比较,并根据比较结果进行分裂。
进一步地,分裂单元包括:
比较子单元,用于将基尼指数与预设指数进行比较;
第一比较子单元,用于若基尼指数小于等于预设指数,则不在进行分裂;
第二比较子单元,用于若基尼指数大于预设指数,则利用预设分类条件对分裂节点进行分裂,直到达到预设截止条件为止,停止分裂。
进一步地,该基于OCT影像的视网膜分类装置还包括:
排序模块,用于将所有决策树对应的基尼指数按照从小到大的顺序进行排序,得到排序结果;
权重计算模块,用于从排序结果中选取排序前a位和排序后b位的基尼指数分别进行 权重计算,得到目标基尼指数,其中a和b均为大于1的正整数;
二次分裂模块,用于根据目标基尼指数对排序前a位和排序后b位的基尼指数对应的决策树进行分裂,得到分裂后的决策树。
本申请的一些实施例公开了计算机设备。具体请参阅图7,为本申请的一实施例中计算机设备90基本结构框图。
如图7中所示意的,所述计算机设备90包括通过系统总线相互通信连接存储器91、处理器92、网络接口93。需要指出的是,图7中仅示出了具有组件91-93的计算机设备90,但是应理解的是,并不要求实施所有示出的组件,可以替代的实施更多或者更少的组件。其中,本技术领域技术人员可以理解,这里的计算机设备是一种能够按照事先设定或存储的指令,自动进行数值计算和/或信息处理的设备,其硬件包括但不限于微处理器、专用集成电路(Application Specific Integrated Circuit,ASIC)、可编程门阵列(Field-Programmable GateArray,FPGA)、数字处理器(Digital Signal Processor,DSP)、嵌入式设备等。
所述计算机设备可以是桌上型计算机、笔记本、掌上电脑及云端服务器等计算设备。所述计算机设备可以与用户通过键盘、鼠标、遥控器、触摸板或声控设备等方式进行人机交互。
所述存储器91至少包括一种类型的可读存储介质,所述可读存储介质包括闪存、硬盘、多媒体卡、卡型存储器(例如,SD或DX存储器等)、随机访问存储器(RAM)、静态随机访问存储器(SRAM)、只读存储器(ROM)、电可擦除可编程只读存储器(EEPROM)、可编程只读存储器(PROM)、磁性存储器、磁盘、光盘等。在一些实施例中,所述存储器91可以是所述计算机设备90的内部存储单元,例如该计算机设备90的硬盘或内存。在另一些实施例中,所述存储器91也可以是所述计算机设备90的外部存储设备,例如该计算机设备90上配备的插接式硬盘,智能存储卡(Smart Media Card,SMC),安全数字(Secure Digital,SD)卡,闪存卡(Flash Card)等。当然,所述存储器91还可以既包括所述计算机设备90的内部存储单元也包括其外部存储设备。本实施例中,所述存储器91通常用于存储安装于所述计算机设备90的操作系统和各类应用软件,例如所述基于OCT影像的视网膜分类方法的程序代码等。此外,所述存储器91还可以用于暂时地存储已经输出或者将要输出的各类数据。
所述处理器92在一些实施例中可以是中央处理器(Central Processing Unit,CPU)、控制器、微控制器、微处理器、或其他数据处理芯片。该处理器92通常用于控制所述计算机设备90的总体操作。本实施例中,所述处理器92用于运行所述存储器91中存储的程序代码或者处理数据,例如运行所述基于OCT影像的视网膜分类方法的程序代码。
所述网络接口93可包括无线网络接口或有线网络接口,该网络接口93通常用于在所述计算机设备90与其他电子设备之间建立通信连接。
本申请还提供了另一种实施方式,即提供一种非易失性的计算机可读存储介质,所述非易失性的计算机可读存储介质存储有数据特征信息录入流程,所述数据特征信息录入流程可被至少一个处理器执行,以使所述至少一个处理器执行上述任意一种基于OCT影像的视网膜分类方法的步骤。
需要强调的是,为进一步保证上述数据特征的私密和安全性,上述数据特征还可以存储于一区块链的节点中
通过以上的实施方式的描述,本领域的技术人员可以清楚地了解到上述实施例方法可借助软件加必需的通用硬件平台的方式来实现,当然也可以通过硬件,但很多情况下前者是更佳的实施方式。基于这样的理解,本申请的技术方案本质上或者说对现有技术做出贡献的部分可以以软件产品的形式体现出来,该计算机软件产品存储在一个存储介质(如ROM/RAM、磁碟、光盘)中,包括若干指令用以使得一台计算机设备(可以是手机,计算机,服务器,空调器,或者网络设备等)执行本申请各个实施例所述的方法。
本申请所指区块链是分布式数据存储、点对点传输、共识机制、加密算法等计算机技术的新型应用模式。区块链(Blockchain),本质上是一个去中心化的数据库,是一串使用密码学方法相关联产生的数据块,每一个数据块中包含了一批次网络交易的信息,用于验证其信息的有效性(防伪)和生成下一个区块。区块链可以包括区块链底层平台、平台产品服务层以及应用服务层等。
最后应说明的是,显然以上所描述的实施例仅仅是本申请一部分实施例,而不是全部的实施例,附图中给出了本申请的较佳实施例,但并不限制本申请的专利范围。本申请可以以许多不同的形式来实现,相反地,提供这些实施例的目的是使对本申请的公开内容的理解更加透彻全面。尽管参照前述实施例对本申请进行了详细的说明,对于本领域的技术人员来而言,其依然可以对前述各具体实施方式所记载的技术方案进行修改,或者对其中部分技术特征进行等效替换。凡是利用本申请说明书及附图内容所做的等效结构,直接或间接运用在其他相关的技术领域,均同理在本申请专利保护范围之内。

Claims (20)

  1. 一种基于OCT的视网膜分类方法,其中,所述基于OCT的视网膜分类方法包括:
    从预设数据库中获取样本数据集,其中,所述样本数据集由q个训练样本构成,所述训练样本为GCC参数,q为大于1的正整数;
    针对所述样本数据集中的训练样本,使用随机森林算法进行决策树构建,得到视网膜分类模型;
    从预设用户库中获取用户经过OCT扫描得到的待识别的GCC参数;
    对所述待识别的GCC参数进行特征提取,得到y个数据特征,其中,y为大于1的正整数;
    将y个所述数据特征导入到所述视网膜分类模型中进行分类,输出所述待识别的GCC参数对应的分类结果。
  2. 如权利要求1所述的基于OCT的视网膜分类方法,其中,所述训练样本包含所述分类特征,所述针对所述样本数据集中的训练样本,使用随机森林算法进行决策树构建,得到视网膜分类模型的步骤包括:
    使用随机抽样的方式从所述样本数据集中抽取所述训练样本,构建K个子训练集,其中K为大于1的正整数;
    针对每个所述子训练集,按照如下公式计算每个所述分类特征的信息熵:
    H(X)=-Σp(x i)log(2,p(x i))
    其中,X为所述分类特征,H(X)为所述分类特征的信息熵,i=1,2,...,n,xi为第i个所述分类特征,p(x i)为第i个所述分类特征的特征值概率;
    根据所述信息熵,按照如下公式计算每个所述分类特征的信息增益:
    gain=H(c)-H(c|X)
    其中,gain为所述分类特征的信息增益,H(c)为按照分类特征X进行分裂之前的信息熵,H(c|X)为按照所述分类特征X分裂之后的信息熵;
    根据所述信息增益,按照如下公式计算每个所述分类特征的信息增益比:
    Figure PCTCN2020099518-appb-100001
    Figure PCTCN2020099518-appb-100002
    其中,IntI为分类特征的惩罚因子,D为所述样本数据集中训练样本的总量,W X为分类特征的训练样本数量,gr为所述分类特征的信息增益比;
    选取最大的信息增益比对应的分类特征作为分裂节点,将其他信息增益比对应的分类特征作为待分裂节点,采用所述分裂节点进行分裂;
    针对所述待分裂节点对应的分类特征,返回所述针对每个所述子训练集,按照如下公式计算每个所述分类特征的信息熵的步骤继续执行,直到所有所述分类特征均作为所述分裂节点完成分裂为止,得到K棵决策树;
    根据所述K棵决策树构造随机森林,得到视网膜分类模型。
  3. 如权利要求2所述的基于OCT的视网膜分类方法,其中,所述选取最大的信息增益比对应的分类特征作为分裂节点,将其他信息增益比对应的分类特征作为待分裂节点,采用所述分裂节点进行分裂进行分裂的步骤包括:
    选取最大的信息增益比对应的分类特征作为分裂节点,将其他信息增益比对应的分类特征作为待分裂节点;
    利用基尼指数公式计算所述分裂节点的基尼指数;
    将所述基尼指数与预设指数进行比较,并根据比较结果进行分裂。
  4. 如权利要求3所述的基于OCT的视网膜分类方法,其中,所述将所述基尼指数与预设指数进行比较,并根据比较结果确定所述决策树的步骤包括:
    将所述基尼指数与预设指数进行比较;
    若所述基尼指数小于等于预设指数,则不在进行分裂;
    若所述基尼指数大于预设指数,则利用预设分类条件对所述分裂节点进行分裂,直到达到预设截止条件为止,停止分裂。
  5. 如权利要求2所述的基于OCT影像的视网膜分类方法,其中,所述针对所述待分裂节点对应的分类特征,返回所述针对每个所述子训练集,按照如下公式计算每个所述分类特征的信息熵的步骤继续执行,直到所有所述分类特征均作为所述分裂节点完成分裂为止,得到K棵决策树的步骤之后,所述基于OCT的视网膜分类方法还包括:
    将所有所述决策树对应的基尼指数按照从小到大的顺序进行排序,得到排序结果;
    从所述排序结果中选取排序前a位和排序后b位的所述基尼指数分别进行权重计算,得到目标基尼指数,其中a和b均为大于1的正整数;
    根据所述目标基尼指数对排序前a位和排序后b位的所述基尼指数对应的决策树进行分裂,得到分裂后的决策树。
  6. 一种基于OCT影像的视网膜分类装置,其中,所述基于OCT影像的视网膜分类装置包括:
    第一获取模块,用于从预设数据库中获取样本数据集,其中,所述样本数据集由q个训练样本构成,所述训练样本为GCC参数,q为大于1的正整数;
    构建模块,用于针对所述样本数据集中的训练样本,使用随机森林算法进行决策树构建,得到视网膜分类模型;
    第二获取模块,用于从预设用户库中获取用户经过OCT扫描得到的待识别的GCC参数;
    特征提取模块,用于对所述待识别的GCC参数进行特征提取,得到y个数据特征,其中,y为大于1的正整数;
    分类模块,用于将y个所述数据特征导入到所述视网膜分类模型中进行分类,输出所述待识别的GCC参数对应的分类结果。
  7. 如权利要求6所述的基于OCT影像的视网膜分类装置,其中,所述构建模块包括:
    子训练集构建子模块,用于使用随机抽样的方式从所述样本数据集中抽取所述训练样本,构建K个子训练集,其中K为大于1的正整数;
    信息熵计算子模块,用于针对每个所述子训练集,按照如下公式计算每个所述分类特征的信息熵:
    H(X)=-Σp(x i)log(2,p(x i))
    其中,X为所述分类特征,H(X)为所述分类特征的信息熵,i=1,2,...,n,x i为第i个所述分类特征,p(x i)为第i个所述分类特征的特征值概率;
    信息增益计算子模块,用于根据所述信息熵,按照如下公式计算每个所述分类特征的信息增益:
    gain=H(c)-H(c|X)
    其中,gain为所述分类特征的信息增益,H(c)为按照分类特征X进行分裂之前的信息熵,H(c|X)为按照所述分类特征X分裂之后的信息熵;
    信息增益比计算子模块,用于根据所述信息增益,按照如下公式计算每个所述分类特征的信息增益比:
    Figure PCTCN2020099518-appb-100003
    Figure PCTCN2020099518-appb-100004
    其中,IntI为分类特征的惩罚因子,D为所述样本数据集中训练样本的总量,W X为分类特征的训练样本数量,gr为所述分类特征的信息增益比;
    分裂节点选取子模块,用于选取最大的信息增益比对应的分类特征作为分裂节点,将其他信息增益比对应的分类特征作为待分裂节点,采用所述分裂节点进行分裂;
    决策树生成子模块,用于针对所述待分裂节点对应的分类特征,返回所述针对每个所述子训练集,按照如下公式计算每个所述分类特征的信息熵的步骤继续执行,直到所有所述分类特征均作为所述分裂节点完成分裂为止,得到K棵决策树;
    视网膜分类模型构建子模块,用于根据所述K棵决策树构造随机森林,得到视网膜分类模型。
  8. 如权利要求7所述的基于OCT影像的视网膜分类装置,其中,所述分裂节点选取子模块包括:
    分裂节点确定单元,用于选取最大的信息增益比对应的分类特征作为分裂节点,将其他信息增益比对应的分类特征作为待分裂节点;
    基尼指数计算单元,用于利用基尼指数公式计算所述分裂节点的基尼指数;
    分裂单元,用于将所述基尼指数与预设指数进行比较,并根据比较结果进行分裂。
  9. 如权利要求8所述的基于OCT影像的视网膜分类装置,其中,所述分裂单元包括:
    比较子单元,用于将所述基尼指数与预设指数进行比较;
    第一比较子单元,用于若所述基尼指数小于等于预设指数,则不在进行分裂;
    第二比较子单元,用于若所述基尼指数大于预设指数,则利用预设分类条件对所述分裂节点进行分裂,直到达到预设截止条件为止,停止分裂。
  10. 如权利要求7所述的基于OCT影像的视网膜分类装置,其中,还包括:
    排序模块,用于将所有所述决策树对应的基尼指数按照从小到大的顺序进行排序,得到排序结果;
    权重计算模块,用于从所述排序结果中选取排序前a位和排序后b位的所述基尼指数分别进行权重计算,得到目标基尼指数,其中a和b均为大于1的正整数;
    二次分裂模块,用于根据所述目标基尼指数对排序前a位和排序后b位的所述基尼指数对应的决策树进行分裂,得到分裂后的决策树。
  11. 一种计算机设备,包括存储器、处理器以及存储在所述存储器中并可在所述处理器上运行的计算机可读指令,其中,所述处理器执行所述计算机可读指令时实现如下步骤:
    从预设数据库中获取样本数据集,其中,所述样本数据集由q个训练样本构成,所述训练样本为GCC参数,q为大于1的正整数;
    针对所述样本数据集中的训练样本,使用随机森林算法进行决策树构建,得到视网膜分类模型;
    从预设用户库中获取用户经过OCT扫描得到的待识别的GCC参数;
    对所述待识别的GCC参数进行特征提取,得到y个数据特征,其中,y为大于1的正整数;
    将y个所述数据特征导入到所述视网膜分类模型中进行分类,输出所述待识别的GCC参数对应的分类结果。
  12. 如权利要求11所述的计算机设备,其中,所述训练样本包含所述分类特征,所述针对所述样本数据集中的训练样本,使用随机森林算法进行决策树构建,得到视网膜分类模型的步骤包括:
    使用随机抽样的方式从所述样本数据集中抽取所述训练样本,构建K个子训练集,其中K为大于1的正整数;
    针对每个所述子训练集,按照如下公式计算每个所述分类特征的信息熵:
    H(X)=-Σp(x i)log(2,p(x i))
    其中,X为所述分类特征,H(X)为所述分类特征的信息熵,i=1,2,...,n,x i为第i个所述分类特征,p(x i)为第i个所述分类特征的特征值概率;
    根据所述信息熵,按照如下公式计算每个所述分类特征的信息增益:
    gain=H(c)-H(c|X)
    其中,gain为所述分类特征的信息增益,H(c)为按照分类特征X进行分裂之前的信息熵,H(c|X)为按照所述分类特征X分裂之后的信息熵;
    根据所述信息增益,按照如下公式计算每个所述分类特征的信息增益比:
    Figure PCTCN2020099518-appb-100005
    Figure PCTCN2020099518-appb-100006
    其中,IntI为分类特征的惩罚因子,D为所述样本数据集中训练样本的总量,W X为分类特征的训练样本数量,gr为所述分类特征的信息增益比;
    选取最大的信息增益比对应的分类特征作为分裂节点,将其他信息增益比对应的分类特征作为待分裂节点,采用所述分裂节点进行分裂;
    针对所述待分裂节点对应的分类特征,返回所述针对每个所述子训练集,按照如下公式计算每个所述分类特征的信息熵的步骤继续执行,直到所有所述分类特征均作为所述分裂节点完成分裂为止,得到K棵决策树;
    根据所述K棵决策树构造随机森林,得到视网膜分类模型。
  13. 如权利要求12所述的计算机设备,其中,所述选取最大的信息增益比对应的分类特征作为分裂节点,将其他信息增益比对应的分类特征作为待分裂节点,采用所述分裂节点进行分裂进行分裂的步骤包括:
    选取最大的信息增益比对应的分类特征作为分裂节点,将其他信息增益比对应的分类特征作为待分裂节点;
    利用基尼指数公式计算所述分裂节点的基尼指数;
    将所述基尼指数与预设指数进行比较,并根据比较结果进行分裂。
  14. The computer device according to claim 13, wherein the step of comparing the Gini index with a preset index and performing splitting according to the comparison result comprises:
    comparing the Gini index with the preset index;
    if the Gini index is less than or equal to the preset index, performing no further splitting;
    if the Gini index is greater than the preset index, splitting the split node by using a preset classification condition until a preset cut-off condition is reached, and then stopping splitting.
  15. The computer device according to claim 12, wherein after the step of returning, for the classification features corresponding to the nodes to be split, to the step of calculating, for each of the sub-training sets, the information entropy of each classification feature according to the foregoing formula and continuing execution until all the classification features have completed splitting as split nodes, so as to obtain K decision trees, the processor further implements the following steps when executing the computer-readable instructions:
    sorting the Gini indices corresponding to all the decision trees in ascending order to obtain a sorting result;
    selecting, from the sorting result, the Gini indices ranked in the top a positions and in the bottom b positions and performing weight calculation on them respectively to obtain a target Gini index, wherein a and b are both positive integers greater than 1;
    splitting, according to the target Gini index, the decision trees corresponding to the Gini indices ranked in the top a positions and in the bottom b positions, to obtain split decision trees.
  16. A non-volatile computer-readable storage medium storing computer-readable instructions, wherein the computer-readable instructions, when executed by a processor, cause the processor to execute the following steps:
    acquiring a sample data set from a preset database, wherein the sample data set consists of q training samples, the training samples are GCC parameters, and q is a positive integer greater than 1;
    constructing decision trees for the training samples in the sample data set by using a random forest algorithm, to obtain a retinal classification model;
    acquiring, from a preset user library, a to-be-identified GCC parameter obtained by OCT scanning of a user;
    performing feature extraction on the to-be-identified GCC parameter to obtain y data features, wherein y is a positive integer greater than 1;
    importing the y data features into the retinal classification model for classification, and outputting a classification result corresponding to the to-be-identified GCC parameter.
  17. The non-volatile computer-readable storage medium according to claim 16, wherein the training samples contain the classification features, and the step of constructing decision trees for the training samples in the sample data set by using a random forest algorithm to obtain a retinal classification model comprises:
    extracting the training samples from the sample data set by random sampling to construct K sub-training sets, wherein K is a positive integer greater than 1;
    for each of the sub-training sets, calculating the information entropy of each classification feature according to the following formula:
    H(X) = -Σ p(x_i)·log₂(p(x_i))
    wherein X is the classification feature, H(X) is the information entropy of the classification feature, i = 1, 2, ..., n, x_i is the i-th classification feature, and p(x_i) is the feature value probability of the i-th classification feature;
    calculating, according to the information entropy, the information gain of each classification feature according to the following formula:
    gain = H(c) - H(c|X)
    wherein gain is the information gain of the classification feature, H(c) is the information entropy before splitting by the classification feature X, and H(c|X) is the information entropy after splitting by the classification feature X;
    calculating, according to the information gain, the information gain ratio of each classification feature according to the following formulas:
    (formulas rendered as images in the original: Figure PCTCN2020099518-appb-100007 and Figure PCTCN2020099518-appb-100008)
    wherein IntI is the penalty factor of the classification feature, D is the total number of training samples in the sample data set, W_X is the number of training samples of the classification feature, and gr is the information gain ratio of the classification feature;
    selecting the classification feature corresponding to the largest information gain ratio as a split node, taking the classification features corresponding to the other information gain ratios as nodes to be split, and performing splitting by using the split node;
    returning, for the classification features corresponding to the nodes to be split, to the step of calculating, for each of the sub-training sets, the information entropy of each classification feature according to the foregoing formula, and continuing execution until all the classification features have completed splitting as split nodes, to obtain K decision trees;
    constructing a random forest from the K decision trees to obtain the retinal classification model.
  18. The non-volatile computer-readable storage medium according to claim 17, wherein the step of selecting the classification feature corresponding to the largest information gain ratio as a split node, taking the classification features corresponding to the other information gain ratios as nodes to be split, and performing splitting by using the split node comprises:
    selecting the classification feature corresponding to the largest information gain ratio as the split node, and taking the classification features corresponding to the other information gain ratios as the nodes to be split;
    calculating the Gini index of the split node by using a Gini index formula;
    comparing the Gini index with a preset index, and performing splitting according to the comparison result.
  19. The non-volatile computer-readable storage medium according to claim 18, wherein the step of comparing the Gini index with a preset index and performing splitting according to the comparison result comprises:
    comparing the Gini index with the preset index;
    if the Gini index is less than or equal to the preset index, performing no further splitting;
    if the Gini index is greater than the preset index, splitting the split node by using a preset classification condition until a preset cut-off condition is reached, and then stopping splitting.
  20. The non-volatile computer-readable storage medium according to claim 17, wherein after the step of returning, for the classification features corresponding to the nodes to be split, to the step of calculating, for each of the sub-training sets, the information entropy of each classification feature according to the foregoing formula and continuing execution until all the classification features have completed splitting as split nodes, so as to obtain K decision trees, the computer-readable instructions, when executed by the processor, further cause the processor to execute the following steps:
    sorting the Gini indices corresponding to all the decision trees in ascending order to obtain a sorting result;
    selecting, from the sorting result, the Gini indices ranked in the top a positions and in the bottom b positions and performing weight calculation on them respectively to obtain a target Gini index, wherein a and b are both positive integers greater than 1;
    splitting, according to the target Gini index, the decision trees corresponding to the Gini indices ranked in the top a positions and in the bottom b positions, to obtain split decision trees.
PCT/CN2020/099518 2020-05-29 2020-06-30 OCT-based retinal classification method and apparatus, computer device, and storage medium WO2021120587A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010475698.8A CN111783830A (zh) 2020-05-29 2020-05-29 OCT-based retinal classification method and apparatus, computer device, and storage medium
CN202010475698.8 2020-05-29

Publications (1)

Publication Number Publication Date
WO2021120587A1 true WO2021120587A1 (zh) 2021-06-24

Family

ID=72754073

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/099518 WO2021120587A1 (zh) 2020-05-29 2020-06-30 OCT-based retinal classification method and apparatus, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN111783830A (zh)
WO (1) WO2021120587A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113642599A (zh) * 2021-06-28 2021-11-12 中国铁道科学研究院集团有限公司 Revenue prediction method, transportation system, and electronic device

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114677751B (zh) * 2022-05-26 2022-09-09 深圳市中文路教育科技有限公司 Learning state monitoring method, monitoring apparatus, and storage medium
CN116910669A (zh) * 2023-09-13 2023-10-20 深圳市智慧城市科技发展集团有限公司 Data classification method and apparatus, electronic device, and readable storage medium

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110363226A (zh) * 2019-06-21 2019-10-22 平安科技(深圳)有限公司 Random forest-based ophthalmic disease classification and identification method, apparatus, and medium

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108665159A (zh) * 2018-05-09 2018-10-16 深圳壹账通智能科技有限公司 Risk assessment method, apparatus, terminal device, and storage medium
CN110717524B (zh) * 2019-09-20 2021-04-06 浙江工业大学 Thermal comfort prediction method for the elderly

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110363226A (zh) * 2019-06-21 2019-10-22 平安科技(深圳)有限公司 Random forest-based ophthalmic disease classification and identification method, apparatus, and medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
ASAOKA RYO; HIRASAWA KAZUNORI; IWASE AIKO; FUJINO YURI; MURATA HIROSHI; SHOJI NOBUYUKI; ARAIE MAKOTO: "Validating the Usefulness of the "Random Forests" Classifier to Diagnose Early Glaucoma With Optical Coherence Tomography", AMERICAN JOURNAL OF OPHTHALMOLOGY, vol. 174, 9 November 2016 (2016-11-09), AMSTERDAM, NL, pages 95 - 103, XP029890576, ISSN: 0002-9394, DOI: 10.1016/j.ajo.2016.11.001 *
SINGH SONIA, GUPTA PRIYANKA: "COMPARATIVE STUDY ID3, CART AND C4.5 DECISION TREE ALGORITHM: A SURVEY", INTERNATIONAL JOURNAL OF ADVANCED INFORMATION SCIENCE AND TECHNOLOGY, vol. 27, no. 27, 31 July 2014 (2014-07-31), pages 97 - 103, XP055822892 *
SUGIMOTO KOICHIRO, MURATA HIROSHI, HIRASAWA HIROYO, AIHARA MAKOTO, MAYAMA CHIHIRO, ASAOKA RYO: "Cross-sectional study: Does combining optical coherence tomography measurements using the ‘Random Forest’ decision tree classifier improve the prediction of the presence of perimetric deterioration in glaucoma suspects?", BMJ OPEN, vol. 3, no. 10, 1 October 2013 (2013-10-01), pages e003114, XP055822890, ISSN: 2044-6055, DOI: 10.1136/bmjopen-2013-003114 *

Also Published As

Publication number Publication date
CN111783830A (zh) 2020-10-16

Similar Documents

Publication Publication Date Title
US11816078B2 (en) Automatic entity resolution with rules detection and generation system
WO2021120587A1 (zh) OCT-based retinal classification method and apparatus, computer device, and storage medium
US11327975B2 (en) Methods and systems for improved entity recognition and insights
WO2020181805A1 (zh) Diabetes prediction method and apparatus, storage medium, and computer device
CN111612041A (zh) Abnormal user identification method and apparatus, storage medium, and electronic device
US20090287503A1 (en) Analysis of individual and group healthcare data in order to provide real time healthcare recommendations
WO2021217867A1 (zh) XGBoost-based data classification method and apparatus, computer device, and storage medium
WO2021135449A1 (zh) Data classification method, apparatus, device, and medium based on deep reinforcement learning
WO2021151327A1 (zh) Triage data processing method, apparatus, device, and medium
CN105740808B (zh) Face recognition method and apparatus
CN111785384A (zh) Artificial intelligence-based abnormal data identification method and related device
CN109886334A (zh) Privacy-preserving shared-nearest-neighbor density peak clustering method
CN111784040A (zh) Optimization method and apparatus for policy simulation analysis, and computer device
CN111883251A (zh) Medical misdiagnosis detection method and apparatus, electronic device, and storage medium
WO2021114818A1 (zh) Fourier transform-based OCT image quality assessment method, system, and apparatus
CN108268886A (zh) Method and system for identifying cheat plug-in operations
CN116259415A (zh) Machine learning-based method for predicting patient medication adherence
CN111639077A (zh) Data governance method and apparatus, electronic device, and storage medium
US20210357808A1 (en) Machine learning model generation system and machine learning model generation method
WO2021174923A1 (zh) Concept word sequence generation method and apparatus, computer device, and storage medium
CN113707304A (zh) Triage data processing method, apparatus, device, and storage medium
WO2023083051A1 (zh) Biometric recognition method, apparatus, device, and storage medium
CN110414562B (zh) X-ray image classification method, apparatus, terminal, and storage medium
WO2021114626A1 (zh) Quality detection method for medical record data and related apparatus
CN116150690A (zh) DRGs decision tree construction method and apparatus, electronic device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20902195

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20902195

Country of ref document: EP

Kind code of ref document: A1