CN109145965A - Cell recognition method and device based on a random forest classification model - Google Patents

Cell recognition method and device based on a random forest classification model

Info

Publication number
CN109145965A
Authority
CN
China
Prior art keywords
random forest
sample
artificial fish
classification model
characteristic value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810872456.5A
Other languages
Chinese (zh)
Inventor
郏东耀
李玉娟
曾强
庄重
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Brilliant Yaoqiang Technology Co Ltd
Original Assignee
Shenzhen Brilliant Yaoqiang Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Brilliant Yaoqiang Technology Co Ltd
Priority to CN201810872456.5A
Publication of CN109145965A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F 18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 18/00 Pattern recognition
    • G06F 18/20 Analysing
    • G06F 18/24 Classification techniques
    • G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/004 Artificial life, i.e. computing arrangements simulating life
    • G06N 3/006 Artificial life, i.e. computing arrangements simulating life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Image Analysis (AREA)

Abstract

This application discloses a cell recognition method and device based on a random forest classification model. The method comprises: training a random forest classification model with an initial cell image sample set and testing it to obtain an optimal sample accuracy rate; using the random forest classification model corresponding to the optimal sample accuracy rate as the fitness function, obtaining an optimal artificial fish individual with the artificial fish swarm algorithm, updating the initial values of the preset tree number and feature subset size parameters of the random forest classification model, and repeating these steps until the optimal feature value pair no longer changes; and classifying cells in an image to be detected with the random forest classification model corresponding to the optimal feature value pair. The application performs feature selection for the random forest classifier with the artificial fish swarm algorithm, which addresses feature redundancy in the model and the insufficient generalization ability of the classifier as a whole.

Description

Cell recognition method and device based on a random forest classification model
Technical field
This application relates to the field of image recognition algorithms and machine learning, and in particular to a cervical epithelial cell recognition method and device based on a random forest model.
Background art
For cell recognition, classifiers commonly used in the prior art include decision trees, random forests, and the like; commonly used feature selection algorithms include the artificial fish swarm algorithm (AFSA). The scale of a random forest determines the diversity of the sample subspace, but a scale that is either too large or too small is unsuitable. Meanwhile, to increase the diversity of the feature subspace, features are randomly selected from the full feature set for each individual decision tree to learn from. If the feature subset size is chosen poorly, however, problems such as feature redundancy, reduced classification precision of individual decision trees, and insufficient generalization ability of the classifier as a whole may arise.
Summary of the invention
The application aims to overcome the above problems, or at least to partially solve or mitigate them.
According to a first aspect of the application, a cell recognition method based on a random forest classification model is provided, comprising:
A parameter initialization step: based on a preset tree number and a feature subset size of the random forest classification model, setting the range of the preset tree number and the range of the feature subset size;
A model acquisition step: within the range of the preset tree number and the range of the feature subset size, randomly combining the preset tree number and the feature subset size into feature value pairs, the feature value pairs forming a feature value pair set; for each feature value pair in the set, training a random forest classification model with an initial cell image sample set and testing the random forest classification model to obtain a sample accuracy rate, and taking the optimal sample accuracy rate as the sample accuracy rate of the feature value pair set;
A parameter update step: using the random forest classification model corresponding to the optimal sample accuracy rate as the fitness function, taking the sample accuracy rate of the feature value pair set as the fitness value of the artificial fish swarm algorithm, converting the set of feature value pairs into artificial fish individuals and inputting them into the artificial fish swarm algorithm to obtain the optimal artificial fish individual, converting the optimal artificial fish individual back into an optimal feature value pair and using it as the initial values of the preset tree number and feature subset size parameters of the random forest classification model, and repeating from the parameter initialization step until the optimal feature value pair no longer changes; and
A classification step: classifying cells in an image to be detected with the random forest classification model corresponding to the optimal feature value pair.
The method of the application is based on a random forest classifier model optimized with the artificial fish swarm algorithm: the artificial fish swarm algorithm performs feature selection for the random forest classifier while the parameters of the random forest classifier model are also optimized. This addresses feature redundancy in the model and the insufficient generalization ability of the classifier as a whole, and improves the classification precision of the decision trees.
Optionally, in the model acquisition step, training the random forest classification model with the initial cell image sample set comprises:
A sampling step: randomly selecting from the original sample set, with replacement, a number of training sample sets of the same size as the original sample set;
A decision tree training step: training a decision tree of the random forest classification model with a training sample set, wherein at each split of the decision tree a feature subset matching the feature subset size of the feature value pair is selected without replacement to train the decision tree; and
A decision tree forest generation step: repeating the sampling step and the decision tree training step until a random forest classification model having the preset number of decision trees is generated.
Optionally, in the model acquisition step, testing the random forest classification model to obtain the sample accuracy rate comprises:
A testing step: taking the samples in the original sample set that were not selected by any training sample set as test samples, inputting each test sample into all the decision trees, and obtaining the classification result of each decision tree;
A voting step: subjecting the classification results of all the decision trees to a simple majority vote, and taking the voting result as the classification result of the test sample; and
An accuracy calculation step: taking the proportion of correctly classified samples in the total number of test samples as the sample accuracy rate.
Optionally, the parameter update step comprises:
An artificial fish individual formation step: binary-encoding each feature value pair in the feature value pair set to form artificial fish individuals, and randomly grouping two or more artificial fish individuals into multiple fish swarms;
An optimal artificial fish individual acquisition step: sequentially executing the swarming operator, the following operator, and the foraging operator on the fish swarms, and taking the artificial fish individual with the maximum computed fitness value as the optimal artificial fish individual; and
An initial value update step: converting the optimal artificial fish individual into an optimal feature value pair, using it as the initial values of the preset tree number and feature subset size parameters of the random forest classification model, and repeating from the parameter initialization step until the optimal feature value pair no longer changes.
According to a second aspect of the application, a cell recognition device based on a random forest classification model is provided, comprising:
A parameter initialization module, configured to set, based on a preset tree number and a feature subset size of the random forest classification model, the range of the preset tree number and the range of the feature subset size;
A model acquisition module, configured to randomly combine, within the range of the preset tree number and the range of the feature subset size, the preset tree number and the feature subset size into feature value pairs, the feature value pairs forming a feature value pair set; and, for each feature value pair in the set, to train a random forest classification model with an initial cell image sample set, test the random forest classification model to obtain a sample accuracy rate, and take the optimal sample accuracy rate as the sample accuracy rate of the feature value pair set;
A parameter update module, configured to use the random forest classification model corresponding to the optimal sample accuracy rate as the fitness function, take the sample accuracy rate of the feature value pair set as the fitness value of the artificial fish swarm algorithm, convert the set of feature value pairs into artificial fish individuals and input them into the artificial fish swarm algorithm to obtain the optimal artificial fish individual, convert the optimal artificial fish individual back into an optimal feature value pair and use it as the initial values of the preset tree number and feature subset size parameters of the random forest classification model, and repeat the parameter initialization step until the optimal feature value pair no longer changes; and
A classification module, configured to classify cells in an image to be detected with the random forest classification model corresponding to the optimal feature value pair.
The device of the application is based on a random forest classifier model optimized with the artificial fish swarm algorithm: the artificial fish swarm algorithm performs feature selection for the random forest classifier while the parameters of the random forest classifier model are also optimized. This addresses feature redundancy in the model and the insufficient generalization ability of the classifier as a whole, and improves the classification precision of the decision trees.
Optionally, the model acquisition module comprises:
A sampling module, configured to randomly select from the original sample set, with replacement, a number of training sample sets of the same size as the original sample set;
A decision tree training module, configured to train a decision tree of the random forest classification model with a training sample set, wherein at each split of the decision tree a feature subset matching the feature subset size of the feature value pair is selected without replacement to train the decision tree; and
A decision tree forest generation module, configured to repeat the sampling module and the decision tree training module until a random forest classification model having the preset number of decision trees is generated.
Optionally, the model acquisition module further comprises:
A testing module, configured to take the samples in the original sample set that were not selected by any training sample set as test samples, input each test sample into all the decision trees, and obtain the classification result of each decision tree;
A voting module, configured to subject the classification results of all the decision trees to a simple majority vote and take the voting result as the classification result of the test sample; and
An accuracy calculation module, configured to take the proportion of correctly classified samples in the total number of test samples as the sample accuracy rate.
Optionally, the parameter update module comprises:
An artificial fish individual formation module, configured to binary-encode each feature value pair in the feature value pair set to form artificial fish individuals, and to randomly group two or more artificial fish individuals into multiple fish swarms;
An optimal artificial fish individual acquisition module, configured to sequentially execute the swarming operator, the following operator, and the foraging operator on the fish swarms, and to take the artificial fish individual with the maximum computed fitness value as the optimal artificial fish individual; and
An initial value update module, configured to convert the optimal artificial fish individual into an optimal feature value pair, use it as the initial values of the preset tree number and feature subset size parameters of the random forest classification model, and repeat the parameter initialization step until the optimal feature value pair no longer changes.
According to a third aspect of the application, a computing device is provided, comprising a memory, a processor, and a computer program stored in the memory and executable by the processor, wherein the processor implements the method described above when executing the computer program.
According to a fourth aspect of the application, a computer-readable storage medium is provided, preferably a non-volatile readable storage medium, in which a computer program is stored, the computer program implementing the method described above when executed by a processor.
From the following detailed description of specific embodiments of the application, taken in conjunction with the accompanying drawings, the above and other objects, advantages, and features of the application will become clearer to those skilled in the art.
Brief description of the drawings
Some specific embodiments of the application are described in detail below, by way of example and not limitation, with reference to the accompanying drawings. Identical reference numerals denote identical or similar parts in the drawings. Those skilled in the art should appreciate that these drawings are not necessarily drawn to scale. In the drawings:
Fig. 1 is a schematic flow chart of an embodiment of the cell recognition method based on the random forest classification model of the application;
Fig. 2 is a schematic block diagram of the training step of the method of the application;
Fig. 3 is a schematic block diagram of the testing step of the method of the application;
Fig. 4 is a schematic flow chart of another embodiment of the cell recognition method based on the random forest classification model of the application;
Fig. 5 is a schematic block diagram of an embodiment of the cell recognition device based on the random forest classification model of the application;
Fig. 6 is a block diagram of an embodiment of the computing device of the application;
Fig. 7 is a block diagram of an embodiment of the computer-readable storage medium of the application.
Detailed description of embodiments
As a standalone classifier, a decision tree has high classification efficiency, but its classification results often settle on a locally optimal solution rather than the globally optimal one, and decision trees are prone to over-fitting during training. The random forest algorithm is composed of a series of mutually independent decision trees, each decision tree constituting a minimal component of the algorithm. It can be written as R = {h(x, θ_k), k = 1, 2, ..., K}, where the {θ_k} are independent, identically distributed random vectors and K is the number of independent decision trees in the classifier. Given an input variable X, each decision tree in the random forest classifier judges the input independently, and the classifier selects its optimal classification result by voting. The decision-making ability of an individual decision tree is often weak, but an organic ensemble of decision trees can be very powerful.
Creating a random forest with N decision trees requires N training sample sets. To avoid locally optimal solutions, the random forest algorithm creates the N training sample sets with bagging, i.e., sampling with replacement. While drawing training samples, this method leaves behind samples that are never drawn; when the original sample size is large, about 36.8% of the samples end up out of bag. The random forest algorithm can therefore estimate its error precision directly on the out-of-bag (OOB) samples. When OOB testing finishes and the accuracy λ stabilizes, random forest training is complete.
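The roughly 36.8% out-of-bag fraction follows from the fact that the probability a given sample is never drawn in n draws with replacement is (1 − 1/n)^n, which tends to 1/e ≈ 0.368. A minimal simulation (plain Python; names are illustrative, not from the patent) confirms it:

```python
import random

def oob_fraction(n: int, seed: int = 0) -> float:
    """Draw a bootstrap sample of size n (with replacement) from n items
    and return the fraction of items never drawn, i.e. the out-of-bag share."""
    rng = random.Random(seed)
    drawn = {rng.randrange(n) for _ in range(n)}
    return 1.0 - len(drawn) / n

# For large n this approaches 1/e, matching the ~36.8% cited above
print(oob_fraction(100_000))
```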
Compared with other classifiers, the random forest algorithm has a series of advantages. First, it adapts well to input data: inputs can be binary features, numerical features, high-dimensional data, and so on, without any scaling. The algorithm is also simple to use, fast to train, and efficient. Meanwhile, the random forest algorithm introduces two sources of randomness, namely the randomness of generating a training subset when drawing samples for each decision tree and the randomness of each decision tree constructing its own attribute subspace, which give the classifier strong noise resistance and freedom from over-fitting. Since a classifier for recognizing cervical epithelial cells needs to process a large number of cell images, the random forest algorithm, with its faster training and processing speed, is a suitable choice.
The construction of the random forest algorithm is broadly divided into extracting the training set, training the decision trees, and creating and executing the algorithm. The forest scale nTree and the attribute-feature subset size k are the important parameters in training. The forest scale denotes the number of base classifiers in the classifier; the attribute-feature subset size is the number of features used to compute the best split attribute when a decision tree node is split. When a decision tree splits, it usually selects log₂M + 1 or √M features at random from all features to compute the best split attribute, where M is the number of input variables. The remaining features do not participate; only the selected subset is responsible for computing the best split attribute. The purpose is to reduce the correlation between trees and improve the classification accuracy of each tree.
The larger the parameter nTree, the more decision trees there are in the random forest, the better the diversity of the random forest classifier, and the higher the classification precision. However, after nTree reaches a certain value the classification performance tends to plateau, while the time and space cost of the classifier grows large and its interpretability decreases. If nTree is too small, the diversity of the classifier decreases, classification performance deteriorates, and precision drops. The parameter k denotes the size of the feature subset obtained by sampling without replacement from the full feature set at each node split during forest creation. Usually k is held constant while creating the decision trees and is much smaller than the size of the full feature set, which prevents the classifier from over-fitting while increasing the diversity between decision trees. If k is too large, the diversity between decision trees becomes low, reducing classification quality; if k is too small, then although the diversity between base classifiers is high, both the classification precision and the generalization ability of the classifier decrease.
It can be seen that the parameters that mainly influence the computing speed and classification performance of the random forest algorithm are the forest scale nTree and the attribute-feature subset size k. Relatively common values for k include 1, √M, and log₂M + 1; when M is small, choosing log₂M + 1 usually gives better algorithm performance, but there is no fixed way of choosing the value.
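The two common heuristics for k mentioned above are straightforward to compute; this helper (an illustrative sketch, not part of the patent) returns both candidates for a given feature count M:

```python
import math

def candidate_subset_sizes(m: int) -> dict:
    """Return the common random-forest feature subset size heuristics
    for M total input features: log2(M) + 1 and sqrt(M)."""
    return {
        "log2M+1": int(math.log2(m)) + 1,
        "sqrtM": math.isqrt(m),
    }

print(candidate_subset_sizes(64))  # {'log2M+1': 7, 'sqrtM': 8}
```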
In fact, a large number of studies have shown that if the base classifiers of an ensemble classification algorithm reach relatively high classification precision during training, and the base classifiers are mutually independent, then the ensemble can achieve very good classification results. Therefore, when improving a random forest network, it is only necessary to improve the classification precision of the individual decision trees as much as possible while keeping the trees independent of one another. To improve the performance of the random forest algorithm, it is thus necessary to guarantee the validity and diversity of the sample space and the feature subspace.
As analyzed above, the scale of a random forest determines the diversity of the sample subspace, but a scale that is either too large or too small is unsuitable. Meanwhile, to increase the diversity of the feature subspace, features are randomly selected from the full feature set for each individual decision tree to learn from. If the feature subset size is chosen poorly, however, problems such as feature redundancy, reduced classification precision of individual decision trees, and insufficient generalization ability of the classifier as a whole may arise.
This application provides a random forest classifier model optimized with the artificial fish swarm algorithm: the artificial fish swarm algorithm performs feature selection for the random forest classifier while parameters such as the forest scale are also optimized.
An embodiment of the application discloses a cell recognition method based on a random forest classification model. Fig. 1 is a schematic flow chart of an embodiment of the cell recognition method based on the random forest classification model of the application. The method may include:
S100, a parameter initialization step: based on a preset tree number and a feature subset size of the random forest classification model, setting the range of the preset tree number and the range of the feature subset size;
S200, a model acquisition step: within the range of the preset tree number and the range of the feature subset size, randomly combining the preset tree number and the feature subset size into feature value pairs, the feature value pairs forming a feature value pair set; for each feature value pair in the set, training a random forest classification model with an initial cell image sample set and testing the random forest classification model to obtain a sample accuracy rate, and taking the optimal sample accuracy rate as the sample accuracy rate of the feature value pair set;
S300, a parameter update step: using the random forest classification model corresponding to the optimal sample accuracy rate as the fitness function, taking the sample accuracy rate of the feature value pair set as the fitness value of the artificial fish swarm algorithm, converting the set of feature value pairs into artificial fish individuals and inputting them into the artificial fish swarm algorithm to obtain the optimal artificial fish individual, converting the optimal artificial fish individual back into an optimal feature value pair and using it as the initial values of the preset tree number and feature subset size parameters of the random forest classification model, and repeating from the parameter initialization step until the optimal feature value pair no longer changes;
S400, a classification step: classifying cells in an image to be detected with the random forest classification model corresponding to the optimal feature value pair.
The method of the application is based on a random forest classifier model optimized with the artificial fish swarm algorithm: the artificial fish swarm algorithm performs feature selection for the random forest classifier while the parameters of the random forest classifier model are also optimized. This addresses feature redundancy in the model and the insufficient generalization ability of the classifier as a whole, and improves the classification precision of the decision trees.
In the S100 parameter initialization step, parameters of the random forest classification model such as the preset decision tree number nTree and the feature subset size k can be initialized, along with the ranges of the preset tree number and the feature subset size. Optionally, the maximum number of iterations Maxgen and the binary feature values {Attribute_i | i = 1, 2, ..., M} can also be initialized. Optionally, the parameters of the artificial fish swarm algorithm can also be initialized in this step, for example the number of artificial fish N, the fish swarm position X = (x_1, x_2, ..., x_d)^T, the visual range Visual of the artificial fish, the maximum step length step, the crowding factor δ, and the maximum number of behavior attempts try_number.
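The initialization described above can be collected in one small structure. The field names below follow the paragraph above, while every default value is an illustrative assumption, not taken from the patent:

```python
from dataclasses import dataclass

@dataclass
class AFSAParams:
    """Artificial fish swarm algorithm settings from the initialization step.
    Default values are illustrative assumptions, not from the patent."""
    n_fish: int = 30         # number of artificial fish N
    visual: float = 2.5      # visual range Visual of each artificial fish
    step: float = 0.5        # maximum step length
    delta: float = 0.618     # crowding factor
    try_number: int = 5      # maximum number of behavior attempts
    maxgen: int = 50         # maximum number of iterations Maxgen

params = AFSAParams()
print(params)
```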
Optionally, the S200 model acquisition step may include a training step and a testing step. Fig. 2 and Fig. 3 are schematic block diagrams of the training step and the testing step of the method of the application, respectively.
The training step may include:
A sampling step: randomly selecting from the original sample set, with replacement, a number of training sample sets of the same size as the original sample set;
A decision tree training step: training a decision tree of the random forest classification model with a training sample set, wherein at each split of the decision tree a feature subset matching the feature subset size of the feature value pair is selected without replacement to train the decision tree; and
A decision tree forest generation step: repeating the sampling step and the decision tree training step until a random forest classification model having the preset number of decision trees is generated.
This method generates and trains each decision tree of the random forest. Sampling with replacement reduces the amount of data required from the original sample set and makes fuller use of the available data to train the model.
In this step, suppose the original sample set is (X, Y), where (x_1, y_1), (x_2, y_2), ..., (x_N, y_N) ∈ (X, Y) and the sample size of the original sample set is N. The original sample set can be a set of image samples including cervical epithelial cells and/or lymphocytes.
Bootstrap sampling can be used to randomly select N samples (x_i, y_i) with replacement as the training sample set (X*, Y*). The training sample set (X*, Y*) is input into the random forest classification model, and at each tree split k features are selected as a subset to train the decision tree. This step is repeated until the preset number of decision trees has been generated, completing the preliminary construction of the random forest classifier.
Optionally, the testing step may include:
A testing step: taking the samples in the original sample set that were not selected by any training sample set as test samples, inputting each test sample into all the decision trees, and obtaining the classification result of each decision tree;
A voting step: subjecting the classification results of all the decision trees to a simple majority vote, and taking the voting result as the classification result of the test sample; and
An accuracy calculation step: taking the proportion of correctly classified samples in the total number of test samples as the sample accuracy rate.
This method tests the model with the out-of-bag samples, simplifying the data source: the user not only obtains training data from a single original sample set but can also derive test data from the training data, and thereby evaluate the model.
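A compact sketch of the bagging-plus-OOB evaluation just described, with a trivial threshold "stump" standing in for a real decision tree (every name and the toy data are illustrative assumptions):

```python
import random
from collections import Counter

def train_forest_with_oob(samples, labels, n_trees, train_tree, seed=0):
    """Train n_trees base classifiers on bootstrap samples; samples never
    drawn by any tree become the out-of-bag test set, classified by a
    simple majority vote of all trees."""
    rng = random.Random(seed)
    n = len(samples)
    trees, drawn = [], set()
    for _ in range(n_trees):
        idx = [rng.randrange(n) for _ in range(n)]  # sample with replacement
        drawn.update(idx)
        trees.append(train_tree([samples[i] for i in idx],
                                [labels[i] for i in idx]))
    oob = [i for i in range(n) if i not in drawn]   # out-of-bag indices
    correct = sum(
        1 for i in oob
        if Counter(t(samples[i]) for t in trees).most_common(1)[0][0] == labels[i]
    )
    return correct / len(oob) if oob else float("nan")

# Toy 1-D data; a fixed threshold stump stands in for a trained decision tree
def stump(xs, ys):
    return lambda x: 1 if x > 0.5 else 0

data = [i / 100 for i in range(100)]
labs = [1 if x > 0.5 else 0 for x in data]
print(train_forest_with_oob(data, labs, n_trees=2, train_tree=stump))
```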
Optionally, the method may further include:
Judging whether the repetition count gen > Maxgen of the above steps holds; if it holds, outputting the optimized preset decision tree number nTree, feature subset size k, and binary feature values {Attribute_i | i = 1, 2, ..., M} of the random forest classification model; if it does not hold, executing step S300.
Optionally, the S300 parameter update step may include:
An artificial fish individual formation step: binary-encoding each feature value pair in the feature value pair set to form artificial fish individuals, and randomly grouping two or more artificial fish individuals into multiple fish swarms;
An optimal artificial fish individual acquisition step: sequentially executing the swarming operator, the following operator, and the foraging operator on the fish swarms, and taking the artificial fish individual with the maximum computed fitness value as the optimal artificial fish individual; and
An initial value update step: converting the optimal artificial fish individual into an optimal feature value pair, using it as the initial values of the preset tree number and feature subset size parameters of the random forest classification model, and repeating from the parameter initialization step until the optimal feature value pair no longer changes.
This method improves the random forest algorithm with the artificial fish swarm algorithm: the artificial fish swarm algorithm performs feature selection for the random forest classifier while optimizing parameters such as the forest scale, which improves the accuracy and pertinence of the random forest parameter settings, reduces the number of parameter modifications and of repeated training and testing rounds, and improves the accuracy of the classification model by combining machine learning with artificial intelligence.
In this application, the variables to be optimized may include nTree, k and {Attributei | i = 1, 2, ..., M}, with the objective function f(nTree*, k*, {Attributei | i = 1, 2, ..., M}) = argmin(avg(OOBerror)). The value range of nTree is [1, N] and the value range of k is [1, M], where N may be bounded by a preset value, e.g. 100, and M may be bounded by the number of features of the whole feature space. Within these ranges, nTree and k are combined at random; each feature value pair supports one round of decision tree training and testing, and each feature value pair can form one or more artificial fish individuals. During the artificial fish swarm computation, the state quantity of each artificial fish individual is binary-coded, and this binary code serves as the binary feature value of the training sample set.
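Under the stated ranges, the objective can be illustrated with a toy stand-in for the OOB error surface (the function `oob_error` below is invented purely for illustration; the real fitness is the trained forest's out-of-bag error):

```python
# Toy OOB-error surface with its minimum placed at nTree=40, k=4; the search
# simply scans the admissible grid nTree in [1, N], k in [1, M].
def oob_error(nTree, k):
    return (nTree - 40) ** 2 / 1600 + (k - 4) ** 2 / 16

N_MAX, M = 100, 10   # bounds as stated above (N <= 100, k <= M)
best = min(((n, kk) for n in range(1, N_MAX + 1) for kk in range(1, M + 1)),
           key=lambda nk: oob_error(*nk))
print(best)          # the (nTree, k) pair minimising the toy OOB error
```

Exhaustive scanning like this is exactly what the AFSA search described later avoids for large ranges.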
In an optional embodiment, the feature value pair (nTree, k) is expressed in binary and placed into two binary segments. For example, with nTree = 1 and k = 5, the binary representations are nTree = 001 and k = 101, so the state quantity of the artificial fish individual is 001101. In this case, one feature value pair forms one artificial fish individual.
In another optional embodiment, the state quantity of the artificial fish individual consists of three segments: the feature values Attributei are appended to the state quantity, where in the segment {Attributei | i = 1, 2, ..., M} a 0 indicates that the feature at that position is not selected and a 1 indicates that it is selected, under the constraint k ≤ sum(Attributei = 1). For example, with nTree = 1, k = 5 and a feature space of 10 features, 5 features are selected at random, say those at positions 1, 2, 3, 4 and 6, giving Attributei = 1111010000; combined with the binary representations nTree = 001 and k = 101, the state quantity is 0011011111010000. In this case, one feature value pair can form multiple artificial fish individuals.
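A minimal sketch of this three-segment coding, assuming 3-bit segments for nTree and k as in the example (`encode_state` and `decode_state` are hypothetical helper names; the application only fixes the bit layout):

```python
# Layout: [nTree bits][k bits][Attribute_1 .. Attribute_M bits]
def encode_state(nTree, k, attribute_bits, ntree_bits=3, k_bits=3):
    return (format(nTree, f"0{ntree_bits}b")
            + format(k, f"0{k_bits}b")
            + "".join(str(b) for b in attribute_bits))

def decode_state(s, ntree_bits=3, k_bits=3):
    nTree = int(s[:ntree_bits], 2)
    k = int(s[ntree_bits:ntree_bits + k_bits], 2)
    attrs = [int(c) for c in s[ntree_bits + k_bits:]]
    return nTree, k, attrs

# The example above: nTree=1 -> 001, k=5 -> 101, features 1,2,3,4,6 of M=10
state = encode_state(1, 5, [1, 1, 1, 1, 0, 1, 0, 0, 0, 0])
print(state)
nTree, k, attrs = decode_state(state)
assert (nTree, k) == (1, 5)
assert k <= sum(attrs)     # the constraint k <= sum(Attribute_i = 1)
```

Round-tripping through `decode_state` is what the later "decode the optimal solution" step relies on.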
Suppose there are N artificial fish in a d-dimensional space. The vector X = (x1, x2, ..., xd)T denotes the state position of an artificial fish in the school, Visual is the field of view of the artificial fish, and step is the maximum step length of the artificial fish per move. At a given moment, an artificial fish randomly selects a state Xv = (xv1, xv2, ..., xvd) within its field of view; if the state Xv is better than X, the fish moves one step towards Xv; otherwise it randomly selects other states within its field of view and moves. di,j = ||Xi - Xj|| denotes the spatial distance between two artificial fish. Y = f(X) denotes the food concentration perceived by an artificial fish at X, i.e. the objective function value. δ denotes the crowding factor of the fish school in a given region of space.
The core behaviors of the artificial fish school are as follows:
(1) Foraging behavior: when an artificial fish perceives that the food concentration at Xj is higher than at Xi, it moves one step towards Xj according to the movement operator Xi_next = Xi + Rand() * step * (Xj - Xi) / ||Xj - Xi||.
Here Rand() is a uniformly distributed random number with value range (-1, 1).
If the state Xi_next is no better than Xi, a new move is attempted; if, after the preset number of attempts try_number, no suitable state can be found, the random behavior shown in the following formula is executed:
Xj = Xi + Visual * Rand()
(2) Swarming behavior: when the school finds that the surrounding environment is threatened or that some place has abundant food, it swarms in order to improve its survival rate and feeding efficiency. Suppose an artificial fish has state Xi; count the number nf of fish within its field of view (d ≤ Visual) and the center position Xc of the school within that range. If Yc/nf > δYi holds, the function value at the school center is higher and the density of artificial fish near it is low, so the fish moves one step towards Xc according to the movement operator Xi_next = Xi + Rand() * step * (Xc - Xi) / ||Xc - Xi||. If the swarming condition does not hold, the artificial fish performs the foraging behavior instead.
(3) Following behavior: when an individual fish moves towards food or away from a natural enemy, other fish may move because of its movement and the needs of the collective; this is called the following behavior. Suppose an artificial fish has state Xi; count the number nf of other fish within its field of view (d ≤ Visual) and find the artificial fish Xmax with the maximum surrounding food concentration Ymax in that range. If Ymax/nf > δYi holds, the fish density around Xmax is not high and there is still room to gather, so the artificial fish Xi moves one step towards Xmax according to the movement operator Xi_next = Xi + Rand() * step * (Xmax - Xi) / ||Xmax - Xi||. If the following condition does not hold, the foraging behavior is continued.
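A toy rendering of the three behaviors on a stand-in objective (the quadratic food function, parameter values and school size are all assumptions; only the condition tests and movement operators follow the description above):

```python
import numpy as np

rng = np.random.default_rng(1)
step, visual, delta, try_number = 0.3, 1.0, 0.8, 5

def food(X):                      # Y = f(X): toy food concentration,
    return -np.sum(X ** 2)        # maximal at the origin

def move_toward(Xi, Xj):          # X_next = Xi + Rand()*step*(Xj-Xi)/||Xj-Xi||
    d = np.linalg.norm(Xj - Xi)
    return Xi if d == 0 else Xi + rng.random() * step * (Xj - Xi) / d

def forage(Xi):                   # try up to try_number random states in view
    for _ in range(try_number):
        Xv = Xi + visual * rng.uniform(-1, 1, size=Xi.shape)
        if food(Xv) > food(Xi):
            return move_toward(Xi, Xv)
    return Xi + visual * rng.uniform(-1, 1, size=Xi.shape)  # random move

def swarm(Xi, school):            # move to the center if it is not crowded
    mates = [X for X in school if 0 < np.linalg.norm(X - Xi) <= visual]
    if mates:
        Xc = np.mean(mates, axis=0)
        if food(Xc) / len(mates) > delta * food(Xi):   # Yc/nf > delta*Yi
            return move_toward(Xi, Xc)
    return forage(Xi)

def follow(Xi, school):           # chase the best neighbour if not crowded
    mates = [X for X in school if 0 < np.linalg.norm(X - Xi) <= visual]
    if mates:
        Xmax = max(mates, key=food)
        if food(Xmax) / len(mates) > delta * food(Xi):  # Ymax/nf > delta*Yi
            return move_toward(Xi, Xmax)
    return forage(Xi)

school = [rng.uniform(-3, 3, size=2) for _ in range(10)]
school = [follow(swarm(Xi, school), school) for Xi in school]
print(max(food(Xi) for Xi in school))
```

In the actual method the food concentration is replaced by the forest's fitness value and the state vectors by the binary-coded parameter states.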
Algorithms that solve for an optimal value usually traverse k by p-fold cross-validation and then determine the optimum from the computed minimum error value or maximum AUC; the time complexity of such algorithms is high, so they are unsuitable for large feature sets. Since solving for the optimal parameters is in fact minimizing the generalization error, during feature selection and parameter optimization on two-class data the OOB error can substitute for cross-validation, reducing the time cost to roughly 1/p. Cross-validation is still used during the classification process itself.
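The claimed 1/p saving can be made concrete by counting forest fits (a deliberately simplified accounting that ignores per-fit cost differences):

```python
# p-fold CV fits the forest p times per candidate parameter setting, while
# the OOB estimate reuses the single fit done during training.
def evaluations(n_candidates, p, use_oob):
    fits_per_candidate = 1 if use_oob else p
    return n_candidates * fits_per_candidate

p = 5
print(evaluations(100, p, use_oob=False))  # 500 forest fits with 5-fold CV
print(evaluations(100, p, use_oob=True))   # 100 fits with the OOB estimate
```

For 100 candidate (nTree, k) settings this is a fivefold reduction, i.e. the 1/p factor stated above.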
In an optional embodiment, the swarming operator, the following operator and the foraging operator are executed on the fish school in turn, and the artificial fish individual with the maximum fitness value is taken as the optimal artificial fish individual.
In another optional embodiment, the swarming operator and the following operator may first be executed separately on the current fish school. The swarming operator makes an artificial fish gather towards the school center within its visual range. The following operator searches, based on the current position and fitness value of an artificial fish, for the partner with the maximum fitness value within its sensing range. If that fitness value is greater than the sample accuracy of the feature value pair set, i.e. the first fitness value, the search within the sensing range is centered on the artificial fish with the maximum fitness value. If Ymax/nf > δYi holds, the position is better and its surroundings are not crowded, so the artificial fish takes one step towards the artificial fish Xmax with the maximum fitness value.
It is then judged whether the fitness value of each artificial fish improved after executing the swarming operator and the following operator respectively; if so, the fitness values after the two operators are compared and the operator giving the larger fitness value is selected as the one finally executed. If neither the swarming operator nor the following operator improves the fitness value of the artificial fish, the foraging operator is executed.
The foraging operator finds a better position based on the current position and performs the position transfer.
After the position update, the fitness values of the artificial fish school are recalculated and the optimal individual is recorded; if the fitness value of the optimal individual no longer changes, the loop ends and the optimal solution is output. In an alternative embodiment, if the fitness value of the optimal individual is still improving, the optimal solution has not yet been found and the iteration continues until it is. In another alternative embodiment, if the number of loops reaches the preset number of attempts, the current optimal solution is output. The optimal solution is decoded to obtain parameters such as the preset decision tree number nTree and the feature subset number k for initializing the random forest classification model. When the parameter initialization step is repeated, the corresponding ranges are determined from the decision tree number nTree and the feature subset number k, e.g. the range [nTree - 5, nTree + 5] centered on nTree and the range [k - 3, k + 3] centered on k, and the model obtaining step and the parameter updating step are re-executed until the optimal feature value pair in the parameter updating step no longer changes.
Fig. 4 is a schematic flowchart of another embodiment of the cell recognition method based on the random forest classification model of the application. After the original sample set is input and the parameters are initialized, training sample sets are extracted with replacement by bootstrap, and at each tree split k features are randomly selected without replacement to train a single decision tree. This step is repeated until the forest scale reaches nTree. The OOB error is tested as the fitness value Y of the AFSA, and it is judged whether the loop count gen is greater than Maxgen. If not, the artificial fish swarm algorithm is executed with nTree, k and Attribute as the state value X and the OOB error of the random forest classification model as the fitness of the artificial fish: the artificial fish perform the foraging, swarming and following behaviors respectively, the fitnesses are evaluated and compared, and the global optimal artificial fish state is updated; this process is repeated until the number of attempts Try_number reaches the preset value, the optimal parameters are output, and the parameter initialization of the random forest model is repeated. When the loop count reaches Maxgen, the optimal parameters are output and the random forest classification model is obtained. This model is used to extract features from a test sample set containing cervical epithelial cell images and lymphocyte images, which are then input to the classifier to identify the cells. The method of the application can accurately identify cervical epithelial cells and lymphocytes; the parameter-optimized random forest classification model therefore yields classification results of higher confidence, and the overall classifier has higher precision and stronger generalization ability.
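The Fig. 4 loop can be reduced to a skeleton in which a stand-in fitness replaces the forest's OOB error and one greedy neighbour trial stands in for a forage/swarm/follow round (all names and values here are illustrative assumptions):

```python
import random
random.seed(3)

def fitness(state):                  # stand-in for the OOB error; the
    nTree, k = state                 # pretended minimum lies at (40, 4)
    return abs(nTree - 40) + abs(k - 4)

def afsa_step(state):                # stand-in for one forage/swarm/follow
    nTree, k = state                 # round: try a neighbour, keep if better
    cand = (max(1, nTree + random.randint(-5, 5)),
            max(1, k + random.randint(-3, 3)))
    return cand if fitness(cand) < fitness(state) else state

state = (10, 8)                      # initial (nTree, k)
for gen in range(200):               # the gen > Maxgen check ends the loop
    state = afsa_step(state)
print(state, fitness(state))
```

The real method evaluates `fitness` by training the forest and measuring its OOB error, and updates a global best state across the whole school rather than a single candidate.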
An embodiment of the application also discloses a cervical epithelial cell recognition device based on a random forest classification model. Fig. 5 is a schematic block diagram of an embodiment of the cell recognition device based on the random forest classification model of the application. The device may include:
a parameter initialization module 100, configured to set, based on the preset decision tree number and feature subset number of the random forest classification model, the range of the preset decision tree number and the range of the feature subset number;
a model obtaining module 200, configured to randomly combine, within the range of the preset decision tree number and the range of the feature subset number, the preset decision tree number and the feature subset number into feature value pairs, the feature value pairs forming a feature value pair set, and, for each feature value pair in the feature value pair set, to train the random forest classification model with an initial cell image sample set and test the random forest classification model to obtain a sample accuracy, taking the optimal sample accuracy as the sample accuracy of the feature value pair set;
a parameter updating module 300, configured to use the random forest classification model corresponding to the optimal sample accuracy as the fitness calculation function, take the sample accuracy of the feature value pair set as the fitness value of the artificial fish swarm algorithm, convert the set of feature value pairs into artificial fish individuals input to the artificial fish swarm algorithm to obtain the optimal artificial fish individual, convert the optimal artificial fish individual into an optimal feature value pair used as the initial values of the preset decision tree number and feature subset number parameters of the random forest classification model, and repeat the parameter initialization step until the optimal feature value pair no longer changes; and
a classification module 400, configured to classify the cells in an image to be detected using the random forest classification model corresponding to the optimal feature value pair.
The device of the application is based on a random forest classifier model improved by the artificial fish swarm algorithm: the artificial fish swarm algorithm performs feature selection for the random forest classifier while optimizing the parameters of the random forest classifier model, which solves the problems of feature redundancy in the model and insufficient generalization ability of the whole classifier and improves the classification precision of the decision trees.
Optionally, the model obtaining module 200 includes a training device, which may include:
a sampling module, configured to randomly select with replacement, from an original sample set, a training sample set of the same size as the original sample set;
a decision tree training module, configured to train a decision tree in the random forest classification model with the training sample set, selecting without replacement, at each split of the decision tree, a feature subset matching the feature subset number of the feature value pair to train the decision tree; and
a decision tree forest generation module, configured to repeat the sampling module and the decision tree training module until a random forest classification model with the preset number of decision trees is generated.
The device can test the model with the out-of-bag samples, which simplifies the data source: the user not only obtains the training data from a single original sample set, but can also derive the test data from the training data and thereby evaluate the model.
The model obtaining module includes a test device, which may include:
a test module, configured to use the samples of the original sample set not selected by any training sample set as test samples, input each test sample into all the decision trees, and obtain the classification result of each decision tree;
a voting module, configured to take a simple majority vote over the classification results of all the decision trees and use the voting result as the classification result of the test sample; and
an accuracy calculation module, configured to use the proportion of correctly classified samples in the total number of test samples as the sample accuracy.
The device can test the model with the out-of-bag samples, which simplifies the data source: the user not only obtains the training data from a single original sample set, but can also derive the test data from the training data and thereby evaluate the model.
The parameter updating module 300 may include:
an artificial fish individual forming module, configured to binary-code each feature value pair in the feature value pair set to form artificial fish individuals, and to randomly group two or more artificial fish individuals into multiple fish schools;
an optimal artificial fish individual obtaining module, configured to execute the swarming operator, the following operator and the foraging operator on the fish schools in turn and take the artificial fish individual with the maximum fitness value as the optimal artificial fish individual; and
an initial value updating module, configured to convert the optimal artificial fish individual into an optimal feature value pair used as the initial values of the preset decision tree number and feature subset number parameters of the random forest classification model, and to repeat the parameter initialization step until the optimal feature value pair no longer changes.
The device improves the random forest algorithm with the artificial fish swarm algorithm: the artificial fish swarm algorithm performs feature selection for the random forest classifier while optimizing parameters such as the forest scale, which improves the accuracy and pertinence of the random forest parameter settings, reduces the number of parameter modifications and of repeated training and testing rounds, and improves the accuracy of the classification model by combining machine learning with artificial intelligence.
An aspect of the embodiments of the application provides a computing device. Referring to Fig. 6, the computing device includes a memory 1120, a processor 1110 and a computer program stored in the memory 1120 and executable by the processor 1110; the computer program is stored in the space 1130 for program code in the memory 1120 and, when executed by the processor 1110, implements any one of the method steps 1131 according to the invention.
Another aspect of the embodiments of the application further provides a computer-readable storage medium. Referring to Fig. 7, the computer-readable storage medium includes a storage unit for program code, the storage unit being provided with a program 1131' for executing the method steps according to the invention, the program being executed by a processor.
Another aspect of the embodiments of the application further provides a computer program product comprising instructions, including computer-readable code which, when executed by a computing device, causes the computing device to execute the method described above.
The above embodiments may be implemented wholly or partly by software, hardware, firmware or any combination thereof. When implemented in software, they may be realized wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When a computer loads and executes the computer program instructions, the processes or functions described in the embodiments of the application are generated wholly or partly. The computer may be a general-purpose computer, a special-purpose computer, a computer network or another programmable device. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another by wired (e.g. coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless (e.g. infrared, radio, microwave) means. The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device such as a server or data center integrating one or more usable media. The usable medium may be a magnetic medium (e.g. floppy disk, hard disk, magnetic tape), an optical medium (e.g. DVD) or a semiconductor medium (e.g. a solid state disk (SSD)).
Those skilled in the art should further appreciate that the units and algorithm steps described in connection with the embodiments disclosed herein can be realized with electronic hardware, computer software or a combination of the two; in order to clearly demonstrate the interchangeability of hardware and software, the composition and steps of each example have been described above generally by function. Whether these functions are implemented in hardware or software depends on the specific application and the design constraints of the technical solution. A skilled person may use different methods to achieve the described functions for each specific application, but such realization should not be considered beyond the scope of the application.
Those of ordinary skill in the art will appreciate that all or part of the steps implementing the above method embodiments can be completed by a program instructing a processor; the program can be stored in a computer-readable storage medium, the storage medium being a non-transitory medium such as random access memory, read-only memory, flash memory, hard disk, solid state disk, magnetic tape, floppy disk, optical disc and any combination thereof.
The above are only preferred specific embodiments of the application, but the protection scope of the application is not limited thereto; any change or substitution that can be easily thought of by anyone skilled in the art within the technical scope disclosed by the application should be covered within the protection scope of the application. Therefore, the protection scope of the application should be subject to the protection scope of the claims.

Claims (10)

1. A cell recognition method based on a random forest classification model, comprising:
a parameter initialization step: setting, based on a preset decision tree number and a feature subset number of the random forest classification model, the range of the preset decision tree number and the range of the feature subset number;
a model obtaining step: within the range of the preset decision tree number and the range of the feature subset number, randomly combining the preset decision tree number and the feature subset number into feature value pairs, the feature value pairs forming a feature value pair set; for each feature value pair in the feature value pair set, training the random forest classification model with an initial cell image sample set, testing the random forest classification model to obtain a sample accuracy, and taking the optimal sample accuracy as the sample accuracy of the feature value pair set;
a parameter updating step: using the random forest classification model corresponding to the optimal sample accuracy as a fitness calculation function, taking the sample accuracy of the feature value pair set as the fitness value of an artificial fish swarm algorithm, converting the set of feature value pairs into artificial fish individuals input to the artificial fish swarm algorithm to obtain an optimal artificial fish individual, converting the optimal artificial fish individual into an optimal feature value pair used as the initial values of the preset decision tree number and feature subset number parameters of the random forest classification model, and repeating the parameter initialization step until the optimal feature value pair no longer changes; and
a classification step: classifying the cells in an image to be detected using the random forest classification model corresponding to the optimal feature value pair.
2. The method according to claim 1, wherein, in the model obtaining step, training the random forest classification model with an initial cell image sample set comprises:
a sampling step: randomly selecting with replacement, from an original sample set, a training sample set of the same size as the original sample set;
a decision tree training step: training a decision tree in the random forest classification model with the training sample set, selecting without replacement, at each split of the decision tree, a feature subset matching the feature subset number of the feature value pair to train the decision tree; and
a decision tree forest generation step: repeating the sampling step and the decision tree training step until a random forest classification model with the preset number of decision trees is generated.
3. The method according to claim 2, wherein, in the model obtaining step, testing the random forest classification model to obtain a sample accuracy comprises:
a testing step: using the samples of the original sample set not selected by any training sample set as test samples, and inputting each test sample into all the decision trees to obtain the classification result of each decision tree;
a voting step: taking a simple majority vote over the classification results of all the decision trees and using the voting result as the classification result of the test sample; and
an accuracy calculation step: using the proportion of correctly classified samples in the total number of test samples as the sample accuracy.
4. The method according to any one of claims 1 to 3, wherein the parameter updating step comprises:
an artificial fish individual forming step: binary-coding each feature value pair in the feature value pair set to form artificial fish individuals, and randomly grouping two or more artificial fish individuals into multiple fish schools;
an optimal artificial fish individual obtaining step: executing the swarming operator, the following operator and the foraging operator on the fish schools in turn, and taking the artificial fish individual with the maximum fitness value as the optimal artificial fish individual; and
an initial value updating step: converting the optimal artificial fish individual into an optimal feature value pair used as the initial values of the preset decision tree number and feature subset number parameters of the random forest classification model, and repeating the parameter initialization step until the optimal feature value pair no longer changes.
5. A cell recognition device based on a random forest classification model, comprising:
a parameter initialization module, configured to set, based on a preset decision tree number and a feature subset number of the random forest classification model, the range of the preset decision tree number and the range of the feature subset number;
a model obtaining module, configured to randomly combine, within the range of the preset decision tree number and the range of the feature subset number, the preset decision tree number and the feature subset number into feature value pairs, the feature value pairs forming a feature value pair set, and, for each feature value pair in the feature value pair set, to train the random forest classification model with an initial cell image sample set and test the random forest classification model to obtain a sample accuracy, taking the optimal sample accuracy as the sample accuracy of the feature value pair set;
a parameter updating module, configured to use the random forest classification model corresponding to the optimal sample accuracy as a fitness calculation function, take the sample accuracy of the feature value pair set as the fitness value of an artificial fish swarm algorithm, convert the set of feature value pairs into artificial fish individuals input to the artificial fish swarm algorithm to obtain an optimal artificial fish individual, convert the optimal artificial fish individual into an optimal feature value pair used as the initial values of the preset decision tree number and feature subset number parameters of the random forest classification model, and repeat the parameter initialization step until the optimal feature value pair no longer changes; and
a classification module, configured to classify the cells in an image to be detected using the random forest classification model corresponding to the optimal feature value pair.
6. The device according to claim 5, wherein the model obtaining module comprises:
a sampling module, configured to randomly select with replacement, from an original sample set, a training sample set of the same size as the original sample set;
a decision tree training module, configured to train a decision tree in the random forest classification model with the training sample set, selecting without replacement, at each split of the decision tree, a feature subset matching the feature subset number of the feature value pair to train the decision tree; and
a decision tree forest generation module, configured to repeat the sampling module and the decision tree training module until a random forest classification model with the preset number of decision trees is generated.
7. The device according to claim 6, wherein the model obtaining module comprises:
a test module, configured to use the samples of the original sample set not selected by any training sample set as test samples, input each test sample into all the decision trees, and obtain the classification result of each decision tree;
a voting module, configured to take a simple majority vote over the classification results of all the decision trees and use the voting result as the classification result of the test sample; and
an accuracy calculation module, configured to use the proportion of correctly classified samples in the total number of test samples as the sample accuracy.
8. The device according to any one of claims 5 to 7, wherein the parameter updating module comprises:
an artificial fish individual forming module, configured to binary-code each feature value pair in the feature value pair set to form artificial fish individuals, and to randomly group two or more artificial fish individuals into multiple fish schools;
an optimal artificial fish individual obtaining module, configured to execute the swarming operator, the following operator and the foraging operator on the fish schools in turn and take the artificial fish individual with the maximum fitness value as the optimal artificial fish individual; and
an initial value updating module, configured to convert the optimal artificial fish individual into an optimal feature value pair used as the initial values of the preset decision tree number and feature subset number parameters of the random forest classification model, and to repeat the parameter initialization step until the optimal feature value pair no longer changes.
9. A computing device, comprising a memory, a processor and a computer program stored in the memory and executable by the processor, wherein the processor implements the method according to any one of claims 1 to 4 when executing the computer program.
10. A computer-readable storage medium, preferably a non-volatile readable storage medium, storing a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 4.
CN201810872456.5A 2018-08-02 2018-08-02 Cell recognition method and device based on random forest disaggregated model Pending CN109145965A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810872456.5A CN109145965A (en) 2018-08-02 2018-08-02 Cell recognition method and device based on random forest disaggregated model


Publications (1)

Publication Number Publication Date
CN109145965A true CN109145965A (en) 2019-01-04

Family

ID=64799600

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810872456.5A Pending CN109145965A (en) 2018-08-02 2018-08-02 Cell recognition method and device based on random forest disaggregated model

Country Status (1)

Country Link
CN (1) CN109145965A (en)


Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106503734A * 2016-10-14 2017-03-15 Harbin Engineering University Image classification method based on trilateral filtering and stacked sparse autoencoders
CN106503459A * 2016-10-26 2017-03-15 Nanjing University of Information Science and Technology Improved random forest quality control method for temperature records
US20170352157A1 * 2016-06-06 2017-12-07 Case Western Reserve University Computerized analysis of computed tomography (ct) imagery to quantify tumor infiltrating lymphocytes (tils) in non-small cell lung cancer (nsclc)
CN108154174A * 2017-12-22 2018-06-12 Beijing University of Technology Random forest classification system based on kernel extreme learning machine and parallelization
EP3343440A1 * 2016-12-27 2018-07-04 Definiens AG Identifying and excluding blurred areas of images of stained tissue to improve cancer scoring


Non-Patent Citations (5)

Title
JINGXIN LIU et al.: "HEp-2 cells classification via novel object graph based feature and random forest", ICBISP 2015 *
HUAI Tingting: "Research on the Improvement and Application of Random Forest Classification Algorithms", China Master's Theses Full-text Database, Information Science and Technology *
CHENG Xuexin: "Research on Particle Swarm Optimization Weighted Random Forest Algorithms", China Master's Theses Full-text Database, Information Science and Technology *
XU Guogen et al.: "Optimization Methods and Their MATLAB Implementation", 31 July 2018, Beihang University Press *
MA Li: "Research on Optimization and Improvement of Random Forest Algorithms", China Master's Theses Full-text Database, Information Science and Technology *

Cited By (20)

Publication number Priority date Publication date Assignee Title
CN109829490A * 2019-01-22 2019-05-31 上海鹰瞳医疗科技有限公司 Correction vector searching method, target classification method and device
CN109829490B * 2019-01-22 2022-03-22 上海鹰瞳医疗科技有限公司 Correction vector searching method, target classification method and device
CN110123317A * 2019-04-28 2019-08-16 华东交通大学 Knee joint contact force calculation method fusing artificial fish swarm and random forest algorithms
CN110264342A * 2019-06-19 2019-09-20 深圳前海微众银行股份有限公司 Business audit method and device based on machine learning
CN110532613A (en) * 2019-07-26 2019-12-03 中国船舶重工集团公司第七一九研究所 Ship power system operation mode recognition method and device
CN110532613B (en) * 2019-07-26 2023-05-02 中国船舶重工集团公司第七一九研究所 Ship power system operation condition identification method and device
CN111767673A (en) * 2020-05-19 2020-10-13 东莞南方半导体科技有限公司 IGBT junction temperature measuring method and device, computer equipment and storage medium
CN111914881A (en) * 2020-06-18 2020-11-10 北京百度网讯科技有限公司 Random forest generation method and device, electronic equipment and storage medium
CN111753907A (en) * 2020-06-24 2020-10-09 国家电网有限公司大数据中心 Method, device, equipment and storage medium for processing electric quantity data
CN111753907B (en) * 2020-06-24 2024-06-14 国家电网有限公司大数据中心 Method, device, equipment and storage medium for processing electric quantity data
CN112651439A (en) * 2020-12-25 2021-04-13 平安科技(深圳)有限公司 Material classification method and device, computer equipment and storage medium
CN112651439B (en) * 2020-12-25 2023-12-22 平安科技(深圳)有限公司 Material classification method, device, computer equipment and storage medium
CN112801145A (en) * 2021-01-12 2021-05-14 深圳市中博科创信息技术有限公司 Safety monitoring method and device, computer equipment and storage medium
CN112801145B (en) * 2021-01-12 2024-05-28 深圳市中博科创信息技术有限公司 Security monitoring method, device, computer equipment and storage medium
CN113158652A (en) * 2021-04-19 2021-07-23 平安科技(深圳)有限公司 Data enhancement method, device, equipment and medium based on deep learning model
CN113158652B (en) * 2021-04-19 2024-03-19 平安科技(深圳)有限公司 Data enhancement method, device, equipment and medium based on deep learning model
CN113505691A (en) * 2021-07-09 2021-10-15 中国矿业大学(北京) Coal rock identification method and identification reliability indication method
CN113505691B (en) * 2021-07-09 2024-03-15 中国矿业大学(北京) Coal rock identification method and identification credibility indication method
CN113488113A (en) * 2021-07-12 2021-10-08 浙江中烟工业有限责任公司 Industrial use value identification method of redried strip tobacco
CN113488113B (en) * 2021-07-12 2024-02-23 浙江中烟工业有限责任公司 Industrial use value identification method for redried strip tobacco

Similar Documents

Publication Publication Date Title
CN109145965A (en) Cell recognition method and device based on random forest disaggregated model
CN106537422B System and method for capturing relationships in information
US9594977B2 (en) Automatically selecting example stylized images for image stylization operations based on semantic content
Wang et al. Meta balanced network for fair face recognition
CN105844283A (en) Method for identifying category of image, image search method and image search device
CN107292350A Anomaly detection method for large-scale data
Sun et al. Comparison of deep learning architectures for H&E histopathology images
CN108694390A Modulated signal classification method based on a grey wolf optimized support vector machine improved by cuckoo search
CN108229550A Cloud image classification method based on multi-granularity cascaded forest networks
Kurtulmuş Identification of sunflower seeds with deep convolutional neural networks
CN108982377A Method for correlating corn growth-stage spectral images with chlorophyll content and dividing growth periods
Han et al. CookGAN: Meal image synthesis from ingredients
Liu et al. Deep learning for image-based large-flowered chrysanthemum cultivar recognition
CN109815357A Remote sensing image retrieval method based on nonlinear dimensionality reduction and sparse representation
CN114392560B (en) Method, device, equipment and storage medium for processing running data of virtual scene
CN111860576A (en) Endometrium tumor classification labeling method based on random forest
CN110909158A (en) Text classification method based on improved firefly algorithm and K nearest neighbor
CN106777284A Graph migration representation method based on label information
CN109492751A Network security situation element acquisition mechanism based on BN-DBN
Williams et al. Leaf only SAM: a segment anything pipeline for zero-shot automated leaf segmentation
Pratondo et al. Pear classification using machine learning
CN110378402A K-means clustering method with self-learning attribute weights
Ghosal et al. A comparative study among clustering techniques for leaf segmentation in rosette plants
CN106874927A Construction method and system for a random strong classifier
CN110097117A Data classification method based on linear discriminant analysis and multivariate adaptive splines

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination