CN111753083A - Complaint report text classification method based on SVM parameter optimization - Google Patents

Complaint report text classification method based on SVM parameter optimization Download PDF

Info

Publication number
CN111753083A
CN111753083A (application CN202010389257.6A)
Authority
CN
China
Prior art keywords
individual
individuals
text
svm
optimal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010389257.6A
Other languages
Chinese (zh)
Inventor
范青武
陈�光
杨凯
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Technology
Original Assignee
Beijing University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Technology filed Critical Beijing University of Technology
Priority to CN202010389257.6A priority Critical patent/CN111753083A/en
Publication of CN111753083A publication Critical patent/CN111753083A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval of unstructured textual data
    • G06F16/35: Clustering; Classification
    • G06F16/33: Querying
    • G06F16/3331: Query processing
    • G06F16/3332: Query translation
    • G06F16/3335: Syntactic pre-processing, e.g. stopword elimination, stemming
    • G06F40/00: Handling natural language data
    • G06F40/20: Natural language analysis
    • G06F40/205: Parsing
    • G06F40/216: Parsing using statistical methods
    • G06F40/279: Recognition of textual entities
    • G06F40/284: Lexical analysis, e.g. tokenisation or collocates
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/004: Artificial life, i.e. computing arrangements simulating life
    • G06N3/006: Artificial life based on simulated virtual individual or collective life forms, e.g. social simulations or particle swarm optimisation [PSO]


Abstract

The invention discloses a method for automatically classifying complaint report texts, aiming to improve classification precision and staff efficiency. The method comprises the following steps: obtaining a number of complaint report texts with category labels and dividing them into a training set and a test set; performing word segmentation on the texts and removing stop words; constructing a text model and performing feature extraction and dimension reduction on it; training a support vector machine (SVM) with the training set text model; dynamically optimizing the SVM parameters with an improved fruit fly optimization algorithm (IFOA), using the classification accuracy on the test set to obtain the optimal parameter values; and preprocessing the complaint report texts to be classified and inputting them into the parameter-optimized SVM, which then classifies them automatically. The method is applicable to the automatic classification of various complaint report texts, achieves high classification precision, and addresses the low precision and low efficiency of manual classification.

Description

Complaint report text classification method based on SVM parameter optimization
Technical Field
The invention relates to the technical field of natural language processing, in particular to a complaint report text classification method based on SVM parameter optimization.
Background
Complaint reporting is one of the best ways to realize democratic management, public participation and public supervision. For government departments, it can fully harness the strength of the people and improve working efficiency; for enterprises, it can truly and objectively reflect the opinions and attitudes of the groups they serve.
At present, most complaint reporting platforms are built on the internet, and with the rapid development of network technology, working online is convenient and fast, so the number of reports has increased greatly. To process this massive amount of information more efficiently, staff classify it according to certain rules and forward it to the corresponding departments. However, manually classifying complaint report texts consumes a great deal of time and cost, and because of subjective differences among workers, differing service levels and other factors, the understanding of a given matter can deviate, causing classification errors that directly affect subsequent work.
Complaint report texts have many categories, irregular expression and inconspicuous features. Traditional text classification methods, such as text clustering (TC) and topic models (e.g., LDA), cannot achieve accurate classification, and artificial neural networks are prone to overfitting and generalize poorly. An algorithm with strong classification and generalization capability is therefore required. The support vector machine (SVM) is a classification algorithm based on statistical learning. It is suited to small-sample, high-dimensional and nonlinear classification problems, and after a kernel function is introduced it can map a low-dimensional space to a high-dimensional one to achieve accurate classification; it is strongly adaptable and fits the practical problem addressed by the invention. However, the accuracy of SVM classification is closely tied to its parameter values, so choosing suitable parameters is essential.
For SVM parameter selection, manual tuning is time-consuming, labor-intensive and imprecise, so applying an optimization algorithm is the preferred approach. The fruit fly optimization algorithm (FOA), proposed by Pan in imitation of the foraging behavior of fruit flies, has stronger optimization capability than the genetic algorithm (GA) and particle swarm optimization (PSO). However, FOA also has shortcomings, such as a fixed search range and low population diversity, which can make it converge to a local optimum on complex problems. FOA therefore needs to be improved to strengthen its optimization capability.
Disclosure of Invention
Aiming at the difficulty of classifying complaint report texts, the invention provides a text classification method that dynamically optimizes the parameters of a support vector machine (SVM) with an improved fruit fly optimization algorithm (IFOA). The SVM has good classification and generalization capability, and the IFOA has stronger optimization capability than the FOA and can locate the optimal SVM parameters more accurately, thereby improving classification accuracy.
The technical scheme of the invention comprises the following steps:
step 1. text acquisition and preprocessing
Step 1.1: obtain a certain number of complaint report texts and their categories, ensure the category labels are accurate, and then divide the texts into a training set and a test set in a certain proportion.
Step 1.2 word segmentation and stop word removal are performed on all texts using the jieba toolkit of python.
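The stop-word-removal part of step 1.2 can be sketched in a few lines of Python. This is a minimal illustration only: in practice the token list would come from the jieba toolkit (e.g. `jieba.lcut(text)`), which is assumed here rather than shown, and the sample tokens and stop-word list are made up for the example.

```python
def remove_stopwords(tokens, stopwords):
    """Step 1.2: drop stop words from a segmented token list."""
    return [t for t in tokens if t not in stopwords]

# Stand-in for jieba output (jieba.lcut would produce such a list for Chinese text).
tokens = ["the", "factory", "discharges", "waste", "into", "the", "river"]
stopwords = {"the", "into", "a", "of"}
print(remove_stopwords(tokens, stopwords))  # ['factory', 'discharges', 'waste', 'river']
```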
Step 1.3: model the segmented, stop-word-free texts with the vector space model (VSM), expressing each text as a space vector:
D_i = D(t_1, w_1; t_2, w_2; …; t_n, w_n),  i ∈ N+, n ∈ N+   (1)
In the above formula, D_i is the space vector of a text, i is the index of the text, t is the sub-vector corresponding to a word of the text, w is the weight of that sub-vector, and n is the index of the last sub-vector.
Step 1.4: use the term frequency-inverse document frequency (TF-IDF) algorithm to perform feature extraction and dimension reduction on the text space vectors, as follows:
TF-IDF jointly considers the frequency of a word within a single document and the spread of the word across the whole text set, giving higher statistical precision. Its calculation formula is:
P_i = tf_ij × idf_i   (2)
In the above formula, P_i is the comprehensive frequency of a word, tf_ij is the frequency of word i in document j, idf_i reflects the proportion of documents in the text set that contain word i, and i and j are the indices of the word and the document. If the P_i of a word is below a threshold, its comprehensive frequency is considered low and the word is removed.
Because each text has been converted into a space vector, the comprehensive frequency of a word is the comprehensive frequency of its corresponding sub-vector, so removing low-frequency sub-vectors reduces the dimension. The vector D_i of step 1.3 can, after dimension reduction, be expressed as:
D'_i = D(t_1, w_1; t_2, w_2; …; t_k, w_k),  i ∈ N+, k ∈ N+, k < n   (3)
In the above formula, the dimension k of D'_i is smaller than the dimension n of D_i.
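As a rough sketch, the TF-IDF scoring and pruning of step 1.4 can be implemented as follows. The log-based idf is an assumption (the text only says idf reflects the proportion of documents containing the word; the logarithmic form is the common convention), and the threshold and sample documents are illustrative.

```python
import math

def tfidf(docs):
    """Equation (2): P_i = tf_ij * idf_i, computed for every word of every document."""
    n_docs = len(docs)
    df = {}                                   # document frequency of each word
    for doc in docs:
        for w in set(doc):
            df[w] = df.get(w, 0) + 1
    scores = []
    for doc in docs:
        tf = {w: doc.count(w) / len(doc) for w in set(doc)}
        scores.append({w: tf[w] * math.log(n_docs / df[w]) for w in tf})
    return scores

def reduce_dim(score_dict, threshold):
    """Equation (3): drop sub-vectors whose comprehensive frequency is below the threshold."""
    return {w: s for w, s in score_dict.items() if s >= threshold}

docs = [["pollution", "river", "factory"], ["pollution", "noise"]]
scores = tfidf(docs)
# "pollution" appears in every document, so its idf (and hence its score) is 0
# and pruning removes it, shrinking the vector's dimension.
print(reduce_dim(scores[0], 0.01))
```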
Step 2.SVM training
The general form of the linear discriminant function is:
g(x) = w^T x + b = 0   (4)
After normalization, all training samples satisfy the constraint:
y_i [w · x_i + b] − 1 ≥ 0,  i = 1, 2, …, n   (5)
In the above formula, w is the normal vector of the classification plane, b is the offset, and (x_i, y_i) are the training samples with their labels.
Under the above constraint, finding the optimal classification surface is equivalent to minimizing the following function:
Φ(w) = (1/2)‖w‖^2   (7)
Therefore, the Lagrange function is introduced:
L(w, b, α) = (1/2)‖w‖^2 − Σ_i α_i ( y_i (w · x_i + b) − 1 )   (8)
The problem is converted into its dual: minimizing the Lagrange function with respect to w and b, i.e., a convex quadratic program. By solving this dual problem, the classification surface function is finally obtained:
f(x) = sgn( Σ_i α_i* y_i (x_i · x) + b* )   (9)
as the Gaussian Radial Basis Function (RBF) has stronger mapping capability, the invention selects the RBF as the kernel function of the SVM:
Figure BDA0002485155840000044
in the above equation, σ is a width parameter of RBF, and represents control of the radial range. If the sigma is smaller than the minimum distance between all training samples, all samples are support vectors and can be classified correctly; otherwise, all samples would be classified as a class, rendering them incapable of learning.
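Equations (9) and (10) can be sketched together in a few lines of Python. The support vectors, multipliers and bias below are hypothetical stand-ins; in the actual method they would come from solving the dual problem on the training set.

```python
import math

def rbf_kernel(x, y, sigma):
    """Equation (10): K(x, y) = exp(-||x - y||^2 / (2 * sigma^2))."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-sq_dist / (2.0 * sigma ** 2))

def decision(x, support_vectors, labels, alphas, b, sigma):
    """Equation (9) with the kernel substituted for the inner product:
    f(x) = sgn(sum_i alpha_i * y_i * K(x_i, x) + b)."""
    s = sum(a * y * rbf_kernel(sv, x, sigma)
            for sv, y, a in zip(support_vectors, labels, alphas)) + b
    return 1 if s >= 0 else -1

# Hypothetical two-support-vector model: one positive SV at the origin,
# one negative SV at (2, 2).
svs, ys, alphas = [(0.0, 0.0), (2.0, 2.0)], [1, -1], [1.0, 1.0]
print(decision((0.1, 0.1), svs, ys, alphas, b=0.0, sigma=1.0))  # near the positive SV
```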
After the kernel function is introduced, nonlinear classification becomes possible, but some linearly inseparable cases still exist in the transformed sample space, where the linear discriminant function is difficult to satisfy. To give the classifier some fault tolerance, slack variables ξ_i are introduced, so that the condition becomes:
y_i [w · x_i + b] − 1 + ξ_i ≥ 0,  i = 1, 2, …, n   (11)
To control the overall number of errors and guarantee classification precision, a penalty factor C is introduced, turning the objective under the above constraint into:
min  (1/2)‖w‖^2 + C Σ_i ξ_i   (12)
In the above formula, the larger C is, the more heavily misclassification is penalized, but the generalization ability of the classifier decreases; the smaller C is, the lower the classification accuracy.
Therefore, the SVM training steps are as follows:
and 2.1, determining the values of C and sigma.
And 2.2, inputting the preprocessed training set text into the SVM for training, and substituting C and sigma.
Step 3, SVM parameter optimization
FOA is an optimization algorithm based on the foraging principle of fruit flies. Aiming at the shortcomings of FOA, the invention proposes the IFOA optimization algorithm, whose calculation steps are as follows:
(1) Initialize the parameters of the algorithm: the maximum number of iterations g_max, the population size p, the initial search radius R, and the initial position coordinate X of each fruit fly individual:
X = (Random − 0.5) · π   (13)
In the above formula, Random is a random number in (0, 1), and X is the position coordinate value of an individual.
(2) Calculating the taste concentration judgment value of all drosophila individuals:
S=tan(X) (14)
in the above formula, S is a taste concentration judgment value of an individual.
(3) Substitute the taste concentration judgment values of all fruit fly individuals into the objective function (the problem to be optimized) in turn to obtain the fitness value of each individual; select the individuals with the maximum and minimum fitness values, i.e., the optimal and worst individuals, and record their positions and fitness values:
fitness_n = f(S_n),  n = 1, 2, …, p   (15)
[bestfitness, bestlocation] = max(fitness)   (16)
[worstfitness, worstlocation] = min(fitness)   (17)
In the above formula, n is the individual index, fitness is the set of fitness values of all individuals, f(·) is the objective function, bestfitness is the maximum fitness value, bestlocation is the position of the optimal individual, worstfitness is the minimum fitness value, and worstlocation is the position of the worst individual.
(4) Calculate the distance between each fruit fly individual and the optimal individual and the worst individual. If an individual is closer to the optimal individual than to the worst individual, it is assigned to the subgroup with stronger search capability; otherwise it is assigned to the subgroup with weaker search capability:
distance_best = |X − X_bestlocation|   (18)
distance_worst = |X − X_worstlocation|   (19)
In the above formula, distance_best is the distance between an individual and the optimal individual, X_bestlocation is the position of the optimal individual, distance_worst is the distance between an individual and the worst individual, and X_worstlocation is the position of the worst individual.
(5) The subgroups with stronger and weaker search capability each search under the guidance of the optimal individual with their respective radii, and the positions are updated:
X_best = X_bestlocation + R_best · (Random − 0.5) · π   (20)
X_worst = X_bestlocation + R_worst · (Random − 0.5) · π   (21)
Wherein:
[Equations (22) and (23), present in the original only as images, define the search radii R_best and R_worst as functions of the current iteration number g_i, the fitness ratio fitness_i / fitness_{i+1}, and the constants m and n.]
In the above formula, X_best is the position coordinate of an individual in the stronger subgroup, R_best is the search radius of individuals in the stronger subgroup, X_worst is the position coordinate of an individual in the weaker subgroup, R_worst is the search radius of individuals in the weaker subgroup, g_i is the current iteration number, fitness_i is the fitness value of the current individual, fitness_{i+1} is the fitness value of the previous generation, and m and n are constants.
(6) After the positions are updated, calculate the taste concentration judgment values and fitness values of all fruit fly individuals and record the positions and fitness values of the new optimal and worst individuals. If the new optimal fitness value is lower than that of the previous generation, the previous generation's optimal position is retained.
(7) Enter the iterative process of the algorithm, repeating steps (4) to (6). When the maximum number of iterations is reached, the algorithm ends and outputs the taste concentration judgment value of the optimal individual of the last generation, i.e., the optimal solution of the objective function.
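Steps (1) to (7) can be sketched on a toy objective as follows. Because the radius-update formulas (22) and (23) appear only as images in the source, fixed radii `r_best` and `r_worst` stand in for them here as a loudly flagged assumption; everything else (position initialization, S = tan(X), best/worst selection, distance-based subgrouping, elitist retention of the previous best) follows the steps above.

```python
import math
import random

def ifoa(objective, pop=20, iters=60, r_best=0.5, r_worst=1.0, seed=1):
    rng = random.Random(seed)
    # Step (1): initial positions X = (Random - 0.5) * pi, equation (13).
    xs = [(rng.random() - 0.5) * math.pi for _ in range(pop)]
    best_x, best_fit = xs[0], -float("inf")
    for _ in range(iters):
        # Step (2): taste concentration judgment values S = tan(X), equation (14).
        fits = [objective(math.tan(x)) for x in xs]
        # Step (3): optimal and worst individuals of this generation.
        hi = max(range(pop), key=fits.__getitem__)
        lo = min(range(pop), key=fits.__getitem__)
        if fits[hi] > best_fit:  # Step (6): retain the previous best if it was better.
            best_fit, best_x = fits[hi], xs[hi]
        # Steps (4)-(5): subgroup by distance to best/worst, then search around
        # the best-known position with the subgroup's radius (stand-in radii).
        xs = [best_x + (r_best if abs(x - xs[hi]) < abs(x - xs[lo]) else r_worst)
                       * (rng.random() - 0.5) * math.pi
              for x in xs]
    # Step (7): the best S found is the optimal solution of the objective.
    return math.tan(best_x), best_fit

s, f = ifoa(lambda s: -(s - 2.0) ** 2)  # toy objective, maximum at s = 2
print(round(s, 2), round(f, 4))
```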
Thus, the steps of optimizing parameters of the SVM by IFOA are as follows:
step 3.1 initialize IFOA parameters including population size, maximum iteration number, initial search radius, initial position coordinates of fruit fly and values of m and n.
Step 3.2: calculate the taste concentration judgment value of every fruit fly individual; it is likewise a two-dimensional array.
Step 3.3: take the two elements of each individual's taste concentration judgment value as C and σ, input them into the SVM in turn, train with the preprocessed training set texts, and test the SVM's classification on the preprocessed test set texts. The objective function of the IFOA thus becomes a classification accuracy function with C and σ as arguments, P_precision(C, σ). The classification accuracy computed by P_precision serves as each individual's fitness value; the individuals with the maximum and minimum fitness values, i.e., the optimal and worst individuals, are selected and their positions and fitness values recorded.
Step 3.4: calculate the distance between each fruit fly individual and the optimal individual and the worst individual. If an individual is closer to the optimal individual than to the worst individual, assign it to the subgroup with stronger search capability; otherwise assign it to the subgroup with weaker search capability.
Step 3.5: the subgroups with stronger and weaker search capability each search under the guidance of the optimal individual with their respective radii, and the positions are updated.
Step 3.6: after the positions are updated, calculate the taste concentration judgment values of all fruit fly individuals, input them in turn into the SVM as C and σ, train and test, and take the resulting classification accuracy as the new fitness value of each individual. Then record the positions and fitness values of the new optimal and worst individuals; if the new optimal fitness value is lower than that of the previous generation, retain the previous generation's optimal position.
Step 3.7: enter the iterative process of the algorithm, repeating steps 3.4 to 3.6. When the maximum number of iterations is reached, the algorithm ends and outputs the taste concentration judgment value of the optimal individual of the last generation as the optimal parameters of the SVM.
Step 4. model usage
Step 4.1: preprocess the complaint report texts to be classified; the preprocessing steps are the same as steps 1.2 to 1.4.
Step 4.2: input the preprocessed complaint report texts into the parameter-optimized SVM.
Step 4.3: obtain the output of the SVM, i.e., the category to which each complaint report text belongs.
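Step 4 as a whole amounts to composing the earlier stages into one pipeline. The sketch below uses hypothetical callables (`tokenize`, `vectorize`, `svm_predict`) standing in for jieba-based preprocessing, the TF-IDF space-vector model and the parameter-optimized SVM; the stubs in the usage lines exist only to keep the example self-contained.

```python
def classify_complaints(texts, tokenize, vectorize, svm_predict):
    """Steps 4.1-4.3: preprocess each complaint text as in steps 1.2-1.4,
    feed it to the parameter-optimized SVM, and collect the categories."""
    results = []
    for text in texts:
        tokens = tokenize(text)              # step 1.2: segmentation + stop words
        vector = vectorize(tokens)           # steps 1.3-1.4: VSM + TF-IDF
        results.append(svm_predict(vector))  # steps 4.2-4.3: SVM category
    return results

# Stub pipeline: real components would be jieba, a TF-IDF vectorizer and the trained SVM.
labels = classify_complaints(
    ["factory smoke at night", "loud music"],
    tokenize=str.split,
    vectorize=len,
    svm_predict=lambda v: "air pollution" if v > 2 else "noise",
)
print(labels)  # ['air pollution', 'noise']
```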
Advantageous effects
The invention combines the IFOA with the SVM and dynamically optimizes the SVM parameters using the strong optimization capability of the IFOA, further improving classification precision and adaptability; it is well suited to the complex problem of classifying complaint report texts.
Drawings
FIG. 1 is a diagram of the location of a text space vector in coordinates.
Fig. 2 is an optimal hyperplane for an SVM.
FIG. 3 is an input space mapping for an SVM.
FIG. 4 is a schematic representation of foraging principle of a Drosophila population.
FIG. 5 shows the calculation method of the function P_precision.
Fig. 6 is an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to FIG. 6. The examples illustrate the invention without limiting its scope of use; any modification within the scope of the claims falls within the protection scope of the invention.
In this embodiment, environmental pollution complaint report texts are taken as the research object, a certain amount of valid data is obtained from a report platform, and the method is applied as follows:
step 1. text acquisition and preprocessing
Step 1.1: obtain a certain number of complaint report texts and their categories, ensure the category labels are accurate, and then divide the texts into a training set and a test set in a certain proportion.
Step 1.2 word segmentation and stop word removal are performed on all texts using the jieba toolkit of python.
Step 1.3: model the segmented, stop-word-free texts with the vector space model (VSM), expressing each text as a space vector:
D_i = D(t_1, w_1; t_2, w_2; …; t_n, w_n),  i ∈ N+, n ∈ N+   (1)
In the above formula, D_i is the space vector of a text, i is the index of the text, t is the sub-vector corresponding to a word of the text, w is the weight of that sub-vector, and n is the index of the last sub-vector.
If (t_1, t_2, …, t_n) is viewed as an n-dimensional coordinate system and the weights (w_1, w_2, …, w_n) as the corresponding coordinates, the position of the text's space vector in this coordinate system is as shown in FIG. 1.
Step 1.4: use the term frequency-inverse document frequency (TF-IDF) algorithm to perform feature extraction and dimension reduction on the text space vectors, as follows:
TF-IDF jointly considers the frequency of a word within a single document and the spread of the word across the whole text set, giving higher statistical precision. Its calculation formula is:
P_i = tf_ij × idf_i   (2)
In the above formula, P_i is the comprehensive frequency of a word, tf_ij is the frequency of word i in document j, idf_i reflects the proportion of documents in the text set that contain word i, and i and j are the indices of the word and the document. If the P_i of a word is below a threshold, its comprehensive frequency is considered low and the word is removed.
Because each text has been converted into a space vector, the comprehensive frequency of a word is the comprehensive frequency of its corresponding sub-vector, so removing low-frequency sub-vectors reduces the dimension. The vector D_i of step 1.3 can, after dimension reduction, be expressed as:
D'_i = D(t_1, w_1; t_2, w_2; …; t_k, w_k),  i ∈ N+, k ∈ N+, k < n   (3)
In the above formula, the dimension k of D'_i is smaller than the dimension n of D_i.
Step 2.SVM training
The SVM is a machine learning classification algorithm based on an optimal hyperplane theory, wherein the optimal hyperplane is shown in FIG. 2.
The general form of the linear discriminant function is:
g(x) = w^T x + b = 0   (4)
After normalization, all training samples satisfy the constraint:
y_i [w · x_i + b] − 1 ≥ 0,  i = 1, 2, …, n   (5)
In the above formula, w is the normal vector of the classification plane, b is the offset, and (x_i, y_i) are the training samples with their labels.
Under the above constraint, finding the optimal classification surface is equivalent to minimizing the following function:
Φ(w) = (1/2)‖w‖^2   (7)
Therefore, the Lagrange function is introduced:
L(w, b, α) = (1/2)‖w‖^2 − Σ_i α_i ( y_i (w · x_i + b) − 1 )   (8)
The problem is converted into its dual: minimizing the Lagrange function with respect to w and b, i.e., a convex quadratic program. By solving this dual problem, the classification surface function is finally obtained:
f(x) = sgn( Σ_i α_i* y_i (x_i · x) + b* )   (9)
at this point, a kernel function may be introduced to map the input space into a high-dimensional Hilbert space, as shown in fig. 3.
Because the Gaussian radial basis function (RBF) has strong mapping capability, the invention selects the RBF as the kernel function of the SVM:
K(x_i, x_j) = exp( −‖x_i − x_j‖^2 / (2σ^2) )   (10)
In the above formula, σ is the width parameter of the RBF and controls its radial range. If σ is smaller than the minimum distance between training samples, every sample becomes a support vector and can be classified correctly; if σ is too large, all samples are assigned to a single class and the model cannot learn.
After the kernel function is introduced, nonlinear classification becomes possible, but some linearly inseparable cases still exist in the transformed sample space, where the linear discriminant function is difficult to satisfy. To give the classifier some fault tolerance, slack variables ξ_i are introduced, so that the condition becomes:
y_i [w · x_i + b] − 1 + ξ_i ≥ 0,  i = 1, 2, …, n   (11)
To control the overall number of errors and guarantee classification precision, a penalty factor C is introduced, turning the objective under the above constraint into:
min  (1/2)‖w‖^2 + C Σ_i ξ_i   (12)
In the above formula, the larger C is, the more heavily misclassification is penalized, but the generalization ability of the classifier decreases; the smaller C is, the lower the classification accuracy.
Therefore, the SVM training steps are as follows:
and 2.1, determining the values of C and sigma.
And 2.2, inputting the preprocessed training set text into the SVM for training, and substituting C and sigma.
2.1, determining the values of C and sigma.
2.2 inputting the preprocessed training set text into the SVM for training, and substituting C and sigma.
Step 3, SVM parameter optimization
The FOA is an optimization algorithm based on the foraging principle of fruit flies, as shown in FIG. 4. Aiming at the shortcomings of FOA, the invention proposes the IFOA optimization algorithm, whose calculation steps are as follows:
Step 3.1: initialize the IFOA parameters: the population size p is set to 20, the maximum number of iterations g_max is set, the initial search radius R is set to 2, the initial phase angle coordinate X of each fruit fly is a two-dimensional array whose elements lie in [−π/4, π/4], m is set to 16, and n is set to 32.
X = (Random − 0.5) · (π/2)   (13)
In the above formula, Random is a random number in (0, 1) drawn independently for each element, and X is the position coordinate array of an individual; the factor π/2 confines each element to [−π/4, π/4].
Step 3.2 calculate the taste concentration decision value for all individual drosophila species, which is also a two-dimensional array.
S = tan(X)   (14)
In the above formula, S is the taste concentration judgment array of an individual, with tan applied element-wise.
Step 3.3: take the two elements of each individual's taste concentration judgment value as C and σ, input them into the SVM in turn, train with the preprocessed training set texts, and test the SVM's classification on the preprocessed test set texts. The objective function of the IFOA thus becomes a classification accuracy function with C and σ as arguments, P_precision(C, σ); its specific calculation method is shown in FIG. 5. The classification accuracy computed by P_precision serves as each individual's fitness value; the individuals with the maximum and minimum fitness values, i.e., the optimal and worst individuals, are selected and their positions and fitness values recorded:
fitness_n = P_precision(C, σ),  n = 1, 2, …, p   (15)
[bestfitness, bestlocation] = max(fitness)   (16)
[worstfitness, worstlocation] = min(fitness)   (17)
In the above formula, n is the individual index, fitness is the set of fitness values of all individuals, bestfitness is the maximum fitness value, bestlocation is the position of the optimal individual, worstfitness is the minimum fitness value, and worstlocation is the position of the worst individual.
Step 3.4: calculate the distance between each fruit fly individual and the optimal individual and the worst individual. If an individual is closer to the optimal individual than to the worst individual, assign it to the subgroup with stronger search capability; otherwise assign it to the subgroup with weaker search capability:
distance_best = |X − X_bestlocation|   (18)
distance_worst = |X − X_worstlocation|   (19)
In the above formula, distance_best is the distance between an individual and the optimal individual, X_bestlocation is the position of the optimal individual, distance_worst is the distance between an individual and the worst individual, and X_worstlocation is the position of the worst individual.
Step 3.5: the subgroups with stronger and weaker search capability each search under the guidance of the optimal individual with their respective radii, and the positions are updated:
X_best = X_bestlocation + R_best · (Random − 0.5) · π   (20)
X_worst = X_bestlocation + R_worst · (Random − 0.5) · π   (21)
Wherein:
[Equations (22) and (23), present in the original only as images, define the search radii R_best and R_worst as functions of the current iteration number g_i, the fitness ratio fitness_i / fitness_{i+1}, and the constants m and n.]
In the above formula, X_best is the position coordinate of an individual in the stronger subgroup, R_best is the search radius of individuals in the stronger subgroup, X_worst is the position coordinate of an individual in the weaker subgroup, R_worst is the search radius of individuals in the weaker subgroup, g_i is the current iteration number, fitness_i is the fitness value of the current individual, fitness_{i+1} is the fitness value of the previous generation, and m and n are constants.
Step 3.6: after the position update, calculate the taste concentration judgment values of all fruit fly individuals, input them into the SVM in turn as C and σ, train and test, and take the classification accuracy as the new individual fitness values. Then record the positions and fitness values of the new optimal and worst individuals; if the fitness value of the new optimal individual is smaller than that of the previous generation, the optimal individual still retains the position of the previous generation.
Step 3.7: enter the iterative loop of the algorithm and repeat steps 3.4 to 3.6. When the maximum number of iterations is reached, the algorithm terminates and outputs the taste concentration judgment value of the optimal individual of the final generation as the optimal parameters of the SVM.
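Steps 3.1–3.7 can be condensed into a runnable sketch (a hedged reimplementation on a toy dataset, not the patented code: scikit-learn's SVC stands in for the SVM, σ is mapped to gamma = 1/(2σ²), the Iris data replaces the complaint texts, and fixed radii replace the image-only formulas (22)–(23)):

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

rng = np.random.default_rng(1)
X_tr, X_te, y_tr, y_te = train_test_split(*load_iris(return_X_y=True),
                                          random_state=0)

def accuracy(params):
    """Fitness: test accuracy of an RBF-SVM whose (C, sigma) come from
    the individual's taste-concentration value S = tan(X) (Eq. (14))."""
    C, sigma = np.abs(np.tan(params)) + 1e-3          # keep both positive
    clf = SVC(C=C, gamma=1.0 / (2.0 * sigma ** 2), max_iter=10_000)
    return clf.fit(X_tr, y_tr).score(X_te, y_te)

pop, iters, r_best, r_worst = 10, 20, 0.05, 0.2       # step 3.1: init
positions = (rng.random((pop, 2)) - 0.5) * np.pi      # Eq. (13)
fit = np.array([accuracy(p) for p in positions])      # steps 3.2-3.3
best_i = fit.argmax()
best_pos, best_fit = positions[best_i].copy(), fit[best_i]

for _ in range(iters):                                # step 3.7: iterate
    worst_pos = positions[fit.argmin()]
    d_best = np.linalg.norm(positions - best_pos, axis=1)
    d_worst = np.linalg.norm(positions - worst_pos, axis=1)
    radii = np.where(d_best < d_worst, r_best, r_worst)   # step 3.4: split
    positions = best_pos + radii[:, None] * (rng.random((pop, 2)) - 0.5) * np.pi
    fit = np.array([accuracy(p) for p in positions])      # step 3.6
    if fit.max() > best_fit:                              # elitism
        best_fit, best_pos = fit.max(), positions[fit.argmax()].copy()

C_opt, sigma_opt = np.abs(np.tan(best_pos)) + 1e-3
print(f"best accuracy {best_fit:.3f} with C={C_opt:.3f}, sigma={sigma_opt:.3f}")
```

The elitism test in the loop mirrors step 3.6: the best-known position is only replaced when a strictly better fitness value appears.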
Step 4. model usage
Step 4.1: preprocess the complaint report text to be classified; the preprocessing steps are the same as steps 1.2 to 1.4.
Step 4.2: input the preprocessed complaint report text into the parameter-optimized SVM.
Step 4.3: obtain the output of the SVM, i.e. the category to which the complaint report text belongs.
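Steps 4.1–4.3 amount to running new text through the same preprocessing and the tuned SVM. A minimal sketch with scikit-learn follows; the toy corpus, labels, and the C and σ values are all hypothetical, and the jieba segmentation of step 1.2 is omitted because the toy texts are already space-separated:

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC

# hypothetical complaint corpus; real Chinese texts would first be segmented
# with jieba (step 1.2) and stop words removed before vectorization
train_texts = ["factory smoke air pollution", "loud noise at night construction",
               "river water smells of sewage", "dust and smoke from chimney",
               "machine noise neighbors complain", "waste water dumped in river"]
train_labels = ["air", "noise", "water", "air", "noise", "water"]

model = make_pipeline(TfidfVectorizer(),        # steps 1.3-1.4: VSM + TF-IDF
                      SVC(C=10.0, gamma=0.5))   # hypothetical tuned C and sigma
model.fit(train_texts, train_labels)

# step 4: preprocess a new complaint and read off its predicted category
prediction = model.predict(["black smoke pollution from a factory chimney"])[0]
print(prediction)
```

In deployment the C and gamma values would come from the IFOA search of step 3 rather than being hard-coded.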
The classification results show that the method achieves a classification accuracy of 91.0% and a recall of 90.3% on environmental pollution complaint report texts, which meets the requirements of practical application.

Claims (1)

1. A complaint report text classification method based on SVM parameter optimization, characterized by comprising the following steps:
step 1: text acquisition and preprocessing:
step 1.1, obtaining a certain number of complaint report texts and categories thereof, ensuring that category labels are accurate, and then dividing the texts into a training set text and a test set text according to a certain proportion;
step 1.2, performing word segmentation and stop word removal on all texts by using a jieba toolkit of python;
step 1.3, modeling the text after word segmentation and stop-word removal with a vector space model (VSM), expressing each text as a space vector of the form:
D_i = D(t_1, w_1; t_2, w_2; …; t_n, w_n), i ∈ N+, n ∈ N+ (1)
in the above formula, D_i represents the space vector of a text, i is the index of the text, t is the sub-vector corresponding to a word in the text, w is the weight of that sub-vector, and n is the index of the sub-vector;
step 1.4, using the term frequency-inverse document frequency (TF-IDF) algorithm to perform feature extraction and dimension reduction on the text space vectors, as follows:
the TF-IDF comprehensively considers the frequency of a single word appearing in a single document and the total frequency of the word appearing in a text set, has higher statistical precision, and has the following calculation formula:
P_i = tf_ij × idf_i (2)
in the above formula, P_i represents the comprehensive frequency of a word, tf_ij is the frequency with which the word appears in a document, idf_i reflects the proportion of documents in the whole text set that contain the word, i is the index of the word and j the index of the document; if P_i of a word is below a certain threshold, its comprehensive frequency is considered low and the word is removed;
since the text has been converted into space-vector form, the comprehensive frequency of a word is the comprehensive frequency of its corresponding sub-vector within the whole space vector, so dimension reduction is achieved by removing the low-frequency sub-vectors; D_i from step 1.3 can then be expressed as:
D′_i = D(t_1, w_1; t_2, w_2; …; t_k, w_k), i ∈ N+, k ∈ N+ and k < n (3)
in the above formula, the dimension of D′_i is smaller than that of D_i;
Step 2. SVM training
The general form of a linear discriminant function is:
g(x) = w^T·x + b = 0 (4)
the points that satisfy the linear discriminant function obey the constraint:
y_i[w·x_i + b] − 1 ≥ 0, i = 1, 2, …, n (5)
in the above formula, w is the normal vector of the classification plane, b is the offset, and x_i, y_i are the training samples and their class labels;
the optimal classification surface can also be expressed as the minimum of the following function under the above constraint:
min Φ(w) = (1/2)‖w‖² (7)
therefore, the Lagrange function is introduced:
L(w, b, α) = (1/2)‖w‖² − Σ_{i=1}^{n} α_i{ y_i[w·x_i + b] − 1 } (8)
the problem is thus converted into its dual: minimizing the Lagrange function with respect to w and b, i.e. a convex quadratic program; by solving this dual problem, the classification surface function is finally obtained:
f(x) = sgn( Σ_{i=1}^{n} α_i* y_i (x_i·x) + b* ) (9)
as the Gaussian Radial Basis Function (RBF) has stronger mapping capability, the invention selects the RBF as the kernel function of the SVM:
K(x, x_i) = exp( −‖x − x_i‖² / (2σ²) ) (10)
in the above formula, σ is the width parameter of the RBF and controls its radial range; if σ is smaller than the minimum distance between training samples, every sample becomes a support vector and all samples are classified correctly; conversely, if σ is too large, all samples are classified into one class and learning ability is lost;
after the kernel function is introduced, the nonlinear classification problem is solved, but some samples in the transformed sample space remain linearly inseparable and can hardly satisfy the linear discriminant function; therefore, to give the classifier a degree of fault tolerance, a slack variable ξ_i is introduced so that the constraint becomes:
y_i[w·x_i + b] − 1 + ξ_i ≥ 0, i = 1, 2, …, n (11)
to control the overall number of errors and guarantee classification precision, a penalty factor C is introduced, turning the objective under the above constraint into:
min Φ(w, ξ) = (1/2)‖w‖² + C·Σ_{i=1}^{n} ξ_i (12)
in the above formula, the larger C is, the heavier the penalty for misclassification but the weaker the generalization ability of the classifier; conversely, the smaller C is, the lower the classification precision;
therefore, the SVM training steps are as follows:
step 2.1, determining the value sizes of C and sigma;
2.2, inputting the preprocessed training set text into the SVM for training, and substituting C and sigma;
step 3, SVM parameter optimization
FOA is an optimization algorithm based on the foraging behavior of fruit flies; to address the shortcomings of FOA, the invention proposes the IFOA optimization algorithm, whose calculation steps are as follows:
(1) initializing the parameters of the algorithm, i.e. the maximum number of iterations g_max, the population size p, the initial search radius R and the initial position coordinates X of the fruit fly individuals;
X=(Random-0.5)·π (13)
in the above formula, Random is a random number in (0, 1), and X is the position coordinate value of an individual;
(2) calculating the taste concentration judgment value of all drosophila individuals:
S=tan(X) (14)
in the above formula, S is a taste concentration determination value of an individual;
(3) substituting the taste concentration judgment values of all fruit fly individuals in turn into the objective function (the problem to be optimized) to obtain the fitness value of each individual, selecting the individuals with the maximum and minimum fitness values, i.e. the optimal individual and the worst individual, and recording their positions and fitness values:
fitness = f(S_n), n = 1, 2, …, p (15)
[bestfitness, bestlocation] = max(fitness) (16)
[worstfitness, worstlocation] = min(fitness) (17)
in the above formula, n is the individual index, fitness is the set of fitness values of all individuals, f(x) is the objective function, bestfitness is the maximum fitness value, bestlocation is the position of the optimal individual, worstfitness is the minimum fitness value, and worstlocation is the position of the worst individual;
(4) calculating the distance from every fruit fly individual to the optimal individual and to the worst individual; if an individual is closer to the optimal individual than to the worst individual, assigning it to the subgroup with stronger search ability, and otherwise assigning it to the subgroup with weaker search ability;
distance_best = ‖X − X_bestlocation‖ (18)
distance_worst = ‖X − X_worstlocation‖ (19)
in the above formulas, distance_best is the distance between an individual and the optimal individual, X_bestlocation is the position of the optimal individual, distance_worst is the distance between an individual and the worst individual, and X_worstlocation is the position of the worst individual;
(5) the subgroups with stronger and weaker search ability each searching under the guidance of the optimal individual with their respective radii, and updating their positions;
X_best = X_bestlocation + R_best·(Random − 0.5)·π (20)
X_worst = X_bestlocation + R_worst·(Random − 0.5)·π (21)
wherein:
[Equations (22) and (23), given only as images in the source: the adaptive search radii R_best and R_worst, expressed in terms of the current iteration number g_i, the fitness values fitness_i and fitness_{i+1}, and the constants m and n]
in the above formulas, X_best is the updated position coordinate of an individual in the subgroup with stronger search ability and R_best is the search radius of that subgroup; X_worst is the updated position coordinate of an individual in the subgroup with weaker search ability and R_worst is the search radius of that subgroup; g_i is the current iteration number, fitness_i is the fitness value of the current individual, fitness_{i+1} is the fitness value of the previous generation, and m and n are constants;
(6) calculating the taste concentration judgment values and fitness values of all fruit fly individuals after the position update, and recording the positions and fitness values of the new optimal and worst individuals; if the fitness value of the new optimal individual is smaller than that of the previous generation, the optimal individual still retaining the position of the previous generation;
(7) entering the iterative loop of the algorithm and repeating steps (4) to (6); when the maximum number of iterations is reached, the algorithm terminates and outputs the taste concentration judgment value of the optimal individual of the final generation, i.e. the optimal solution of the objective function;
thus, the steps of optimizing parameters of the SVM by IFOA are as follows:
step 3.1, initializing parameters of the IFOA, including population scale, maximum iteration times, initial search radius, initial position coordinates of the fruit flies and values of m and n;
step 3.2, calculating the taste concentration judgment values of all the drosophila individuals, wherein the values are also two-dimensional arrays;
step 3.3, taking the two elements of each fruit fly individual's taste concentration judgment value as C and σ, inputting them into the SVM in turn, training with the preprocessed training set text, and then performing a classification test on the SVM with the preprocessed test set text; at this point the objective function of IFOA is replaced by a classification accuracy function with C and σ as arguments, i.e. P_precision(C, σ); using the function P_precision to calculate the classification accuracy of the SVM as the individual's fitness value, meanwhile selecting the individuals with the maximum and minimum fitness values, i.e. the optimal individual and the worst individual, and recording their positions and fitness values;
step 3.4, calculating the distance from every fruit fly individual to the optimal individual and to the worst individual; if an individual is closer to the optimal individual than to the worst individual, assigning it to the subgroup with stronger search ability, and otherwise assigning it to the subgroup with weaker search ability;
step 3.5, the subgroups with stronger and weaker search ability each searching under the guidance of the optimal individual with their respective radii, and updating their positions;
step 3.6, calculating the taste concentration judgment values of all fruit fly individuals after the position update, inputting them into the SVM in turn as C and σ, training and testing, and calculating the classification accuracy as the new individual fitness values; then recording the positions and fitness values of the new optimal and worst individuals; if the fitness value of the new optimal individual is smaller than that of the previous generation, the optimal individual still retaining the position of the previous generation;
step 3.7, entering the iterative loop of the algorithm and repeating steps 3.4 to 3.6; when the maximum number of iterations is reached, the algorithm terminates and outputs the taste concentration judgment value of the optimal individual of the final generation as the optimal parameters of the SVM;
step 4. model usage
Step 4.1, preprocessing the complaint report text to be classified, wherein the preprocessing steps are the same as 1.2 to 1.4;
step 4.2, inputting the preprocessed complaint report text into the SVM which is optimized by the parameters;
and 4.3, acquiring the output of the SVM, namely the category to which the complaint report text belongs.
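The effect of the RBF width σ described in the claim (every sample becomes a support vector when σ is very small, all samples collapse toward one class when σ is very large) can be seen numerically; this sketch assumes the standard Gaussian form of equation (10):

```python
import numpy as np

def rbf_kernel(x, xi, sigma):
    # Gaussian RBF: K(x, x_i) = exp(-||x - x_i||^2 / (2 * sigma^2))
    diff = np.asarray(x) - np.asarray(xi)
    return np.exp(-np.sum(diff ** 2) / (2.0 * sigma ** 2))

a, b = np.zeros(2), np.ones(2)
# very small sigma: distinct points look completely dissimilar,
# so each training sample would act as its own support vector
print(rbf_kernel(a, b, 0.1))    # ~0
# very large sigma: distinct points look nearly identical,
# so the classifier can no longer tell the classes apart
print(rbf_kernel(a, b, 10.0))   # ~1
```

Tuning σ (together with C) between these extremes is exactly what the IFOA search of step 3 automates.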
CN202010389257.6A 2020-05-10 2020-05-10 Complaint report text classification method based on SVM parameter optimization Pending CN111753083A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010389257.6A CN111753083A (en) 2020-05-10 2020-05-10 Complaint report text classification method based on SVM parameter optimization


Publications (1)

Publication Number Publication Date
CN111753083A true CN111753083A (en) 2020-10-09

Family

ID=72673364

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010389257.6A Pending CN111753083A (en) 2020-05-10 2020-05-10 Complaint report text classification method based on SVM parameter optimization

Country Status (1)

Country Link
CN (1) CN111753083A (en)


Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107330781A (en) * 2017-06-19 2017-11-07 南京信息工程大学 A kind of individual credit risk appraisal procedure based on IFOA SVM
CN108710651A (en) * 2018-05-08 2018-10-26 华南理工大学 A kind of large scale customer complaint data automatic classification method
CN109062180A (en) * 2018-07-25 2018-12-21 国网江苏省电力有限公司检修分公司 A kind of oil-immersed electric reactor method for diagnosing faults based on IFOA optimization SVM model


Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
ZHANG Qiantu et al.: "Research on SVM Parameter Optimization Based on Improved FOA", Value Engineering, 31 December 2016 (2016-12-31), pages 218-221 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113064962A (en) * 2021-03-16 2021-07-02 北京工业大学 Environment complaint reporting event similarity analysis method
CN113064962B (en) * 2021-03-16 2024-03-15 北京工业大学 Environment complaint reporting event similarity analysis method

Similar Documents

Publication Publication Date Title
CN109165294B (en) Short text classification method based on Bayesian classification
Chellapandi et al. Comparison of pre-trained models using transfer learning for detecting plant disease
Lin et al. Tourism demand forecasting: Econometric model based on multivariate adaptive regression splines, artificial neural network and support vector regression
CN110866782A (en) Customer classification method and system and electronic equipment
CN113111924A (en) Electric power customer classification method and device
Tung et al. Binary classification and data analysis for modeling calendar anomalies in financial markets
CN110310012B (en) Data analysis method, device, equipment and computer readable storage medium
Vo et al. Time series trend analysis based on k-means and support vector machine
Gurrib et al. Bitcoin price forecasting: Linear discriminant analysis with sentiment evaluation
CN111626331B (en) Automatic industry classification device and working method thereof
CN111753083A (en) Complaint report text classification method based on SVM parameter optimization
Teoh et al. Artificial Intelligence in Business Management
CN111507528A (en) Stock long-term trend prediction method based on CNN-LSTM
Kostkina et al. Document categorization based on usage of features reduction with synonyms clustering in weak semantic map
Xiong et al. L-RBF: A customer churn prediction model based on lasso+ RBF
CN114091961A (en) Power enterprise supplier evaluation method based on semi-supervised SVM
Anastasopoulos et al. Computational text analysis for public management research: An annotated application to county budgets
Arsirii et al. Heuristic models and methods for application of the kohonen neural network in the intellectual system of medical-sociological monitoring
CN111626376A (en) Domain adaptation method and system based on discrimination joint probability
CN113570455A (en) Stock recommendation method and device, computer equipment and storage medium
Kalaiselvi et al. Modified Extreme Learning Machine Algorithm with Deterministic Weight Modification for Investment Decisions based on Sentiment Analysis
Ning et al. Manufacturing cost estimation based on similarity
Özarı et al. Forecasting sustainable development level of selected Asian countries using M-EDAS and k-NN algorithm
Tawheed et al. Application of Machine Learning Techniques in the Context of Livestock
CN117828075A (en) Agricultural condition data classification method, agricultural condition data classification device and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination