CN111522750A

CN111522750A - Method and system for processing function test problem

Info

Publication number: CN111522750A
Application number: CN202010341538.4A
Authority: CN
Inventors: 李元菊
Original assignee: Bank of China Ltd
Current assignee: Bank of China Ltd
Priority date: 2020-04-27
Filing date: 2020-04-27
Publication date: 2020-08-11
Anticipated expiration: 2040-04-27
Also published as: CN111522750B

Abstract

The invention provides a method and a system for processing a function test problem, wherein the method comprises the following steps: preprocessing a first text to be processed including problem data of a function test problem to be processed to obtain a second text to be processed; converting the second text to be processed into a feature vector; and inputting the feature vector corresponding to the second text to be processed into a preset classification model for problem classification to obtain problem category information corresponding to the function test problem to be processed. In the scheme, the problem classification is carried out on the problem data of the function test problems to be processed by utilizing the classification model trained in advance, so that the problem category information corresponding to the function test problems to be processed is obtained, developers can carry out problem positioning on the function test problems to be processed according to the problem category information, a large amount of analysis time is saved, and the efficiency of processing the function test problems is improved.

Description

Method and system for processing function test problem

Technical Field

The invention relates to the technical field of data processing, in particular to a method and a system for processing a function test problem.

Background

With the development of the internet, a wide variety of application systems have been developed. Before an application system is brought online, it needs to undergo functional testing, and the functional test problems found during testing need to be solved.

At present, when an application system is subjected to function test, a tester determines a function test problem, and then a developer analyzes and solves the function test problem determined by the tester. However, various functional test problems usually occur when the application system is subjected to functional test, and a large amount of time is required for developers to analyze and solve the functional test problems one by one, so that the efficiency of processing the functional test problems is low.

Disclosure of Invention

In view of this, embodiments of the present invention provide a method and a system for processing a functional test problem, so as to solve the problems of long processing time and low efficiency in the current method for processing a functional test problem.

In order to achieve the above purpose, the embodiments of the present invention provide the following technical solutions:

the first aspect of the embodiments of the present invention discloses a method for processing a functional test problem, where the method includes:

preprocessing a first text to be processed including problem data of a function test problem to be processed to obtain a second text to be processed;

converting the second text to be processed into a feature vector;

and inputting the feature vector corresponding to the second text to be processed into a preset classification model for problem classification to obtain problem category information corresponding to the function test problem to be processed, wherein the classification model is obtained by training a sample data set according to a support vector machine classification algorithm, and the sample data set is composed of problem data of a plurality of historical function test problems.

Preferably, the preprocessing of the first text to be processed including the problem data of the function test problem to be processed to obtain the second text to be processed includes:

performing word segmentation on a first text to be processed including problem data of a function test problem to be processed, and performing stop word removal processing on the first text to be processed after word segmentation to obtain a second text to be processed.

Preferably, the converting the second text to be processed into a feature vector includes:

performing feature processing on the second text to be processed to obtain features corresponding to the second text to be processed;

and carrying out normalization processing and vectorization processing on the features corresponding to the second text to be processed to obtain a feature vector corresponding to the second text to be processed.

Preferably, the process of constructing the sample data set comprises:

for each historical functional test problem, performing category marking on a first sample text comprising problem data of the historical functional test problem to obtain a second sample text;

for each second sample text, preprocessing the second sample text to obtain a third sample text;

and converting each third sample text into a feature vector, and constructing a sample data set based on the feature vector of each third sample text.

Preferably, the process of training the sample data set to obtain the classification model according to the support vector machine classification algorithm includes:

extracting a seed data set and a test data set from the sample data set;

training the seed data set according to a support vector machine classification algorithm to obtain a training model;

classifying and predicting the feature vector of each third sample text in the test data set by using the training model to obtain a category probability value corresponding to the feature vector of each third sample text;

and if the training model does not meet the preset model training requirement according to the class probability value corresponding to the feature vector of each third sample text, adding the feature vector of the third sample text with the class probability value lower than the probability threshold value into the seed data set, returning to the step of training the seed data set according to the support vector machine classification algorithm until the training model meets the model training requirement, and determining the training model meeting the model training requirement as the classification model.

The second aspect of the embodiments of the present invention discloses a system for processing a functional test problem, where the system includes:

the preprocessing unit is used for preprocessing a first text to be processed comprising problem data of a function test problem to be processed to obtain a second text to be processed;

the conversion unit is used for converting the second text to be processed into a feature vector;

and the processing unit is used for inputting the feature vector corresponding to the second text to be processed into a preset classification model for problem classification to obtain problem category information corresponding to the function test problem to be processed, wherein the classification model is obtained by training a sample data set according to a support vector machine classification algorithm, and the sample data set is composed of problem data of a plurality of historical function test problems.

Preferably, the preprocessing unit is specifically configured to: performing word segmentation on a first text to be processed including problem data of a function test problem to be processed, and performing stop word removal processing on the first text to be processed after word segmentation to obtain a second text to be processed.

Preferably, the conversion unit is specifically for: and performing feature processing on a second text to be processed to obtain features corresponding to the second text to be processed, and performing normalization processing and vectorization processing on the features corresponding to the second text to be processed to obtain feature vectors corresponding to the second text to be processed.

Preferably, the processing unit for constructing the sample data set comprises:

the labeling module is used for, for each historical functional test problem, performing category labeling on a first sample text comprising the problem data of the historical functional test problem to obtain a second sample text;

the preprocessing module is used for, for each second sample text, preprocessing the second sample text to obtain a third sample text;

and the construction module is used for converting each third sample text into a feature vector and constructing a sample data set based on the feature vector of each third sample text.

Preferably, the processing unit, configured to train a sample data set according to a support vector machine classification algorithm to obtain the classification model, includes:

an extraction module for extracting a seed data set and a test data set from the sample data set;

the training module is used for training the seed data set according to a support vector machine classification algorithm to obtain a training model;

the prediction module is used for performing classified prediction on the feature vector of each third sample text in the test data set by using the training model to obtain a category probability value corresponding to the feature vector of each third sample text;

and the processing module is used for adding the feature vectors of the third sample texts with the class probability values lower than the probability threshold value into the seed data set if the training model does not meet the preset model training requirement according to the class probability values corresponding to the feature vectors of each third sample text, returning to execute the training module until the training model meets the preset model training requirement, and determining the training model meeting the model training requirement as a classification model.

Based on the method and the system for processing the functional test problem provided by the embodiment of the invention, the method comprises the following steps: preprocessing a first text to be processed including problem data of a function test problem to be processed to obtain a second text to be processed; converting the second text to be processed into a feature vector; and inputting the feature vector corresponding to the second text to be processed into a preset classification model for problem classification to obtain problem category information corresponding to the function test problem to be processed. In the scheme, the problem classification is carried out on the problem data of the function test problems to be processed by utilizing the classification model trained in advance, so that the problem category information corresponding to the function test problems to be processed is obtained, developers can carry out problem positioning on the function test problems to be processed according to the problem category information, a large amount of analysis time is saved, and the efficiency of processing the function test problems is improved.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, it is obvious that the drawings in the following description are only embodiments of the present invention, and for those skilled in the art, other drawings can be obtained according to the provided drawings without creative efforts.

FIG. 1 is a flow chart of a method for handling functional test problems according to an embodiment of the present invention;

FIG. 2 is a flowchart for constructing a sample data set according to an embodiment of the present invention;

FIG. 3 is a flowchart of constructing a classification model according to an embodiment of the present invention;

FIG. 4 is a block diagram of a system for processing a functional test problem according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

In this application, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

As can be seen from the background art, when performing a function test on an application system, a tester determines a function test problem first, and then a developer analyzes and solves the function test problem. However, various functional test problems usually occur when the application system is subjected to functional test, and a large amount of time is required for developers to analyze and solve the functional test problems one by one, so that the efficiency of processing the functional test problems is low.

Therefore, the embodiment of the invention provides a method and a system for processing functional test problems, which utilize a pre-trained classification model to classify the problem data of the functional test problems to be processed, so as to obtain the problem category information corresponding to the functional test problems to be processed, so that developers can perform problem positioning on the functional test problems to be processed according to the problem category information, thereby saving a large amount of analysis time and improving the efficiency of processing the functional test problems.

Referring to fig. 1, a flowchart of a processing method for a functional test problem according to an embodiment of the present invention is shown, where the processing method includes the following steps:

Step S101: Preprocess the first text to be processed, which includes the problem data of the function test problem to be processed, to obtain a second text to be processed.

It can be understood that, while the application system is being tested, the problem data of each functional test problem to be processed that is identified during testing is acquired. The problem data of each functional test problem to be processed is formed into a first text to be processed; that is, each first text to be processed contains the problem data of exactly one functional test problem to be processed (a one-to-one correspondence).

In the process of implementing step S101 specifically, each first text to be processed, which includes the problem data of a function test problem to be processed, is subjected to word segmentation, for example by a Chinese word segmentation tool.

Stop words are then removed from the segmented first text to be processed, along with punctuation, numbers, meaningless words and the like, to obtain the second text to be processed.
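The preprocessing in step S101 can be sketched as follows. This is only a minimal illustration under stated assumptions: the regular expression is a stand-in for a real Chinese word segmentation tool (it only handles space-separated Latin text), and the names `STOP_WORDS` and `preprocess`, the stop-word list and the sample sentence are all hypothetical.

```python
import re

# Hypothetical stop-word list; a real deployment would load a much larger one.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "and", "when", "on"}


def preprocess(first_text: str) -> list[str]:
    """Turn a first text to be processed into a second text (a token list)."""
    # Stand-in segmentation (assumption): lowercase the text and keep runs of
    # letters only, which also drops punctuation and numbers.
    tokens = re.findall(r"[a-z]+", first_text.lower())
    # Stop-word removal.
    return [token for token in tokens if token not in STOP_WORDS]


second_text = preprocess("Error 500 is returned when the teller submits a transfer!")
print(second_text)
```

For Chinese problem descriptions, the `re.findall` line would be replaced by a call to whichever segmentation tool is actually used; the stop-word filter stays the same.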

Step S102: Convert the second text to be processed into a feature vector.

In the process of implementing step S102 specifically, a word vector algorithm (for example, TF-IDF algorithm) is used to perform feature processing on the second text to be processed, so as to obtain features corresponding to the second text to be processed.

And carrying out normalization processing and vectorization processing on the features corresponding to the second text to be processed to obtain a feature vector corresponding to the second text to be processed. That is to say, the feature of the second text to be processed is normalized, and the result obtained by the normalization processing is converted into a vector form, so as to obtain a feature vector corresponding to the second text to be processed.
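As a rough illustration of step S102, the sketch below computes TF-IDF features and L2-normalizes them into vectors using only the standard library. The source names TF-IDF but does not fix the exact weighting or the normalization, so the smoothed IDF formula, the L2 norm, the function name `tfidf_vectors` and the sample tokens are assumptions.

```python
import math
from collections import Counter


def tfidf_vectors(docs: list[list[str]]) -> tuple[list[str], list[list[float]]]:
    """Return the vocabulary and one L2-normalized TF-IDF vector per document."""
    vocab = sorted({term for doc in docs for term in doc})
    n_docs = len(docs)
    # Document frequency: number of documents that contain each term.
    df = Counter(term for doc in docs for term in set(doc))
    # Smoothed IDF (assumed variant) so that no weight is zero or undefined.
    idf = {t: math.log((1 + n_docs) / (1 + df[t])) + 1.0 for t in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        raw = [tf[t] * idf[t] for t in vocab]            # feature processing
        norm = math.sqrt(sum(x * x for x in raw)) or 1.0  # guard empty texts
        vectors.append([x / norm for x in raw])           # normalization
    return vocab, vectors


docs = [["error", "teller", "transfer"], ["timeout", "environment", "transfer"]]
vocab, vectors = tfidf_vectors(docs)
print(vocab)
print(vectors[0])
```

Terms that appear in every text (here "transfer") receive a lower weight than terms specific to one text, which is the property that makes TF-IDF useful for separating problem categories.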

Step S103: Input the feature vector corresponding to the second text to be processed into a preset classification model for problem classification to obtain problem category information corresponding to the function test problem to be processed.

It should be noted that problem data of a plurality of historical functional test problems of the application system under test are collected in advance, and the historical functional test problems are functional test problems that have already been analyzed and solved.

The problem data of each historical functional test problem includes, but is not limited to: problem ticket number, task number, function code, project name, product information, department to which the problem belongs, problem resolution group, error reason, modified program name, whether the problem is a public problem, problem analysis information, and problem resolution method.

It is understood that the problem resolution group refers to the product in which the problem occurs. For example, if a function test problem occurs in the bank's domestic counter front-end system (BOCTS-PLT), the problem resolution group is BOCTS-PLT.
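One way to picture how the problem data of a single problem becomes one text is the sketch below. The source does not specify how the items are joined, so the space-separated concatenation, the class `ProblemRecord`, the function `to_first_text`, the choice of fields and every field value except the group name BOCTS-PLT are hypothetical.

```python
from dataclasses import dataclass, asdict


@dataclass
class ProblemRecord:
    # Illustrative subset of the items listed above; names are assumptions.
    ticket_number: str
    function_code: str
    project_name: str
    resolution_group: str
    error_reason: str
    analysis: str


def to_first_text(record: ProblemRecord) -> str:
    """Concatenate all items of one problem into a single text."""
    return " ".join(str(value) for value in asdict(record).values())


record = ProblemRecord(
    ticket_number="Q-001",
    function_code="F123",
    project_name="demo-project",
    resolution_group="BOCTS-PLT",
    error_reason="null amount",
    analysis="transfer form rejects empty amount",
)
first_text = to_first_text(record)
print(first_text)
```

Each record yields exactly one text, matching the one-to-one correspondence between a problem and its text.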

And constructing a sample data set by utilizing the problem data of the plurality of historical function test problems, and training the sample data set according to a support vector machine classification algorithm to obtain a corresponding classification model.

It should be noted that in the process of constructing the sample data set, the problem data of each historical functional test problem needs to be labeled with a category (that is, each problem is marked with its problem category), and the problem categories include, but are not limited to: requirement analysis problems, coding problems, environmental problems, documentation errors, parameter configuration errors, and the like.

In the process of specifically implementing step S103, the feature vector corresponding to the second text to be processed is input into a preset classification model for problem classification, so as to obtain problem category information corresponding to the function test problem to be processed.

It can be understood that, based on the category labels assigned to the problem data of the historical functional test problems as described above, the classification model classifies the functional test problem to be processed and outputs its problem category information, from which the problem category of the functional test problem to be processed can be determined.

It should be noted that the problem category information output by the classification model may be the category probability value (confidence) of the function test problem to be processed for each problem category, and the problem category with the largest category probability value is the problem category of the function test problem to be processed.

For example: assuming that the problem categories are a, b, c and d, the problem category information of the functional test problem to be processed output by the classification model is [a: 90%, b: 20%, c: 10%, d: 5%], that is, the category probability value of the functional test problem to be processed is 90% for problem category a, 20% for problem category b, 10% for problem category c, and 5% for problem category d.

Alternatively, the problem category information output by the classification model may be only the largest of the category probability values of the function test problem to be processed, in which case the problem category corresponding to that largest value is the problem category of the function test problem to be processed.

For example: assuming that the problem categories are a, b, c and d, and that the category probability value of the function test problem to be processed is largest for problem category a (90%), the classification model outputs the problem category information "a: 90%".

It should be noted that, the specific content of the problem category information about the to-be-processed functional test problem output by the classification model is only for illustration and is not limited specifically herein.
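The two output forms described above differ only in whether the full set of category probability values or just the largest one is reported. A minimal sketch, using the values from the example above and a hypothetical helper name `top_category`:

```python
def top_category(probabilities: dict[str, float]) -> tuple[str, float]:
    """Return the problem category with the largest category probability value."""
    category = max(probabilities, key=probabilities.get)
    return category, probabilities[category]


# First output form: one category probability value per problem category.
full_output = {"a": 0.90, "b": 0.20, "c": 0.10, "d": 0.05}

# Second output form: only the largest value and its category.
print(top_category(full_output))  # → ('a', 0.9)
```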

It can be understood that the problem category information corresponding to each to-be-processed functional test problem can be obtained by inputting the feature vector corresponding to the second to-be-processed text corresponding to each to-be-processed functional test problem into the classification model. The developer can determine and position the problem category of each function test problem to be processed according to the problem category information corresponding to each function test problem to be processed, so that the time for analyzing the function test problems to be processed is reduced.

In the embodiment of the invention, problem data of a plurality of historical function test problems are collected in advance, a sample data set is constructed, and the sample data set is trained by using a support vector machine classification algorithm to obtain a classification model. After problem data of the functional test problems to be processed are converted into the feature vectors, the feature vectors are input into the classification model to perform problem classification, and problem category information corresponding to the functional test problems to be processed is obtained, so that developers can perform problem positioning on the functional test problems to be processed according to the problem category information, a large amount of analysis time is saved, and the efficiency of processing the functional test problems is improved.

The process of constructing a sample data set related to step S103 in the above embodiment of the present invention is shown in fig. 2, which is a flowchart of constructing a sample data set provided in the embodiment of the present invention, and includes the following steps:

Step S201: For each historical functional test problem, perform category labeling on a first sample text comprising problem data of the historical functional test problem to obtain a second sample text.

It is understood that, as described for the problem data of the historical functional test problems in step S103 above, the problem data of each historical functional test problem includes a plurality of items of information. Therefore, the problem data of each historical functional test problem is formed into one text (the first sample text); that is, each first sample text includes the problem data of exactly one historical functional test problem (a one-to-one correspondence).

Before the sample data set is trained by using the support vector machine classification algorithm, the data in the sample data set needs to be labeled with categories. Therefore, in the process of specifically implementing step S201, the first sample text corresponding to each historical functional test problem is labeled with a category according to the problem data of that historical functional test problem, to obtain a second sample text. For the content of the category labeling, reference may be made to the description of step S103 in FIG. 1 in the embodiment of the present invention, which is not repeated here.

Step S202: For each second sample text, preprocess the second sample text to obtain a third sample text.

In the process of specifically implementing step S202, for each second sample text, performing word segmentation on the second sample text, and performing stop word removal processing on the second sample text after word segmentation to obtain a corresponding third sample text.

Step S203: Convert each third sample text into a feature vector, and construct a sample data set based on the feature vector of each third sample text.

In the process of specifically implementing step S203, for each third sample text, feature processing is performed on the third sample text by using a word vector algorithm (for example, the TF-IDF algorithm) to obtain the features corresponding to the third sample text, and normalization processing and vectorization processing are performed on those features to obtain the feature vector corresponding to the third sample text.

And combining the feature vector of each third sample text to construct a sample data set, namely, the sample data set comprises the feature vector of the third sample text corresponding to each historical functional test problem.

In the embodiment of the invention, the pre-collected problem data of a plurality of historical functional test problems are subjected to category marking and pre-processing and are converted into the feature vectors, and the sample data set is constructed based on the feature vectors corresponding to the plurality of historical functional test problems. Training a sample data set according to a support vector machine classification algorithm to obtain a classification model, and determining problem category information corresponding to the function test problem to be processed by using the classification model, so that a developer can perform problem positioning on the function test problem to be processed according to the problem category information, a large amount of analysis time is saved, and the efficiency of processing the function test problem is improved.

The process of building a classification model related to step S103 in fig. 1 in the above embodiment of the present invention is shown in fig. 3, which is a flowchart of building a classification model provided in the embodiment of the present invention, and includes the following steps:

Step S301: A seed data set and a test data set are extracted from the sample data set.

In the process of implementing step S301, a seed data set (corresponding to training data) for model training is extracted from the sample data set, and a test data set for testing is extracted from the sample data set.

It should be noted that, when the seed data set is first (initially) constructed, the size of the seed data set may be set according to actual conditions.

Step S302: Train the seed data set according to the support vector machine classification algorithm to obtain a training model.

In the process of implementing step S302 specifically, a support vector machine classification algorithm is used to perform model training on the seed data set to obtain each parameter of the model, and a corresponding training model can be formed according to each obtained parameter.

Step S303: Perform classification prediction on the feature vector of each third sample text in the test data set by using the training model to obtain a category probability value corresponding to the feature vector of each third sample text.

In the process of specifically implementing step S303, the training model is tested by using the test data set, in the following manner: classification prediction is performed on the feature vector of each third sample text in the test data set by using the training model to obtain a class probability value (confidence) for the feature vector of each third sample text in the test data set, that is, the class probability value of the historical functional test problem corresponding to the feature vector of each third sample text in the test data set.

Step S304: Determine, according to the class probability value corresponding to the feature vector of each third sample text, whether the training model meets the preset model training requirement; if it does, execute step S306, and if it does not, execute step S305.

It can be understood that, for the training model obtained in step S302, the corresponding model training requirement is set according to the actual requirement, that is, the training model obtained in step S302 needs to meet the model training requirement.

In the process of implementing step S304, the result of testing the training model using the test data set (the execution result of step S303) is used to determine whether the training model meets the preset model training requirement, and if the training model meets the preset model training requirement, the training model meeting the model training requirement is determined to be the classification model.

If the training model does not meet the preset model training requirement, step S305 is executed.

Step S305: Add the feature vector of each third sample text whose class probability value is lower than the probability threshold to the seed data set, and return to step S302.

In the process of implementing the step S305 specifically, a probability threshold is preset, the feature vector of the third sample text with the class probability value lower than the probability threshold in the test data set is added to the seed data set, the step S302 is returned to, and the seed data set is trained by using the support vector machine classification algorithm continuously until the training model meets the preset model training requirement.

In this way, the amount of training data used for model training is increased, which alleviates the scarcity of training data for minority classes and expands the decision space covered by the classifier of the support vector machine classification algorithm.

Step S306: Determine the training model that meets the model training requirement as the classification model.
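The loop formed by steps S301 to S306 can be sketched as below. Several points are assumptions rather than details from the source: the "model training requirement" is taken to be a minimum accuracy on the test data set, the class probability value compared with the threshold is that of the predicted category, a `max_rounds` cap guarantees termination, and `CentroidModel` is a simple stand-in classifier, not a support vector machine. All names and sample values are hypothetical.

```python
import math

Sample = tuple[list[float], str]  # (feature vector, category label)


class CentroidModel:
    """Stand-in classifier (NOT an SVM): softmax over negative centroid distances."""

    def __init__(self, samples: list[Sample]):
        grouped: dict[str, list[list[float]]] = {}
        for vec, label in samples:
            grouped.setdefault(label, []).append(vec)
        self.centroids = {
            label: [sum(col) / len(vecs) for col in zip(*vecs)]
            for label, vecs in grouped.items()
        }

    def predict_proba(self, vec: list[float]) -> dict[str, float]:
        scores = {lab: math.exp(-math.dist(vec, c)) for lab, c in self.centroids.items()}
        total = sum(scores.values())
        return {lab: s / total for lab, s in scores.items()}


def train_until_ok(seed, test, train_fn, prob_threshold=0.6,
                   min_accuracy=0.9, max_rounds=10):
    seed = list(seed)
    model = train_fn(seed)                          # S302: train on the seed set
    for _ in range(max_rounds):
        correct, low_conf = 0, []
        for vec, label in test:                     # S303: predict on the test set
            probs = model.predict_proba(vec)
            best = max(probs, key=probs.get)
            correct += best == label
            if probs[best] < prob_threshold and (vec, label) not in seed:
                low_conf.append((vec, label))
        # S304: assumed requirement is accuracy >= min_accuracy; also stop
        # when there is nothing left to add to the seed set.
        if correct / len(test) >= min_accuracy or not low_conf:
            break                                   # S306: accept the model
        seed.extend(low_conf)                       # S305: grow the seed set
        model = train_fn(seed)                      # back to S302
    return model, seed


seed_set = [([0.0, 0.0], "coding"), ([1.0, 1.0], "environment")]
test_set = [([0.1, 0.0], "coding"), ([0.9, 1.0], "environment"), ([0.6, 0.6], "coding")]
model, final_seed = train_until_ok(seed_set, test_set, CentroidModel)
print(len(final_seed), model.predict_proba([0.6, 0.6]))
```

In this toy run the third test sample is misclassified with low confidence in the first round, is moved into the seed set, and is classified correctly after retraining. Any classifier exposing a comparable probability output could be passed as `train_fn`, including a support vector machine configured to produce probability estimates. The source does not say whether samples moved into the seed set are also removed from the test data set; this sketch leaves them in place.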

In the embodiment of the invention, a seed data set is trained by using a support vector machine classification algorithm to obtain a training model. The training model is tested by using the test data set. If it is determined from the test result that the training model does not meet the preset model training requirement, the feature vectors of the third sample texts whose class probability values are lower than the probability threshold are added to the seed data set, and the step of training the seed data set is repeated until the training model meets the model training requirement. The training model that meets the model training requirement is determined as the classification model, which improves the classification accuracy of the classification model.

Corresponding to the method for processing a functional test problem provided in the foregoing embodiment of the present invention, referring to FIG. 4, an embodiment of the present invention further provides a structural block diagram of a system for processing a functional test problem, where the system includes: a preprocessing unit 401, a conversion unit 402, and a processing unit 403;

the preprocessing unit 401 is configured to preprocess a first text to be processed that includes problem data of a function test problem to be processed, to obtain a second text to be processed.

In a specific implementation, the preprocessing unit 401 is specifically configured to: perform word segmentation on the first text to be processed, which includes problem data of the function test problem to be processed, and remove stop words from the segmented first text to obtain the second text to be processed.
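The two preprocessing operations, word segmentation followed by stop word removal, can be sketched as follows. This is only an illustration under stated assumptions: the regular-expression tokenizer is a stand-in for a real word segmenter (Chinese problem descriptions would need a dedicated segmentation tool), and the `STOP_WORDS` set and the function name `preprocess` are hypothetical.

```python
import re

# Hypothetical stop-word list; a real deployment would load a curated list.
STOP_WORDS = {"the", "a", "an", "is", "to", "of", "when", "and"}

def preprocess(first_text: str) -> list[str]:
    """Segment the first text into words, then drop stop words."""
    # Stand-in segmentation: lowercase and split on non-word characters.
    tokens = re.findall(r"\w+", first_text.lower())
    return [token for token in tokens if token not in STOP_WORDS]

second_text = preprocess("The login page is blank when the session expires")
print(second_text)  # ['login', 'page', 'blank', 'session', 'expires']
```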

The conversion unit 402 is configured to convert the second text to be processed into a feature vector.

In a specific implementation, the conversion unit 402 is specifically configured to: perform feature processing on the second text to be processed to obtain the features corresponding to the second text to be processed, and perform normalization processing and vectorization processing on those features to obtain the feature vector corresponding to the second text to be processed.
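One possible reading of the feature processing, normalization and vectorization described above is sketched below. The choice of term counts over a fixed vocabulary and of L2 normalization is an assumption made for illustration, as are the name `to_feature_vector` and the sample vocabulary.

```python
import math
from collections import Counter

def to_feature_vector(tokens: list[str], vocabulary: list[str]) -> list[float]:
    """Count vocabulary terms in tokens and scale the vector to unit L2 length."""
    counts = Counter(tokens)
    raw = [float(counts[term]) for term in vocabulary]
    norm = math.sqrt(sum(value * value for value in raw))
    # An all-zero vector (no known terms) is returned unchanged.
    return [value / norm for value in raw] if norm > 0 else raw

vocabulary = ["login", "page", "timeout", "blank"]  # hypothetical vocabulary
vector = to_feature_vector(["login", "page", "blank", "login"], vocabulary)
```

Scaling every vector to unit length keeps long and short problem descriptions comparable, which matters for margin-based classifiers such as support vector machines.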

The processing unit 403 is configured to input the feature vector corresponding to the second text to be processed into a preset classification model for problem classification, so as to obtain problem category information corresponding to the functional test problem to be processed, where the classification model is obtained by training a sample data set according to a support vector machine classification algorithm, and the sample data set is composed of problem data of a plurality of historical functional test problems.
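How a trained classification model might map a feature vector to problem category information can be sketched as follows. The sketch assumes a linear kernel with one decision function per category (one-vs-rest), which the text above does not specify. The category names, weights and biases are invented for illustration, and `classify` is a hypothetical name.

```python
def classify(vector, model):
    """Return the category whose linear decision score w.x + b is highest."""
    scores = {
        category: sum(w * x for w, x in zip(weights, vector)) + bias
        for category, (weights, bias) in model.items()
    }
    return max(scores, key=scores.get)

# Hypothetical one-vs-rest linear model: category -> (weights, bias).
model = {
    "interface display": ([0.1, 0.9, 0.0, 1.2], -0.5),
    "authentication": ([1.5, 0.1, 0.8, 0.0], -0.5),
}
category = classify([0.8, 0.4, 0.0, 0.4], model)
print(category)  # authentication
```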

Preferably, with reference to fig. 4, the processing unit 403, when used to construct the sample data set, includes a labeling module, a preprocessing module and a construction module. The modules operate as follows:

The labeling module is configured to label each first sample text, which includes problem data of a historical functional test problem, according to the category of that historical functional test problem, to obtain a second sample text.

The preprocessing module is configured to preprocess each second sample text to obtain a corresponding third sample text.

Preferably, with reference to fig. 4, the processing unit 403, when used to train the sample data set according to the support vector machine classification algorithm to obtain the classification model, includes an extraction module, a training module, a prediction module and a processing module. The modules operate as follows:

The extraction module is configured to extract the seed data set and the test data set from the sample data set.

The training module is configured to train the seed data set according to the support vector machine classification algorithm to obtain a training model.

The prediction module is configured to perform classification prediction on the feature vector of each third sample text in the test data set by using the training model, to obtain a class probability value corresponding to the feature vector of each third sample text.

The processing module is configured to, if the training model does not meet the preset model training requirement, add the feature vectors of the third sample texts whose class probability values are lower than the probability threshold to the seed data set and return to the training module, until the training model meets the preset model training requirement, and to determine the training model that meets the model training requirement as the classification model.

In summary, the embodiments of the present invention provide a method and a system for processing a functional test problem, where the method includes: preprocessing a first text to be processed including problem data of a function test problem to be processed to obtain a second text to be processed; converting the second text to be processed into a feature vector; and inputting the feature vector corresponding to the second text to be processed into a preset classification model for problem classification to obtain problem category information corresponding to the function test problem to be processed. In the scheme, the problem classification is carried out on the problem data of the function test problems to be processed by utilizing the classification model trained in advance, so that the problem category information corresponding to the function test problems to be processed is obtained, developers can carry out problem positioning on the function test problems to be processed according to the problem category information, a large amount of analysis time is saved, and the efficiency of processing the function test problems is improved.

The embodiments in the present specification are described in a progressive manner. For the same or similar parts, the embodiments may be referred to one another, and each embodiment focuses on its differences from the other embodiments. In particular, the system embodiments are substantially similar to the method embodiments and are therefore described relatively briefly; for related points, reference may be made to the corresponding descriptions of the method embodiments. The system embodiments described above are only illustrative: the units described as separate parts may or may not be physically separate, and the parts displayed as units may or may not be physical units, that is, they may be located in one place or distributed over a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.

Those of skill would further appreciate that the various illustrative elements and algorithm steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, computer software, or combinations of both, and that the various illustrative components and steps have been described above generally in terms of their functionality in order to clearly illustrate this interchangeability of hardware and software. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present invention.

The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims

1. A method for handling functional test problems, the method comprising:

converting the second text to be processed into a feature vector;

2. The method according to claim 1, wherein the preprocessing a first text to be processed including question data of a functional test question to be processed to obtain a second text to be processed comprises:

3. The method of claim 1, wherein converting the second to-be-processed text into a feature vector comprises:

4. The method of claim 1, wherein constructing the sample data set comprises:

5. The method of claim 4, wherein the process of training the sample data set to obtain the classification model according to the support vector machine classification algorithm comprises:

extracting a seed data set and a test data set from the sample data set;

6. A system for handling functional test problems, the system comprising:

7. The system of claim 6, wherein the preprocessing unit is specifically configured to: perform word segmentation on a first text to be processed including problem data of a function test problem to be processed, and perform stop word removal processing on the first text to be processed after word segmentation to obtain a second text to be processed.

8. The system according to claim 6, wherein the conversion unit is specifically configured to: perform feature processing on a second text to be processed to obtain features corresponding to the second text to be processed, and perform normalization processing and vectorization processing on the features corresponding to the second text to be processed to obtain feature vectors corresponding to the second text to be processed.

9. The system according to claim 6, wherein the processing unit for constructing the sample data set comprises:

10. The system according to claim 9, wherein the processing unit configured to train a sample data set to obtain the classification model according to a support vector machine classification algorithm comprises: