CN111125359A - Text information classification method, device and equipment - Google Patents

Text information classification method, device and equipment

Info

Publication number
CN111125359A
CN111125359A (Application CN201911302877.5A)
Authority
CN
China
Prior art keywords
target
classification
classified
text information
sample information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911302877.5A
Other languages
Chinese (zh)
Other versions
CN111125359B (en)
Inventor
陈建华
崔朝辉
赵立军
张霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Neusoft Corp
Original Assignee
Neusoft Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Neusoft Corp filed Critical Neusoft Corp
Priority to CN201911302877.5A priority Critical patent/CN111125359B/en
Publication of CN111125359A publication Critical patent/CN111125359A/en
Application granted granted Critical
Publication of CN111125359B publication Critical patent/CN111125359B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the present application disclose a method, a device, and equipment for classifying text information. When text information to be classified is classified with a target classification model, the distance between the target feature vector corresponding to the text information and the optimal hyperplane in the target classification model is used to determine whether the classification result output by the model is reliable. If the distance from the target feature vector to the optimal hyperplane is greater than or equal to a first threshold, the classification result is reliable and is taken as the classification result of the text information to be classified. If the distance is smaller than the first threshold, the classification result is not reliable; the target feature vector is shifted until the shifted target feature vector meets a preset condition, the shifted target feature vector is then input into the target classification model, and the classification result it outputs is taken as the classification result of the text information to be classified. This eliminates noise introduced during classification and improves classification accuracy.

Description

Text information classification method, device and equipment
Technical Field
The present application relates to the field of information processing technologies, and in particular, to a method, an apparatus, and a device for text information classification.
Background
Machine learning is a data-driven approach based on representation learning and has been widely applied in recent years, for example to image classification and text classification. In practice, a highly accurate classification model can be trained from training samples. However, during training the available samples are often limited by the application environment, so the generalization ability of the model cannot be improved and the accuracy of the classification results suffers.
Disclosure of Invention
In view of this, embodiments of the present application provide a method, an apparatus, and a device for classifying text information, so as to improve the accuracy of text information classification.
In order to solve the above problem, the technical solution provided by the embodiment of the present application is as follows:
a method of classifying textual information, the method comprising:
converting text information to be classified into target feature vectors;
inputting the target feature vector into a target classification model, wherein the target classification model comprises an optimal hyperplane for distinguishing first sample information and second sample information, and the optimal hyperplane is obtained by training according to the distribution of the feature vector of the first sample information and the distribution of the feature vector of the second sample information;
obtaining the distance between the target characteristic vector and the optimal hyperplane;
when the distance between the target feature vector and the optimal hyperplane is larger than or equal to a first threshold value, acquiring a first classification result output by the target classification model as a classification result of the text information to be classified;
when the distance between the target characteristic vector and the optimal hyperplane is smaller than a first threshold value, shifting the target characteristic vector according to the distribution of the characteristic vector of the first sample information and the distribution of the characteristic vector of the second sample information;
and inputting the shifted target feature vector into the target classification model, and acquiring a second classification result output by the target classification model as a classification result of the text information to be classified.
In a possible implementation manner, the shifting the target feature vector according to the distribution of the feature vector of the first sample information and the distribution of the feature vector of the second sample information includes:
shifting the target feature vector to a target position, where the target position is the position corresponding to the mean of the feature vectors of the first sample information and the feature vectors of the second sample information within a preset neighborhood of the current position of the target feature vector;
repeatedly shifting the target feature vector to the newly determined target position until the distance between the shifted target feature vector and the optimal hyperplane is greater than or equal to the first threshold, or the difference between the current position of the target feature vector and the target position is smaller than a second threshold.
In a possible implementation manner, the converting the text information to be classified into the target feature vector includes:
acquiring text information to be classified;
performing word segmentation on the text information to be classified;
converting each word segmentation into word features, and forming target feature vectors by the word features.
In a possible implementation manner, when the number of the second sample information is greater than the number of the first sample information, the method further includes:
dividing the second sample information into a plurality of second sample information sets;
and respectively training and generating a plurality of classification models according to the first sample information and the second sample information in each second sample information set.
In one possible implementation, the method further includes:
respectively determining the classification models as target classification models, and executing the steps of converting the text information to be classified into target characteristic vectors and the subsequent steps;
and determining a final classification result of the text information to be classified according to the obtained multiple classification results of the text information to be classified.
In a possible implementation manner, the determining a final classification result of the text information to be classified according to the obtained multiple classification results of the text information to be classified includes:
determining the classification result that appears most frequently among the obtained multiple classification results of the text information to be classified as the final classification result of the text information to be classified.
In a possible implementation manner, the first sample information is service question text information, and the second sample information is chat question text information.
A text information classification apparatus, the apparatus comprising:
the conversion unit is used for converting the text information to be classified into a target characteristic vector;
an input unit, configured to input the target feature vector into a target classification model, where the target classification model includes an optimal hyperplane used for distinguishing first sample information from second sample information, and the optimal hyperplane is obtained by training according to distribution of feature vectors of the first sample information and distribution of feature vectors of the second sample information;
the obtaining unit is used for obtaining the distance between the target characteristic vector and the optimal hyperplane;
a first determining unit, configured to, when a distance between the target feature vector and the optimal hyperplane is greater than or equal to a first threshold, obtain a first classification result output by the target classification model as a classification result of the to-be-classified text information;
a drifting unit, configured to, when a distance between the target feature vector and the optimal hyperplane is smaller than a first threshold, drift the target feature vector according to a distribution of feature vectors of the first sample information and a distribution of feature vectors of the second sample information;
and the second determining unit is used for inputting the shifted target feature vector into the target classification model and acquiring a second classification result output by the target classification model as a classification result of the text information to be classified.
A computer-readable storage medium having stored therein instructions that, when run on a terminal device, cause the terminal device to execute the text information classification method.
An apparatus for implementing text information classification, comprising: a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the computer program, implements the above text information classification method.
Therefore, the embodiment of the application has the following beneficial effects:
the method and the device for classifying the text information firstly acquire the text information to be classified and convert the text information to be classified into the target characteristic vector. And then, inputting the target feature vector into the target classification model to obtain the distance between the target feature vector and the optimal hyperplane in the target classification model. The optimal hyperplane is obtained by training the distribution of the feature vectors of the first sample information and the distribution of the feature vectors of the second sample information, and can be used for distinguishing the first sample information from the second sample information. And then judging the relation between the distance from the target feature vector to the optimal hyperplane and a first threshold, if the distance is larger than or equal to the first threshold, the target feature vector can definitely represent the attribute of the text information to be classified, and taking a first classification result output by the target classification model as the classification result of the text information to be classified. And if the target characteristic vector is smaller than the first threshold value, which indicates that the target characteristic vector can not clearly represent the attribute of the text information to be classified, shifting the target characteristic vector according to the characteristic vector of the first sample information and the characteristic vector of the second sample information. And inputting the shifted target feature vector into a target classification model, and taking a second classification result output by the target classification model as a classification result of the text information to be classified.
That is, in the embodiments of the present application, when the target classification model is used to classify the text information to be classified, the distance between the target feature vector corresponding to the text information and the optimal hyperplane in the target classification model is used to determine whether the classification result output by the model is reliable. If the distance is greater than or equal to the first threshold, the classification result is reliable and is taken as the classification result of the text information to be classified; if the distance is smaller than the first threshold, the classification result is not reliable, the target feature vector is shifted until its position after shifting meets a preset condition, the shifted target feature vector is then input into the target classification model, and the classification result output by the model is taken as the classification result of the text information to be classified. This eliminates noise introduced during classification and improves the accuracy of the classification result.
Drawings
Fig. 1 is a flowchart of a text information classification method according to an embodiment of the present application;
fig. 2 is a schematic diagram of a two-dimensional spatial classification line according to an embodiment of the present application;
fig. 3 is a schematic diagram illustrating a target eigenvector drift according to an embodiment of the present application;
fig. 4 is a schematic diagram of sample information partitioning according to an embodiment of the present disclosure;
fig. 5 is a structural diagram of a text information classification device according to an embodiment of the present application.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present application more comprehensible, embodiments accompanying the drawings are described in detail below.
In order to facilitate understanding of the technical solutions provided in the embodiments of the present application, the following description will first discuss the background art related to the present application.
Generally, when training a classification model, the training samples that can be obtained are limited by the application environment, so the generalization ability of the model cannot be improved and the accuracy of the classification results suffers. This is particularly true in intelligent question-answering systems, where the text information input by the user must first be classified; if the classification result is inaccurate, the user experience is greatly degraded.
Specifically, intelligent question answering is an important application in the field of artificial intelligence. Compared with a traditional customer service system it is more efficient and less costly, and more and more enterprises now use intelligent question-answering systems to provide conversation services for users. In many open question-answering systems (such as a question-answering robot deployed in a government service hall), after a user inputs text information the system may return either a service-type answer or a chat-type answer. To improve the user experience, the text information input by the user usually has to be classified first: when the classification result is chat text information, the question-answering system gives a chat-type answer; when the classification result is service text information, the question-answering system gives a service-type answer.
However, because the open Internet chat corpus is very large while the service-question corpus in the system is relatively small, the trained classification model is weak at identifying service questions, that is, its classification results are not accurate. In this case, if the user asks a chat-type question but the classification model outputs a service type, the professional answer given by the question-answering system usually does not harm the user experience much; but if the user asks a service-type question and the classification model outputs the chat type, the chat-type answer given by the question-answering system greatly degrades the user experience.
Based on this, the embodiments of the present application provide a text information classification method. For text information to be classified, its target feature vector is obtained and input into a target classification model to obtain a classification result, and at the same time the distance between the target feature vector and the optimal hyperplane is obtained. Whether this distance is smaller than a first threshold is then judged. If it is not, the first classification result output by the target classification model is reliable and is taken as the classification result of the text information to be classified. If the distance is smaller than the first threshold, the first classification result output by the target classification model is not reliable; the target feature vector is shifted according to the distribution of the feature vectors of the first sample information and the distribution of the feature vectors of the second sample information so that the shifted target feature vector can better represent the text information to be classified, the shifted target feature vector is input into the target classification model, and the second classification result output by the target classification model is taken as the classification result of the text information to be classified. That is, the embodiments of the present application improve the accuracy of the classification result by using the optimal hyperplane together with the shift processing.
In order to facilitate understanding of the scheme provided by the embodiment of the present application, the text information classification method provided by the present application will be described below with reference to the accompanying drawings.
Referring to fig. 1, which is a flowchart of a text information classification method provided in an embodiment of the present application, as shown in fig. 1, the method may include:
s101: and converting the text information to be classified into a target feature vector.
In this embodiment, the obtained text information to be classified is converted into a target feature vector. In a specific implementation, this may be done as follows: acquire the text information to be classified and perform word segmentation on it; convert each segmented word into a word feature, and form the target feature vector from the word features. That is, the text information to be classified is first segmented into words, the word feature corresponding to each word is then obtained using a machine learning technique, and the word features of all the words together constitute the target feature vector corresponding to the text information to be classified. The word feature may be, for example, a term frequency (TF) or a term frequency-inverse document frequency (TF-IDF). The word feature of each segmented word may also be obtained using a word2vec model.
In practical application, the text information input by the user can be classified, and the text information to be classified can be text information directly input by the user through the terminal device or text information obtained by converting voice information input by the user through the terminal device.
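To make step S101 concrete, the following is a minimal sketch in Python. The patent only names word segmentation, TF/TF-IDF features, and word2vec as options, so the specific libraries used here (jieba for Chinese word segmentation, scikit-learn's TfidfVectorizer for TF-IDF), the placeholder corpus, and the function names are illustrative assumptions.

```python
# Sketch of S101 (assumed libraries: jieba for segmentation, scikit-learn for TF-IDF).
import jieba
from sklearn.feature_extraction.text import TfidfVectorizer

def segment(text):
    """Word-segment the text information to be classified."""
    return [w for w in jieba.lcut(text) if w.strip()]

# Fit the vectorizer on the training corpus (first + second sample information)
# so that text to be classified is mapped into the same feature space.
train_texts = ["怎么办理营业执照", "今天天气怎么样"]  # placeholder sample corpus
vectorizer = TfidfVectorizer(tokenizer=segment, token_pattern=None)
vectorizer.fit(train_texts)

def to_target_feature_vector(text):
    """Convert text information to be classified into its target feature vector."""
    return vectorizer.transform([text]).toarray()[0]
```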
S102: and inputting the target feature vector into a target classification model.
S103: and obtaining the distance between the target characteristic vector and the optimal hyperplane.
And after the target feature vector is extracted, inputting the target feature vector into a target classification model to obtain a classification result output by the target classification model. The target classification model comprises an optimal hyperplane for distinguishing the first sample information from the second sample information, the optimal hyperplane is obtained by training according to the distribution of the feature vectors of the first sample information and the distribution of the feature vectors of the second sample information, and the optimal hyperplane can accurately separate the first sample information and the second sample information.
For ease of understanding, a two-dimensional space is used as an example. As shown in fig. 2, the square points and circular points represent the feature vectors of two classes of samples. The goal of training the target classification model is to find a classification line that separates the two classes, such that the distance from the classification line to the nearest sample point on one side equals the distance to the nearest sample point on the other side, and the lines through those nearest points are parallel to it. If A is the optimal classification line, A1 and A2 are the lines that are parallel to A and pass through the samples of each class closest to it. Generalized to a high-dimensional space, the optimal classification line becomes an optimal classification surface: the optimal hyperplane not only separates the two classes correctly but also maximizes the classification margin.
In practical application, the first sample information may be service problem text information, the second sample information may be chat problem text information, and the target classification model may be used to identify whether the text information to be classified is the service problem text information or the chat problem text information.
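The patent does not name a specific training algorithm, but the maximum-margin optimal hyperplane described above matches a linear support vector machine, so the following sketch assumes scikit-learn's LinearSVC; the placeholder data, label convention, and variable names are illustrative assumptions.

```python
# Sketch of training the target classification model as a linear SVM (an assumption).
import numpy as np
from sklearn.svm import LinearSVC

# Placeholder feature vectors; in practice these come from to_target_feature_vector
# applied to the first (service-question) and second (chat-question) sample information.
X_first = np.random.rand(20, 50)
X_second = np.random.rand(20, 50)

X_train = np.vstack([X_first, X_second])
y_train = np.array([1] * len(X_first) + [0] * len(X_second))  # 1 = service, 0 = chat

target_classification_model = LinearSVC()
target_classification_model.fit(X_train, y_train)

w = target_classification_model.coef_[0]       # normal vector of the optimal hyperplane
b = target_classification_model.intercept_[0]  # model parameter b
```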
Meanwhile, after the target feature vector is input into the target classification model, the distance between the target feature vector and the optimal hyperplane can be obtained. In a specific implementation, the distance may be calculated directly by the target classification model when the target feature vector is input into it, or it may be calculated in another way after the target feature vector is obtained. Specifically, the distance between the target feature vector and the optimal hyperplane is calculated by the following formula:
r = |g(x)| / ||w||, where g(x) = w·x + b = w1x1 + w2x2 + … + wnxn + b
where r is the distance in the high-dimensional space; g(x) is the expression of the optimal hyperplane: g(x) > 0 indicates that the target feature vector x lies on one side (the left) of the optimal hyperplane, g(x) < 0 indicates that it lies on the other side (the right), and g(x) = 0 indicates that it lies on the optimal hyperplane; w is the normal vector of the optimal hyperplane (obtained when the target classification model is trained), and w1, …, wn are its components in each dimension; x is the target feature vector, and x1, …, xn are its components in each dimension; b is a model parameter (obtained when the target classification model is trained).
After the distance r between the target feature vector and the optimal hyperplane is obtained from the above formula, it is compared with the first threshold: if r is greater than or equal to the first threshold, S104 is performed; if r is less than the first threshold, S105 is performed.
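The distance computation and the threshold branch can be sketched as follows; the concrete threshold value and the helper names are assumptions, and the formula simply mirrors r = |g(x)| / ||w|| from the description above.

```python
# Sketch of S103/S104: distance to the optimal hyperplane and the first-threshold test.
import numpy as np

FIRST_THRESHOLD = 0.5  # hypothetical value; the patent does not fix a concrete number

def distance_to_hyperplane(x, w, b):
    """r = |g(x)| / ||w||, with g(x) = w.x + b."""
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

def classify_if_confident(x, model, w, b):
    """Return the first classification result if it is reliable, otherwise None (S105 applies)."""
    r = distance_to_hyperplane(x, w, b)
    if r >= FIRST_THRESHOLD:
        return model.predict(x.reshape(1, -1))[0]   # S104: trust the model's output
    return None                                     # S105: shift the vector first
```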
S104: and acquiring a first classification result output by the target classification model as a classification result of the text information to be classified.
In this embodiment, when the distance between the target feature vector and the optimal hyperplane is greater than or equal to the first threshold, it indicates that the target feature vector can clearly represent the attribute of the text information to be classified, that is, the classification result output by the target classification model is reliable, and the first classification result output by the target classification model is used directly as the classification result of the text information to be classified.
S105: and shifting the target characteristic vector according to the distribution of the characteristic vector of the first sample information and the distribution of the characteristic vector of the second sample information.
When the distance between the target feature vector and the optimal hyperplane is smaller than the first threshold, the target feature vector cannot clearly represent the attribute of the text information to be classified, and the classification result output by the target classification model is ambiguous and unreliable. In this case, the target feature vector needs to be shifted according to the distribution of the feature vectors of the first sample information and the distribution of the feature vectors of the second sample information, so that the shifted target feature vector can represent the attribute of the text information to be classified.
Specifically, the target feature vector may be shifted as follows: shift the target feature vector to a target position, where the target position is the position corresponding to the mean of the feature vectors of the first sample information and the feature vectors of the second sample information within a preset neighborhood of the current position of the target feature vector; then repeatedly shift the target feature vector to the newly determined target position until the distance between the shifted target feature vector and the optimal hyperplane is greater than or equal to the first threshold, or the difference between the current position of the target feature vector and the target position is smaller than a second threshold. After each shift, the new target position is the position corresponding to the mean of the feature vectors of the first sample information and the feature vectors of the second sample information within a preset neighborhood of the current (shifted) position of the target feature vector. The specific implementation process can be as follows:
1) and shifting the target feature vector to the target position.
In this embodiment, when the current position of the target feature vector is determined, the feature vector of each first sample information and the feature vector of the second sample information in the preset neighborhood with the current position of the target feature vector as the center of a circle may be obtained, and a position corresponding to the mean value of the feature vector of each first sample information and the feature vector of the second sample information is used as the target position. Then, the target feature vector is shifted to the target position.
2) And judging whether the distance between the shifted target characteristic vector and the optimal hyperplane is greater than or equal to a first threshold or whether the difference value between the current position of the shifted target characteristic vector and the new target position is less than a second threshold.
After the target feature vector is shifted to the target position, the feature vector at the target position (the mean value of the feature vectors of the first sample information and the second sample information in the preset neighborhood) is used as the shifted target feature vector. Then, whether the distance between the shifted target feature vector and the optimal hyperplane is larger than or equal to a first threshold value and whether the difference value between the current position of the shifted target feature vector and the target position is smaller than a second threshold value is judged. If any condition is met, the target characteristic vector after drifting can clearly represent the attribute of the text information to be classified, and then drifting is stopped; and if the two conditions are not met, the target characteristic vector after the drift still cannot clearly represent the attribute of the text information to be classified, and the drift is continued.
3) If the distance between the shifted target feature vector and the optimal hyperplane is smaller than the first threshold and the difference between the current position of the shifted target feature vector and the target position is greater than or equal to the second threshold, determine the position corresponding to the mean of the feature vectors of the first sample information and the feature vectors of the second sample information within a preset neighborhood of the current position of the shifted target feature vector as the new target position, and repeat the step of shifting the target feature vector to the target position.
That is, if the shifted target feature vector meets neither condition, the position corresponding to the mean of the feature vectors of the first sample information and the feature vectors of the second sample information within the preset neighborhood of the current position of the shifted target feature vector is determined as the new target position, and the shifting continues.
In this embodiment, the target feature vector is shifted by a mean shift method, that is, the shift mean (the mean of the feature vector of the first sample information and the feature vector of the second sample information) of the current point (target feature vector) is calculated first, the point is moved to the shift mean, and then the point is used as a new starting point to continue moving until the final condition is satisfied. Specifically, the iteration can be performed using the following formula:
Mh = (1/k) · Σ xi,  xi ∈ Sh(x)
where x denotes a reference point in the high-dimensional vector space, i.e., a target feature vector, and xi denotes other points in the high-dimensional sphere neighborhood centered around x (i.e., the feature vector of each of the first sample information and the feature vector of the second sample information in the neighborhood). Assuming that there are k such points, Mh is obtained, x is shifted to the position where Mh is located, and iteration is repeated until the shifted target feature vector converges.
For ease of understanding, refer to fig. 3. The black circle represents the target feature vector. Taking the position of the black circle as the center, the feature vectors of the first sample information and the feature vectors of the second sample information within a preset radius are obtained, their mean a is calculated, and the position of a (represented by the triangle) is determined as the target position. The black circle is shifted to the position of the triangle, so the target feature vector becomes a. It is then judged whether the distance from the target feature vector to the optimal hyperplane is greater than or equal to the first threshold; if so, no further shifting is performed. If not, the feature vectors of the first sample information and the feature vectors of the second sample information within the preset neighborhood of the triangle's position are obtained, their mean b is calculated, the position of b is taken as the new target position, and the target feature vector a is shifted to the position of feature vector b.
Alternatively, it is judged whether the difference between the current position of the target feature vector a and the target position (the position of feature vector b) is smaller than the second threshold; if so, the shifting stops; otherwise, the target feature vector a continues to shift to the position of feature vector b.
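A minimal sketch of the shift procedure (S105) under the mean-shift interpretation above follows; the neighborhood radius, thresholds, and iteration cap are assumed values, and the distance helper is repeated from the earlier sketch so the snippet stands alone.

```python
# Sketch of S105: shift the target feature vector toward the neighborhood mean until
# it is far enough from the hyperplane or the movement becomes negligible.
import numpy as np

def distance_to_hyperplane(x, w, b):
    return abs(np.dot(w, x) + b) / np.linalg.norm(w)

def shift_target_vector(x, samples, w, b,
                        radius=1.0, first_threshold=0.5,
                        second_threshold=1e-3, max_iter=100):
    """samples: feature vectors of all first and second sample information (2-D array)."""
    for _ in range(max_iter):
        dists = np.linalg.norm(samples - x, axis=1)
        neighbors = samples[dists <= radius]
        if len(neighbors) == 0:                        # empty neighborhood: nothing to average
            break
        target_position = neighbors.mean(axis=0)       # Mh: mean of the neighborhood
        movement = np.linalg.norm(target_position - x)
        x = target_position                            # shift to the target position
        if (distance_to_hyperplane(x, w, b) >= first_threshold
                or movement < second_threshold):
            break                                      # stop condition from S105
    return x
```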
S106: and inputting the shifted target feature vector into a target classification model, and acquiring a second classification result output by the target classification model as a classification result of the text information to be classified.
And when the shifted target feature vector meets the conditions, inputting the shifted target feature vector into a target classification model, and taking a second classification result output by the target classification model as a classification result of the text information to be classified.
It should be noted that shifting the target feature vector does not physically move it; rather, the target feature vector is treated as belonging to the same class as the feature vector at the convergence position, so the feature vector at the convergence position is input into the target classification model, and the classification result output by the model is used as the classification result of the text information to be classified.
Based on the above description, in the embodiments of the present application, when the target classification model is used to classify the text information to be classified, the distance between the target feature vector corresponding to the text information and the optimal hyperplane in the target classification model is used to determine whether the classification result output by the model is reliable. If the distance from the target feature vector to the optimal hyperplane is greater than or equal to the first threshold, the classification result is reliable and is taken as the classification result of the text information to be classified; if the distance is smaller than the first threshold, the classification result is not reliable, the target feature vector is shifted until its position after shifting meets a preset condition, the shifted target feature vector is then input into the target classification model, and the classification result output by the model is taken as the classification result of the text information to be classified. This eliminates noise introduced during classification and improves the accuracy of the classification result.
In practical applications, the feature vectors of service question text information are relatively concentrated, while those of chat question text information are relatively dispersed, so when a target feature vector needs to be shifted, a target feature vector near the fuzzy classification boundary is more likely to be classified into the service category. For an intelligent question-answering system, if the user inputs a chat question and the system gives a service-type answer, the user can usually tolerate it; but if the user inputs a service question and the system gives a chat-type answer, the user's tolerance is very low. Therefore, classifying question text information with the classification method provided by the embodiments of the present application better meets user requirements.
It can be understood that, in this embodiment, the target classification model is trained using the first sample information and the second sample information, so that it can identify whether the text information to be classified belongs to the class of the first sample information or the class of the second sample information. If the amount of second sample information greatly exceeds the amount of first sample information during training, the trained target classification model will be very weak at recognizing text of the first sample information's class. To solve this problem, an embodiment of the present application provides a method for generating the target classification model, which specifically includes:
1) the second sample information is divided into a plurality of second sample information sets.
2) And respectively training and generating a plurality of classification models according to the first sample information and the second sample information in each second sample information set.
That is, when the number of the second sample information is much larger than that of the first sample information, the second sample information is split to obtain a plurality of second sample information sets. And then, fusing the first sample information with each second sample information set, and training by using the fused first sample information and second sample information to obtain a plurality of classification models.
In a specific implementation, the second sample information may be divided into k second sample information sets, where k equals the amount of second sample information divided by the amount of first sample information. For example, if the amount of second sample information is M and the amount of first sample information is N, where M >> N, then k = M/N. The N pieces of first sample information, taken as a whole, are combined with each of the k second sample information sets to obtain k training sample sets, and the k training sample sets are used to train k classification models, respectively. It should be noted that, in practical applications, k may be set to an odd number to avoid tied votes among the k classification models. As shown in fig. 4, the second sample information is divided into multiple second sample information sets, and the amount of second sample information in each set is the same as the amount of first sample information.
And after the plurality of classification models are determined, taking each classification model as a target classification model, and classifying the text information to be classified by using the target classification model so as to obtain a plurality of classification results. And then, determining a final classification result of the text information to be classified according to the obtained multiple classification results of the text information to be classified. Specifically, the classification result that is the majority of the obtained multiple classification results of the text information to be classified may be determined as the final classification result of the text information to be classified.
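The split-and-vote scheme just described might look like the following sketch; the use of LinearSVC, the label convention, and the function names are assumptions carried over from the earlier sketches.

```python
# Sketch of the imbalanced-data handling: split the larger second sample set into k
# subsets, train one model per subset together with all first samples, majority-vote.
import numpy as np
from collections import Counter
from sklearn.svm import LinearSVC

def train_classification_models(X_first, X_second):
    k = max(1, len(X_second) // len(X_first))
    if k % 2 == 0:
        k += 1                                # keep k odd to avoid tied votes
    models = []
    for subset in np.array_split(X_second, k):
        X = np.vstack([X_first, subset])
        y = np.array([1] * len(X_first) + [0] * len(subset))  # 1 = service, 0 = chat
        models.append(LinearSVC().fit(X, y))
    return models

def final_classification_result(models, x):
    """Majority vote over the classification results of all models."""
    votes = [m.predict(x.reshape(1, -1))[0] for m in models]
    return Counter(votes).most_common(1)[0][0]
```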
For ease of understanding, take a question-answering system that classifies question text information as service questions or chat questions as an example. The question-answering system first obtains the question text information input by the user and converts it into a target feature vector. The target feature vector is then input into the target classification model, so that the model determines a first classification result for the question text information based on the target feature vector. Meanwhile, the question-answering system obtains the distance between the target feature vector and the optimal hyperplane; if the distance is greater than or equal to the first threshold, the first classification result output by the target classification model is determined as the classification result of the question text information, and the answer corresponding to that result is output. Specifically, if the first classification result is service question text information, the question-answering system outputs a service answer; if the first classification result is chat question text information, the question-answering system outputs a chat answer.
And if the distance is smaller than the first threshold value, shifting the target characteristic vector until the shifted target characteristic vector meets a preset condition, inputting the shifted target characteristic vector meeting the preset condition into a target classification model, determining a second classification result output by the target classification model as a classification result of the question text information, and outputting an answer corresponding to the classification result. Specifically, if the second classification result is the service question text information, the question-answering system outputs a service answer; and if the second classification result is the text information of the chat question, the question-answering system outputs the chat answer.
Based on the above method embodiments, the present application provides a device for classifying text information, and the device will be described below with reference to the accompanying drawings.
Referring to fig. 5, which is a structural diagram of a text information classification apparatus according to an embodiment of the present application, as shown in fig. 5, the apparatus may include:
a conversion unit 501, configured to convert text information to be classified into a target feature vector;
an input unit 502, configured to input the target feature vector into a target classification model, where the target classification model includes an optimal hyperplane for distinguishing first sample information from second sample information, and the optimal hyperplane is obtained according to a distribution of feature vectors of the first sample information and a distribution of feature vectors of the second sample information;
an obtaining unit 503, configured to obtain a distance between the target feature vector and the optimal hyperplane;
a first determining unit 504, configured to, when a distance between the target feature vector and the optimal hyperplane is greater than or equal to a first threshold, obtain a first classification result output by the target classification model as a classification result of the to-be-classified text information;
a drifting unit 505, configured to, when a distance between the target feature vector and the optimal hyperplane is smaller than a first threshold, drift the target feature vector according to a distribution of feature vectors of the first sample information and a distribution of feature vectors of the second sample information;
a second determining unit 506, configured to input the shifted target feature vector into the target classification model, and obtain a second classification result output by the target classification model as a classification result of the text information to be classified.
In a possible implementation manner, the drifting unit is specifically configured to drift the target feature vector to a target position, where the target position is a position corresponding to an average value of feature vectors of the first sample information and feature vectors of the second sample information in a preset neighborhood of a current position of the target feature vector;
and to repeatedly drift the target feature vector to the newly determined target position until the distance between the drifted target feature vector and the optimal hyperplane is greater than or equal to the first threshold or the difference between the current position of the target feature vector and the target position is smaller than a second threshold.
In one possible implementation manner, the conversion unit includes:
the acquiring subunit is used for acquiring text information to be classified;
the word segmentation subunit is used for segmenting the text information to be classified;
and the conversion subunit is used for converting each word segmentation into word characteristics and forming a target characteristic vector by using each word characteristic.
In a possible implementation manner, when the number of the second sample information is greater than the number of the first sample information, the apparatus further includes:
a dividing unit configured to divide the second sample information into a plurality of second sample information sets;
and the training unit is used for respectively training and generating a plurality of classification models according to the first sample information and the second sample information in each second sample information set.
In one possible implementation, the apparatus further includes:
a third determining unit configured to determine the classification models as target classification models, respectively, and execute the converting unit and subsequent steps;
and the fourth determining unit is used for determining a final classification result of the text information to be classified according to the obtained multiple classification results of the text information to be classified.
In a possible implementation manner, the fourth determining unit is specifically configured to determine the classification result that appears most frequently among the obtained multiple classification results of the text information to be classified as the final classification result of the text information to be classified.
In a possible implementation manner, the first sample information is service question text information, and the second sample information is chat question text information.
It should be noted that, implementation of each unit in this embodiment may refer to the above method embodiment, and this embodiment is not described herein again.
In addition, an embodiment of the present application further provides a computer-readable storage medium, where instructions are stored in the computer-readable storage medium, and when the instructions are run on a terminal device, the terminal device is caused to execute the text information classification method.
The embodiment of the application provides a device for realizing text information classification, which comprises: the text information classification method comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein when the processor executes the computer program, the text information classification method is realized.
Based on the above description, in the embodiments of the present application, when the target classification model is used to classify the text information to be classified, the distance between the target feature vector corresponding to the text information and the optimal hyperplane in the target classification model is used to determine whether the classification result output by the model is reliable. If the distance is greater than or equal to the first threshold, the classification result is reliable and is taken as the classification result of the text information to be classified; if the distance is smaller than the first threshold, the classification result is not reliable, the target feature vector is shifted until its position after shifting meets a preset condition, the shifted target feature vector is then input into the target classification model, and the classification result output by the model is taken as the classification result of the text information to be classified. This eliminates noise introduced during classification and improves the accuracy of the classification result.
It should be noted that, in the present specification, the embodiments are described in a progressive manner, each embodiment focuses on differences from other embodiments, and the same and similar parts among the embodiments may be referred to each other. For the system or the device disclosed by the embodiment, the description is simple because the system or the device corresponds to the method disclosed by the embodiment, and the relevant points can be referred to the method part for description.
It should be understood that in the present application, "at least one" means one or more, "a plurality" means two or more. "and/or" for describing an association relationship of associated objects, indicating that there may be three relationships, e.g., "a and/or B" may indicate: only A, only B and both A and B are present, wherein A and B may be singular or plural. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. "at least one of the following" or similar expressions refer to any combination of these items, including any combination of single item(s) or plural items. For example, at least one (one) of a, b, or c, may represent: a, b, c, "a and b", "a and c", "b and c", or "a and b and c", wherein a, b, c may be single or plural.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A method for classifying textual information, the method comprising:
converting text information to be classified into target feature vectors;
inputting the target feature vector into a target classification model, wherein the target classification model comprises an optimal hyperplane for distinguishing first sample information and second sample information, and the optimal hyperplane is obtained by training according to the distribution of the feature vector of the first sample information and the distribution of the feature vector of the second sample information;
obtaining the distance between the target characteristic vector and the optimal hyperplane;
when the distance between the target feature vector and the optimal hyperplane is larger than or equal to a first threshold value, acquiring a first classification result output by the target classification model as a classification result of the text information to be classified;
when the distance between the target characteristic vector and the optimal hyperplane is smaller than a first threshold value, shifting the target characteristic vector according to the distribution of the characteristic vector of the first sample information and the distribution of the characteristic vector of the second sample information;
and inputting the shifted target feature vector into the target classification model, and acquiring a second classification result output by the target classification model as a classification result of the text information to be classified.
2. The method of claim 1, wherein the shifting the target eigenvector according to the distribution of the eigenvectors of the first sample information and the distribution of the eigenvectors of the second sample information comprises:
shifting the target characteristic vector to a target position, wherein the target position is a position corresponding to the mean value of the characteristic vector of each first sample information and the characteristic vector of the second sample information in a preset neighborhood of the current position of the target characteristic vector;
repeatedly executing the shifting of the target feature vector to the target position until the distance between the shifted target feature vector and the optimal hyperplane is greater than or equal to the first threshold or the difference between the current position of the target feature vector and the target position is smaller than a second threshold.
3. The method of claim 1, wherein converting the text information to be classified into the target feature vector comprises:
acquiring text information to be classified;
performing word segmentation on the text information to be classified;
converting each word segmentation into word features, and forming target feature vectors by the word features.
4. The method of claim 1, wherein when the amount of the second sample information is greater than the amount of the first sample information, the method further comprises:
dividing the second sample information into a plurality of second sample information sets;
and respectively training and generating a plurality of classification models according to the first sample information and the second sample information in each second sample information set.
5. The method of claim 4, further comprising:
determining each of the classification models in turn as the target classification model, and performing the step of converting the text information to be classified into a target feature vector and the subsequent steps; and
determining a final classification result of the text information to be classified according to the plurality of obtained classification results of the text information to be classified.
6. The method according to claim 5, wherein determining the final classification result of the text information to be classified according to the plurality of obtained classification results comprises:
determining the classification result that occurs most frequently among the obtained classification results of the text information to be classified as the final classification result of the text information to be classified.
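Claims 5 and 6 run the claim-1 procedure once per model and take the most frequent result; a minimal sketch, reusing the classify_text sketch given after claim 1 above.

from collections import Counter

def final_classification(models, vectorizer, text, first_threshold, shift_fn):
    # One classification result of the text to be classified per target model.
    results = [classify_text(vectorizer, m, text, first_threshold, shift_fn)
               for m in models]
    # The result that occurs most often is the final classification result.
    return Counter(results).most_common(1)[0][0]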
7. The method according to any one of claims 1 to 6, wherein the first sample information is business-question text information and the second sample information is chat-question text information.
8. A text information classification apparatus, characterized in that the apparatus comprises:
a conversion unit, configured to convert text information to be classified into a target feature vector;
an input unit, configured to input the target feature vector into a target classification model, wherein the target classification model comprises an optimal hyperplane for distinguishing first sample information from second sample information, and the optimal hyperplane is obtained by training according to the distribution of the feature vectors of the first sample information and the distribution of the feature vectors of the second sample information;
an obtaining unit, configured to obtain the distance between the target feature vector and the optimal hyperplane;
a first determining unit, configured to, when the distance between the target feature vector and the optimal hyperplane is greater than or equal to a first threshold, obtain a first classification result output by the target classification model as the classification result of the text information to be classified;
a shifting unit, configured to, when the distance between the target feature vector and the optimal hyperplane is smaller than the first threshold, shift the target feature vector according to the distribution of the feature vectors of the first sample information and the distribution of the feature vectors of the second sample information; and
a second determining unit, configured to input the shifted target feature vector into the target classification model and obtain a second classification result output by the target classification model as the classification result of the text information to be classified.
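One way to read the apparatus of claim 8 in code is as a class whose methods mirror the claimed units; the class and method names, and the linear-SVM assumption, are purely illustrative.

import numpy as np

class TextClassificationApparatus:
    def __init__(self, vectorizer, model, first_threshold, shift_fn):
        self.vectorizer = vectorizer            # backs the conversion unit
        self.model = model                      # backs the input unit
        self.first_threshold = first_threshold
        self.shift_fn = shift_fn                # backs the shifting unit

    def convert(self, text):
        # Conversion unit: text to be classified -> target feature vector.
        return self.vectorizer.transform([text])

    def hyperplane_distance(self, x):
        # Obtaining unit: distance from the vector to the optimal hyperplane.
        return abs(self.model.decision_function(x)[0]) / np.linalg.norm(self.model.coef_)

    def classify(self, text):
        x = self.convert(text)
        if self.hyperplane_distance(x) >= self.first_threshold:
            # First determining unit: credible first classification result.
            return self.model.predict(x)[0]
        # Shifting unit followed by the second determining unit.
        x_shifted = self.shift_fn(x.toarray().ravel()).reshape(1, -1)
        return self.model.predict(x_shifted)[0]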
9. A computer-readable storage medium having instructions stored therein which, when executed on a terminal device, cause the terminal device to perform the text information classification method according to any one of claims 1-7.
10. A text information classification device, comprising: a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the text information classification method according to any one of claims 1-7.
CN201911302877.5A 2019-12-17 2019-12-17 Text information classification method, device and equipment Active CN111125359B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911302877.5A CN111125359B (en) 2019-12-17 2019-12-17 Text information classification method, device and equipment

Publications (2)

Publication Number Publication Date
CN111125359A true CN111125359A (en) 2020-05-08
CN111125359B CN111125359B (en) 2023-12-15

Family

ID=70499339

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911302877.5A Active CN111125359B (en) 2019-12-17 2019-12-17 Text information classification method, device and equipment

Country Status (1)

Country Link
CN (1) CN111125359B (en)

Patent Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US7502767B1 (en) * 2006-07-21 2009-03-10 Hewlett-Packard Development Company, L.P. Computing a count of cases in a class
CN101295362A (en) * 2007-04-28 2008-10-29 中国科学院国家天文台 Combination supporting vector machine and pattern classification method of neighbor method
JP2013007578A (en) * 2011-06-22 2013-01-10 Nec Corp Signal detection device, signal detection method and signal detection program
US20170083507A1 (en) * 2015-09-22 2017-03-23 International Business Machines Corporation Analyzing Concepts Over Time
CN106503656A (en) * 2016-10-24 2017-03-15 厦门美图之家科技有限公司 A kind of image classification method, device and computing device
WO2019041629A1 (en) * 2017-08-30 2019-03-07 哈尔滨工业大学深圳研究生院 Method for classifying high-dimensional imbalanced data based on svm
CN110427959A (en) * 2019-06-14 2019-11-08 合肥工业大学 Complain classification method, system and the storage medium of text
CN110413791A (en) * 2019-08-05 2019-11-05 哈尔滨工业大学 File classification method based on CNN-SVM-KNN built-up pattern

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
冯进玫; 卢志茂; 陈纯锴: "A classification model based on mean updating" (一种基于均值更新的分类模型), no. 08, pages 125-128 *
盛晓遐 et al.: "Credibility-weighted fuzzy support vector machine with DP clustering" (DP聚类的可信性加权模糊支持向量机), Computer Engineering and Applications (计算机工程与应用), no. 10, pages 174-183 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115858474A (en) * 2023-02-27 2023-03-28 环球数科集团有限公司 AIGC-based file arrangement system

Also Published As

Publication number Publication date
CN111125359B (en) 2023-12-15

Similar Documents

Publication Publication Date Title
Carreira-Perpinan et al. On contrastive divergence learning
CN109697282B (en) Sentence user intention recognition method and device
WO2020182122A1 (en) Text matching model generation method and device
US9807473B2 (en) Jointly modeling embedding and translation to bridge video and language
WO2020073507A1 (en) Text classification method and terminal
CN109614473B (en) Knowledge reasoning method and device applied to intelligent interaction
CN106469192B (en) Text relevance determining method and device
WO2019149059A1 (en) Method and apparatus for determining decision strategy corresponding to service and electronic device
CN108334805A (en) The method and apparatus for detecting file reading sequences
WO2021135271A1 (en) Classification model training method and system, electronic device and storage medium
JP6824795B2 (en) Correction device, correction method and correction program
JP2020004322A (en) Device and method for calculating similarity of text and program
CN111737439A (en) Question generation method and device
CN109753561B (en) Automatic reply generation method and device
CN111125359B (en) Text information classification method, device and equipment
CN114399025A (en) Graph neural network interpretation method, system, terminal and storage medium
JP2015038709A (en) Model parameter estimation method, device, and program
US7933449B2 (en) Pattern recognition method
WO2023155301A1 (en) Answer sequence prediction method based on improved irt structure, and controller and storage medium
JPWO2021038840A5 (en)
CN116579376A (en) Style model generation method and device and computer equipment
JP2021089719A (en) Information processor and information processing method
Liu et al. Classifier fusion based on cautious discounting of beliefs
Sato-Ilic et al. Visualization of fuzzy clustering result in metric space
CN111126617A (en) Method, device and equipment for selecting fusion model weight parameters

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant