CN111143552A - Text information category prediction method and device and server

Text information category prediction method and device and server

Info

Publication number
CN111143552A
Authority
CN
China
Prior art keywords
text classification
text
prediction
category
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911236894.3A
Other languages
Chinese (zh)
Other versions
CN111143552B (en)
Inventor
马良庄
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd
Priority to CN201911236894.3A
Publication of CN111143552A
Application granted
Publication of CN111143552B
Legal status: Active

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of the specification provide a text information category prediction method, a text information category prediction apparatus, and a server. Text information to be processed is predicted by a plurality of pre-trained first text classification models, and the first prediction categories output by the models are fused to obtain the true category of the text information. At least one of the first text classification models is an under-fit model. By replacing the original single text classification model with a plurality of first text classification models that include an under-fit model, the under-fit model can constrain the final prediction result, which improves the generalization capability of the models and the accuracy of the classification result.

Description

Text information category prediction method and device and server
Technical Field
The specification relates to the technical field of artificial intelligence, and in particular to a text information category prediction method, apparatus, and server.
Background
In everyday applications, text information often needs to be classified. For example, in an intelligent robot customer service scenario, a user may send text information to the intelligent robot customer service. The text information may relate to account operations, such as how to register an account or how to bind a mobile phone number to an account; it may relate to orders, such as how to cancel an order or how long an order refund takes; or it may be of some other type. To improve the response efficiency of the intelligent robot customer service, this text information needs to be classified, and it is therefore necessary to improve the accuracy of text information classification.
Disclosure of Invention
Based on the above, the embodiments of the present specification provide a method and an apparatus for predicting a category of text information, and a server.
According to a first aspect of embodiments herein, there is provided a method for predicting a category of text information, the method including:
receiving text information to be processed;
predicting the text information to be processed by respectively adopting a plurality of pre-trained first text classification models to obtain first prediction categories output by each first text classification model; wherein at least one of the first text classification models is an under-fit model;
and acquiring the true category of the text information to be processed according to the first prediction category output by each first text classification model.
According to a second aspect of embodiments herein, there is provided an apparatus for predicting a category of text information, the apparatus including:
the receiving module is used for receiving text information to be processed;
the first prediction module is used for predicting the text information to be processed by adopting a plurality of pre-trained first text classification models respectively to obtain first prediction categories output by the first text classification models; wherein at least one of the first text classification models is an under-fit model;
and the second prediction module is used for acquiring the true category of the text information to be processed according to the first prediction categories output by the first text classification models.
According to a third aspect of embodiments herein, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of the embodiments.
According to a fourth aspect of the embodiments of the present specification, there is provided a server comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method according to any of the embodiments when executing the program.
By applying the solution of the embodiments of the specification, the text information to be processed is predicted by a plurality of first text classification models, and the first prediction categories output by the models are then fused to obtain the true category of the text information to be processed. At least one of the first text classification models is an under-fit model. By replacing the original single text classification model with a plurality of first text classification models that include an under-fit model, the under-fit model can constrain the final prediction result, which improves the generalization capability of the models and the accuracy of the classification result.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the specification.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present specification and together with the description, serve to explain the principles of the specification.
Fig. 1 is a flowchart of a method for predicting a category of text information according to an embodiment of the present specification.
Fig. 2 is a schematic diagram of multi-model prediction result fusion according to an embodiment of the present specification.
Fig. 3 is a schematic diagram of a model training and verification process according to an embodiment of the present specification.
Fig. 4 is a block diagram of a text information category prediction apparatus according to an embodiment of the present specification.
Fig. 5 is a schematic diagram of a computer device for implementing the methods of embodiments of the present specification.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present specification. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the specification, as detailed in the appended claims.
The terminology used in the description herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, the information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, without departing from the scope of the present specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
The terms used in the examples of the present specification have the following meanings:
Deep learning: forming more abstract high-level representations (attribute categories or features) by combining low-level features, so as to discover distributed feature representations of data.
Model training: in supervised or unsupervised learning, given training text information and a model hypothesis space, an optimization problem is constructed, namely how to determine the model parameters so as to optimize the training objective.
Training set, verification set, and test set: the data for a text classification model is divided into three parts, used respectively to train the model, to verify the trained model, and to make predictions with the verified model.
Under-fitting: the error between the function fitted by the trained text classification model and the training set is large.
Over-fitting: the function fitted by the trained text classification model matches the training set too closely but generalizes poorly; that is, it performs very well on the training text information but poorly on other data.
Learning rate: the step size by which the model parameters are adjusted at each iteration during model training.
As shown in fig. 1, an embodiment of the present specification provides a method for predicting a category of text information, where the method may include:
step S102: receiving text information to be processed;
step S104: predicting the text information to be processed by respectively adopting a plurality of pre-trained first text classification models to obtain first prediction categories output by each first text classification model; wherein at least one of the first text classification models is an under-fit model;
step S106: and acquiring the true category of the text information to be processed according to the first prediction category output by each first text classification model.
The steps in the embodiments of the present specification may be performed by an intelligent robot customer service located on the server side. For step S102, the text information to be processed may be sent to the intelligent robot customer service by a user through a client: the user inputs the text information to be processed on the client, and the client sends it to the intelligent robot customer service. The client may be an application installed on an electronic device such as a smart phone, tablet computer, or desktop computer, for example Taobao, an online banking application, or a payment application. The text information to be processed that the user inputs on the client may relate to account operations, such as how to register an account or how to bind a mobile phone number to an account; it may relate to orders, such as how to cancel an order or how long an order refund takes; or it may be of some other type.
In some embodiments, the user may also send the client information in formats other than text. After receiving such information, the client can extract the text information to be processed from it and then send the text to the intelligent robot customer service. For example, when the information is in a picture format, the text information to be processed may be recognized from the picture by OCR (Optical Character Recognition). Further, stop words may be filtered out of the received or extracted text information to be processed before the filtered text is sent to the intelligent robot customer service.
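By way of illustration only, this preprocessing might be sketched as follows in Python; the pytesseract OCR binding and the tiny stop-word list are assumptions of this sketch, not components prescribed by this specification.

    import pytesseract
    from PIL import Image

    # Illustrative stop-word list; a real deployment would use a full list
    # suited to the language of the text information to be processed.
    STOP_WORDS = {"the", "a", "an", "to", "of", "is"}

    def extract_text(image_path: str) -> str:
        # Recognize the text information to be processed from a picture via OCR.
        return pytesseract.image_to_string(Image.open(image_path))

    def filter_stop_words(text: str) -> str:
        # Filter stop words before sending the text to the intelligent robot customer service.
        return " ".join(w for w in text.split() if w.lower() not in STOP_WORDS)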
For step S104, a plurality of pre-trained first text classification models, at least one of which is an under-fit model, are used to predict the category of the text information to be processed. Replacing the original single text classification model with a plurality of first text classification models that include an under-fit model allows the under-fit model to constrain the final prediction result, improving the generalization capability of the models and the accuracy of the classification result.
An under-fit first text classification model may be a model of relatively small scale, where scale may refer to the number of model parameters, the number of model layers, or other characteristics of the model. "Small" here is relative to an over-fit model: if a first text classification model has fewer parameters, or fewer layers, than an over-fit model, it is a smaller text classification model than that over-fit model. In practical applications, the scale of the under-fit first text classification model may be at least one order of magnitude smaller than that of the over-fit model; for example, its number of parameters, or its number of layers, may be at least an order of magnitude smaller than the over-fit model's.
Each first text classification model may be a text classification model of various categories, such as a neural network model, a decision tree model, a bayesian classifier, and the like, which is not limited by the present disclosure. The categories of the plurality of first text classification models may be the same, may also be partially the same, or may be different, which is not limited in this disclosure.
An under-fit first text classification model is less capable of extracting features from the training text information than an over-fit model. An over-fit model may treat characteristics peculiar to the training samples as general properties that all potential samples would have, which is what causes over-fitting. In some embodiments, every first text classification model is an under-fit model. Using a plurality of first text classification models with weak feature-extraction capability, instead of a single model, to predict the data effectively avoids mistaking peculiarities of the training text information for general properties of all potential samples during training, improving the generalization capability of the models and the accuracy of the classification results. Further, each first text classification model may be an under-fit deep learning model: because deep learning models have many parameters and many layers, they are especially prone to over-fitting, so replacing a single deep learning model with a plurality of under-fit deep learning models improves the generalization capability of the deep learning models and, in turn, the accuracy of classification performed with them.
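As one concrete reading of this idea, the sketch below trains an ensemble of deliberately small classifiers and averages their class probabilities. The choice of scikit-learn's MLPClassifier, the tiny hidden layer, and the low iteration cap are assumptions made here to keep each member under-fit; the specification does not prescribe a particular model family.

    import numpy as np
    from sklearn.neural_network import MLPClassifier

    def build_ensemble(X, y, n_models=5, seed=0):
        # Train n_models small MLPs; the tiny hidden layer and few iterations
        # keep each member under-fit relative to a single large network.
        models = []
        for i in range(n_models):
            clf = MLPClassifier(hidden_layer_sizes=(8,), max_iter=20,
                                random_state=seed + i)
            clf.fit(X, y)
            models.append(clf)
        return models

    def predict_ensemble(models, X):
        # Average the members' class probabilities (equal weights, all under-fit)
        # and return the fused category index for each sample.
        probs = np.mean([m.predict_proba(X) for m in models], axis=0)
        return probs.argmax(axis=1)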
The number of first text classification models may be preset, for example, to 5, 10, or another number. The larger the number of first text classification models, the more accurate the classification result, but the more resources the classification process consumes and the greater the latency. The number of first text classification models may therefore be set according to preset accuracy and latency constraints.
For step S106, the first prediction categories output by the first text classification models may be fused to obtain the true category of the text information to be processed.
In some embodiments, the step of obtaining the true category of the text information to be processed according to the first prediction category output by each first text classification model includes: carrying out a weighted average of the first prediction categories output by the first text classification models to obtain the true category of the text information to be processed.
For example, assume that there are M first text classification models and that the first prediction category is the probability that the text information to be processed belongs to some category A. Let the probabilities output by the first text classification models be P1, P2, …, PM and the corresponding weights be r1, r2, …, rM. The probability that the text information to be processed belongs to category A can then be recorded as:
P = (r1 × P1 + r2 × P2 + … + rM × PM) / (r1 + r2 + … + rM)
The values of r1, r2, …, rM can be set according to the feature-extraction capability of each first text classification model: the stronger the capability, the smaller the weight may be set, and the weaker the capability, the larger the weight may be set. When every first text classification model is an under-fit model, r1, r2, …, rM may all be set to 1; that is, the first prediction categories output by the first text classification models are simply averaged to obtain the true category.
It should be noted that the probability value may also be replaced by a score: the higher the score of the text information to be processed for category A, the higher the probability that the text information to be processed belongs to category A.
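A minimal sketch of this weighted-average fusion, assuming the per-model probabilities and weights are already available; the numbers are illustrative only.

    from typing import Sequence

    def fuse_weighted_average(probs: Sequence[float], weights: Sequence[float]) -> float:
        # Weighted average of the probabilities P1..PM that the text belongs to
        # category A; with all weights set to 1 (the all-under-fit case), this
        # reduces to the plain mean.
        return sum(p * r for p, r in zip(probs, weights)) / sum(weights)

    # Illustrative values: three under-fit models with equal weights r1 = r2 = r3 = 1.
    p_a = fuse_weighted_average([0.62, 0.55, 0.70], [1.0, 1.0, 1.0])  # about 0.623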
In some embodiments, the step of obtaining the true category of the text information to be processed according to the first prediction category output by each first text classification model includes: taking a target first prediction category that meets a preset condition as the true category, where the preset condition is as follows: the number of first text classification models outputting the target first prediction category is larger than the number of first text classification models outputting any other first prediction category.
For example, suppose a total of 10 first text classification models are adopted for a classification problem: 4 of them output category A as the first prediction category, 3 output category B, and 3 output category C. Since the number of first text classification models outputting category A is the largest, category A is taken as the true category.
Further, the count may be a weighted count, with the weights set according to the feature-extraction capability of each first text classification model: the stronger the capability, the smaller the weight may be set, and the weaker the capability, the larger the weight may be set. When all the first text classification models are under-fit models, the weight corresponding to each first text classification model is set to 1.
The first prediction categories may also be fused in other manners, which the embodiments of this specification do not limit. When every first text classification model is an under-fit model, a schematic diagram of the multi-model prediction result fusion is shown in Fig. 2.
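A corresponding sketch of the voting fusion with weighted counts; the function name and the example votes are illustrative, not taken from this specification.

    from collections import Counter
    from typing import Sequence

    def fuse_majority_vote(categories: Sequence[str], weights: Sequence[float]) -> str:
        # Return the target first prediction category whose (weighted) number of
        # supporting first text classification models is the largest.
        tally = Counter()
        for cat, w in zip(categories, weights):
            tally[cat] += w
        return tally.most_common(1)[0][0]

    # Illustrative: 10 under-fit models (weights all 1); 4 vote A, 3 vote B, 3 vote C.
    votes = ["A"] * 4 + ["B"] * 3 + ["C"] * 3
    assert fuse_majority_vote(votes, [1.0] * 10) == "A"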
In some embodiments, the method further includes: training each first text classification model with the training text information corresponding to it. The training text information used for the different first text classification models may be the same or different, which this specification does not limit.
In the process of training each first text classification model, the ID number of the true category of each piece of training text information may be acquired. The ID number uniquely identifies a true category; for example, the sequence number of the true category in a category database may be used as the ID number. The training text information is also converted into a vector, for example with the word2vec technique, although other conversion methods may be used as well, which this specification does not limit. The vector is then used as the input of the first text classification model and the ID number as its output, and the first text classification model is trained.
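The vectorization and labeling step might look like the sketch below, which assumes gensim 4.x as one possible word2vec implementation; the two-sentence corpus and the category database are placeholders rather than data from this specification.

    import numpy as np
    from gensim.models import Word2Vec

    # Placeholder tokenized training text information and category database.
    corpus = [["how", "to", "register", "an", "account"],
              ["how", "to", "cancel", "an", "order"]]
    category_db = ["account", "order"]   # sequence number in this list = ID number
    labels = [0, 1]                      # ID number of each sample's true category

    w2v = Word2Vec(sentences=corpus, vector_size=32, min_count=1, seed=0)

    def to_vector(tokens):
        # Average the word vectors to obtain the input vector of the
        # first text classification model.
        return np.mean([w2v.wv[t] for t in tokens if t in w2v.wv], axis=0)

    X = np.stack([to_vector(s) for s in corpus])  # model inputs
    y = np.array(labels)                          # model outputs (ID numbers)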
In some embodiments, after the training text information corresponding to each first text classification model is respectively used to train the first text classification model, the method further includes: inputting verification text information corresponding to the first text classification model into the first text classification model to obtain a second prediction category output by the first text classification model; judging whether the loss function corresponding to the second prediction category meets a training termination condition; and if so, terminating the training of the first text classification model. A schematic diagram of the model training and validation process of one embodiment is shown in fig. 3.
In this embodiment, a first text classification model satisfying the training termination condition is obtained through multiple training iterations. In some embodiments, the training termination condition is used to constrain the trained first text classification model to be an under-fit model. Specifically, the training termination condition is: the loss function corresponding to the second prediction category reaches K times the loss function corresponding to the first prediction category, where K is a constant greater than 1. Terminating training once this ratio is reached ensures that the trained first text classification model remains under-fit. When the first text classification models include a plurality of under-fit models, each under-fit first text classification model can be constrained by this training termination condition during training.
For example, consider the i-th first text classification model, which is to be trained as an under-fit model, and let its first prediction category be the category predicted for a piece of training text information. Suppose the first prediction categories of K1 pieces of training text information in the training set differ from their true categories, and the second prediction categories of K2 pieces of verification text information in the verification set differ from their true categories. Define the following loss function: for each sample, the loss is 0 if the prediction matches the true category and 1 otherwise. The loss function for the first prediction category is then K1 and the loss function for the second prediction category is K2. When K2/K1 ≥ K is satisfied, training of the i-th first text classification model is stopped; otherwise, training continues. This example is illustrative only and does not limit the present specification; the loss function may also be determined in other manners, which are not described here again.
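Stated as code, the termination test of this embodiment could be sketched as follows; train_one_epoch and count_errors are hypothetical helpers standing in for whatever training loop and 0/1-loss counting an implementation actually uses.

    K = 1.5  # constant greater than 1

    def train_until_underfit(model, train_set, val_set, max_epochs=100):
        # Stop training as soon as the verification-set loss K2 reaches K times
        # the training-set loss K1, keeping the first text classification model
        # under-fit.
        for _ in range(max_epochs):
            train_one_epoch(model, train_set)    # hypothetical helper
            k1 = count_errors(model, train_set)  # loss for the first prediction category
            k2 = count_errors(model, val_set)    # loss for the second prediction category
            if k1 > 0 and k2 >= K * k1:
                break
        return model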
In some embodiments, the learning rate used to train the first text classification model is greater than a preset learning rate threshold. Setting a large learning rate helps prevent the trained first text classification model from over-fitting.
As shown in fig. 4, a text information category prediction apparatus according to an embodiment of the present specification includes:
a receiving module 402, configured to receive text information to be processed;
a first prediction module 404, configured to predict the to-be-processed text information by using a plurality of pre-trained first text classification models, respectively, and obtain a first prediction category output by each first text classification model; wherein at least one of the first text classification models is an under-fit model;
and the second prediction module 406 is configured to obtain the true category of the text information to be processed according to the first prediction categories output by the first text classification models.
Details of how the functions and roles of each module in the above text information category prediction apparatus are implemented can be found in the implementation of the corresponding steps in the text information category prediction method, and are not repeated here.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, wherein the modules described as separate parts may or may not be physically separate, and the parts displayed as modules may or may not be physical modules, may be located in one place, or may be distributed on a plurality of network modules. Some or all of the modules can be selected according to actual needs to achieve the purpose of the solution in the specification. One of ordinary skill in the art can understand and implement it without inventive effort.
The apparatus embodiments of the present specification can be applied to a computer device, such as a server or a terminal device. The apparatus embodiments may be implemented by software, by hardware, or by a combination of the two. Taking a software implementation as an example, as a logical apparatus, the apparatus is formed by the processor of the computer device in which it is located reading the corresponding computer program instructions from nonvolatile storage into memory and running them. In terms of hardware, Fig. 5 shows a hardware structure diagram of the computer device in which the apparatus of this specification is located; in addition to the processor 502, the memory 504, the network interface 506, and the nonvolatile storage 508 shown in Fig. 5, the server or electronic device in which the apparatus is located may also include other hardware according to the actual functions of the computer device, which is not described again.
Accordingly, the embodiments of the present specification also provide a computer storage medium, in which a program is stored, and the program, when executed by a processor, implements the method in any of the above embodiments.
Accordingly, embodiments of the present specification further provide a server, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the program, the method in any of the above embodiments is implemented.
Embodiments of the present description may take the form of a computer program product embodied on one or more storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having program code embodied therein. Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer readable instructions, data structures, modules of a program, or other data. Examples of the storage medium of the computer include, but are not limited to: phase change memory (PRAM), Static Random Access Memory (SRAM), Dynamic Random Access Memory (DRAM), other types of Random Access Memory (RAM), Read Only Memory (ROM), Electrically Erasable Programmable Read Only Memory (EEPROM), flash memory or other memory technologies, compact disc read only memory (CD-ROM), Digital Versatile Discs (DVD) or other optical storage, magnetic tape storage or other magnetic storage devices, or any other non-transmission medium, may be used to store information that may be accessed by a computing device.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following, in general, the principles of the disclosure and including such departures from the present disclosure as come within known or customary practice within the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the disclosure being indicated by the following claims.
It will be understood that the present disclosure is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
The above description is only exemplary of the present disclosure and should not be taken as limiting the disclosure, as any modification, equivalent replacement, or improvement made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.

Claims (10)

1. A method for predicting a category of text information, the method comprising:
receiving text information to be processed;
predicting the text information to be processed by respectively adopting a plurality of pre-trained first text classification models to obtain first prediction categories output by each first text classification model; wherein at least one of the first text classification models is an under-fit model;
and acquiring the true category of the text information to be processed according to the first prediction category output by each first text classification model.
2. The method of claim 1, wherein the step of obtaining the true category of the text information to be processed according to the first prediction category output by each first text classification model comprises:
carrying out a weighted average of the first prediction categories output by each first text classification model to obtain the true category of the text information to be processed; or
Taking a target first prediction category meeting a preset condition as the true category, wherein the preset condition is as follows: the number of first text classification models outputting the target first prediction category is larger than the number of first text classification models outputting other first prediction categories.
3. The method of claim 1, further comprising:
and training the first text classification models by respectively adopting the training text information corresponding to each first text classification model.
4. The method of claim 3, after the first text classification models are trained by using the training text information corresponding to each first text classification model, the method further comprising:
inputting verification text information corresponding to the first text classification model into the first text classification model to obtain a second prediction category output by the first text classification model;
judging whether the loss function corresponding to the second prediction category meets a training termination condition;
and if so, terminating the training of the first text classification model.
5. The method of claim 4, the training termination condition being: the loss function corresponding to the second prediction category is K times the loss function corresponding to the first prediction category, where K is a constant greater than 1.
6. The method according to any one of claims 3 to 5, wherein a learning rate used for training the first text classification model is greater than a preset learning rate threshold.
7. The method of any of claims 1-5, each first text classification model being an under-fit deep learning model.
8. An apparatus for predicting a category of text information, the apparatus comprising:
the receiving module is used for receiving text information to be processed;
the first prediction module is used for predicting the text information to be processed by adopting a plurality of pre-trained first text classification models respectively to obtain first prediction categories output by the first text classification models; wherein at least one of the first text classification models is an under-fit model;
and the second prediction module is used for acquiring the true category of the text information to be processed according to the first prediction categories output by the first text classification models.
9. A computer-readable storage medium, on which a computer program is stored which, when being executed by a processor, carries out the method of any one of claims 1 to 7.
10. A server comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1 to 7 when executing the program.
CN201911236894.3A 2019-12-05 2019-12-05 Text information category prediction method and device and server Active CN111143552B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911236894.3A CN111143552B (en) 2019-12-05 2019-12-05 Text information category prediction method and device and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911236894.3A CN111143552B (en) 2019-12-05 2019-12-05 Text information category prediction method and device and server

Publications (2)

Publication Number Publication Date
CN111143552A (en) 2020-05-12
CN111143552B (en) 2023-06-27

Family

ID=70517800

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911236894.3A Active CN111143552B (en) 2019-12-05 2019-12-05 Text information category prediction method and device and server

Country Status (1)

Country Link
CN (1) CN111143552B (en)

Citations (14)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106548190A (en) * 2015-09-18 2017-03-29 三星电子株式会社 Model training method and equipment and data identification method
CN107679557A (en) * 2017-09-19 2018-02-09 平安科技(深圳)有限公司 Driving model training method, driver's recognition methods, device, equipment and medium
CN108090508A (en) * 2017-12-12 2018-05-29 腾讯科技(深圳)有限公司 A kind of classification based training method, apparatus and storage medium
CN108090607A (en) * 2017-12-13 2018-05-29 中山大学 A kind of social media user's ascribed characteristics of population Forecasting Methodology based on the fusion of multi-model storehouse
CN108596616A (en) * 2018-04-20 2018-09-28 平安科技(深圳)有限公司 User data authenticity analysis method and device, storage medium, electronic equipment
US20190034823A1 (en) * 2017-07-27 2019-01-31 Getgo, Inc. Real time learning of text classification models for fast and efficient labeling of training data and customization
CN109446369A (en) * 2018-09-28 2019-03-08 武汉中海庭数据技术有限公司 The exchange method and system of the semi-automatic mark of image
CN109471942A (en) * 2018-11-07 2019-03-15 合肥工业大学 Chinese comment sensibility classification method and device based on evidential reasoning rule
CN109508789A (en) * 2018-06-01 2019-03-22 北京信息科技大学 Predict method, storage medium, processor and the equipment of hands
CN109670590A (en) * 2017-10-16 2019-04-23 优酷网络技术(北京)有限公司 Neural net prediction method and device
CN110070183A (en) * 2019-03-11 2019-07-30 中国科学院信息工程研究所 A kind of the neural network model training method and device of weak labeled data
CN110321427A (en) * 2018-03-28 2019-10-11 广东亿迅科技有限公司 The file classification method and device based on bagging algorithm towards unbalanced dataset
CN110347839A (en) * 2019-07-18 2019-10-18 湖南数定智能科技有限公司 A kind of file classification method based on production multi-task learning model
CN110505144A (en) * 2019-08-09 2019-11-26 世纪龙信息网络有限责任公司 Process for sorting mailings, device, equipment and storage medium

Patent Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106548190A (en) * 2015-09-18 2017-03-29 三星电子株式会社 Model training method and equipment and data identification method
US20190034823A1 (en) * 2017-07-27 2019-01-31 Getgo, Inc. Real time learning of text classification models for fast and efficient labeling of training data and customization
WO2019056497A1 (en) * 2017-09-19 2019-03-28 平安科技(深圳)有限公司 Driving model training method, driver recognition method, device, apparatus and medium
CN107679557A (en) * 2017-09-19 2018-02-09 平安科技(深圳)有限公司 Driving model training method, driver's recognition methods, device, equipment and medium
CN109670590A (en) * 2017-10-16 2019-04-23 优酷网络技术(北京)有限公司 Neural net prediction method and device
CN108090508A (en) * 2017-12-12 2018-05-29 腾讯科技(深圳)有限公司 A kind of classification based training method, apparatus and storage medium
CN108090607A (en) * 2017-12-13 2018-05-29 中山大学 A kind of social media user's ascribed characteristics of population Forecasting Methodology based on the fusion of multi-model storehouse
CN110321427A (en) * 2018-03-28 2019-10-11 广东亿迅科技有限公司 The file classification method and device based on bagging algorithm towards unbalanced dataset
CN108596616A (en) * 2018-04-20 2018-09-28 平安科技(深圳)有限公司 User data authenticity analysis method and device, storage medium, electronic equipment
CN109508789A (en) * 2018-06-01 2019-03-22 北京信息科技大学 Predict method, storage medium, processor and the equipment of hands
CN109446369A (en) * 2018-09-28 2019-03-08 武汉中海庭数据技术有限公司 The exchange method and system of the semi-automatic mark of image
CN109471942A (en) * 2018-11-07 2019-03-15 合肥工业大学 Chinese comment sensibility classification method and device based on evidential reasoning rule
CN110070183A (en) * 2019-03-11 2019-07-30 中国科学院信息工程研究所 A kind of the neural network model training method and device of weak labeled data
CN110347839A (en) * 2019-07-18 2019-10-18 湖南数定智能科技有限公司 A kind of file classification method based on production multi-task learning model
CN110505144A (en) * 2019-08-09 2019-11-26 世纪龙信息网络有限责任公司 Process for sorting mailings, device, equipment and storage medium

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LIQIUTUOYUAN: "Ensemble learning" (集成学习) *
陈圣灵; 沈思淇; 李东升: "Ensemble learning method for imbalanced data based on sample weight updating" (基于样本权重更新的不平衡数据集成学习方法), no. 07 *
陈祥: "Cross-media retrieval based on semantic matching of different modalities" (基于不同模态语义匹配的跨媒体检索), vol. 32, no. 32, pages 9-15 *

Also Published As

Publication number Publication date
CN111143552B (en) 2023-06-27

Similar Documents

Publication Publication Date Title
CN110276066B (en) Entity association relation analysis method and related device
JP2019511037A (en) Method and device for modeling machine learning model
CN111241287A (en) Training method and device for generating generation model of confrontation text
CN110633989A (en) Method and device for determining risk behavior generation model
CN110674188A (en) Feature extraction method, device and equipment
CN110069545B (en) Behavior data evaluation method and device
CN112381216A (en) Training and predicting method and device for mixed graph neural network model
CN111369258A (en) Entity object type prediction method, device and equipment
CN113449011A (en) Big data prediction-based information push updating method and big data prediction system
CN114548300B (en) Method and device for explaining service processing result of service processing model
CN112749737A (en) Image classification method and device, electronic equipment and storage medium
CN111062490B (en) Method and device for processing and identifying network data containing private data
CN117409419A (en) Image detection method, device and storage medium
CN110705622A (en) Decision-making method and system and electronic equipment
CN113656699A (en) User feature vector determination method, related device and medium
CN108446738A (en) A kind of clustering method, device and electronic equipment
CN111159397B (en) Text classification method and device and server
CN111078877B (en) Data processing method, training method of text classification model, and text classification method and device
CN113761184A (en) Text data classification method, equipment and storage medium
KR102472447B1 (en) A system and method for automatically blocking specific content in complex documents using machine learning
CN115936104A (en) Method and apparatus for training machine learning models
CN117523218A (en) Label generation, training of image classification model and image classification method and device
CN111143552B (en) Text information category prediction method and device and server
CN113284027A (en) Method for training group recognition model, and method and device for recognizing abnormal group
CN113159213A (en) Service distribution method, device and equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant