CN111143552B - Text information category prediction method and device and server - Google Patents

Text information category prediction method and device and server

Info

Publication number
CN111143552B
Authority
CN
China
Prior art keywords
text classification
text
prediction
classification model
category
Prior art date
Legal status
Active
Application number
CN201911236894.3A
Other languages
Chinese (zh)
Other versions
CN111143552A (en)
Inventor
马良庄
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN201911236894.3A priority Critical patent/CN111143552B/en
Publication of CN111143552A publication Critical patent/CN111143552A/en
Application granted granted Critical
Publication of CN111143552B publication Critical patent/CN111143552B/en


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 Information retrieval of unstructured textual data
    • G06F 16/35 Clustering; Classification

Abstract

The embodiments of this specification provide a text information category prediction method, apparatus, and server. A plurality of first text classification models are used to predict the category of text information to be processed, and the first prediction categories output by the first text classification models are then fused to obtain the true category of the text information. Because at least one of the first text classification models is an under-fitted model, replacing the original single text classification model with a plurality of first text classification models that include an under-fitted model lets the under-fitted model constrain the final prediction result, which improves the generalization ability of the model and the accuracy of the classification result.

Description

Text information category prediction method and device and server
Technical Field
The present disclosure relates to the field of artificial intelligence technologies, and in particular to a method, an apparatus, and a server for predicting the category of text information.
Background
In everyday applications, it is often necessary to categorize text information. For example, in an intelligent robot customer service scenario, a user may send text information to the intelligent robot customer service. The text may relate to account operations, for example: how to register an account, how to bind a mobile phone number to an account, and so on; it may relate to orders, for example: how to cancel an order, how long a refund takes after an order is cancelled, and so on; or it may be some other type of text information. To improve the response efficiency of the intelligent robot customer service, such text information needs to be classified, and it is therefore necessary to improve the accuracy of text information classification.
Disclosure of Invention
Based on the above, the embodiment of the specification provides a method and a device for predicting the category of text information and a server.
According to a first aspect of embodiments of the present specification, there is provided a category prediction method of text information, the method including:
receiving text information to be processed;
predicting the text information to be processed with each of a plurality of pre-trained first text classification models, and obtaining the first prediction category output by each first text classification model, wherein at least one of the first text classification models is an under-fitted model;
and obtaining the true category of the text information to be processed according to the first prediction categories output by the first text classification models.
According to a second aspect of embodiments of the present specification, there is provided a category prediction apparatus of text information, the apparatus comprising:
the receiving module is used for receiving the text information to be processed;
the first prediction module is used for predicting the text information to be processed with each of a plurality of pre-trained first text classification models, and obtaining the first prediction category output by each first text classification model, wherein at least one of the first text classification models is an under-fitted model;
and the second prediction module is used for obtaining the true category of the text information to be processed according to the first prediction categories output by the first text classification models.
According to a third aspect of embodiments of the present specification, there is provided a computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements the method of any of the embodiments.
According to a fourth aspect of the embodiments of the present description, there is provided a server comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any of the embodiments when executing the program.
By applying the solution of the embodiments of this specification, the text information to be processed is predicted with a plurality of first text classification models, and the first prediction categories output by the first text classification models are fused to obtain the true category of the text information to be processed. Because at least one of the first text classification models is an under-fitted model, replacing the original single text classification model with a plurality of first text classification models that include an under-fitted model lets the under-fitted model constrain the final prediction result, which improves the generalization ability of the model and the accuracy of the classification result.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the disclosure.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the specification and together with the description, serve to explain the principles of the specification.
Fig. 1 is a flowchart of a method for predicting a category of text information according to an embodiment of the present disclosure.
FIG. 2 is a schematic diagram of multi-model prediction result fusion according to an embodiment of the present disclosure.
FIG. 3 is a schematic diagram of a model training and validation process according to an embodiment of the present disclosure.
Fig. 4 is a block diagram of a text information category predicting device according to an embodiment of the present specification.
FIG. 5 is a schematic diagram of a computer device for implementing the method of an embodiment of the present specification, according to an embodiment of the present specification.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the present specification. Rather, they are merely examples of apparatus and methods consistent with some aspects of the present description as detailed in the accompanying claims.
The terminology used in the description presented herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the description. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used in this specification to describe various information, the information should not be limited by these terms. These terms are only used to distinguish information of the same type from one another. For example, without departing from the scope of this specification, first information may also be referred to as second information, and similarly, second information may also be referred to as first information. Depending on the context, the word "if" as used herein may be interpreted as "when", "upon", or "in response to determining".
The meanings of the terms used in the embodiments of the present specification are as follows:
Deep learning: combining low-level features to form more abstract high-level representations (attribute categories or features), so as to discover distributed feature representations of data.
Model training: in supervised or unsupervised learning, given training text information and a model hypothesis space, an optimization problem can be constructed, namely how to determine the model parameters so that the optimization objective is met.
Training set, validation set, test set: in text classification, the data are divided into three parts, used respectively to train the model, to validate the trained model, and to perform prediction with the validated model.
Under-fitting: the error between the function fitted by the trained text classification model and the training set is large.
Over-fitting: the function fitted by the trained text classification model matches the training set too closely, but its generalization ability is insufficient; that is, it performs very well on the training text information but poorly on other data.
Learning rate: the step size by which the model parameters are adjusted at each iteration during model training.
As shown in fig. 1, an embodiment of the present disclosure provides a method for predicting a category of text information, where the method may include:
step S102: receiving text information to be processed;
step S104: predicting the text information to be processed with each of a plurality of pre-trained first text classification models, and obtaining the first prediction category output by each first text classification model, wherein at least one of the first text classification models is an under-fitted model;
step S106: obtaining the true category of the text information to be processed according to the first prediction categories output by the first text classification models.
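For orientation, the following is a minimal sketch of this three-step flow. The function and parameter names are hypothetical stand-ins, not part of the embodiments; the individual models and the fusion function are assumed to be supplied by the caller.

```python
from typing import Callable, List, Sequence

def predict_category(
    text: str,
    models: Sequence[Callable[[str], List[float]]],
    fuse: Callable[[List[List[float]]], int],
) -> int:
    """Steps S102-S106: receive the text, query every pre-trained first
    text classification model, then fuse the first prediction categories."""
    # Step S104: each model returns per-category scores; at least one of
    # the models in `models` is assumed to be an under-fitted model.
    first_predictions = [model(text) for model in models]
    # Step S106: fuse the individual outputs into the true category.
    return fuse(first_predictions)
```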
The steps in the embodiments of this specification may be performed by an intelligent robot customer service located on the server side. For step S102, the text information to be processed may be sent to the intelligent robot customer service by the user through a client: the user enters the text information on the client, and the client sends it to the intelligent robot customer service. The client may be an application program installed on an electronic device such as a smart phone, a tablet computer, or a desktop computer, for example a shopping, online banking, or payment application. The text information to be processed entered by the user may relate to account operations, for example: how to register an account, how to bind a mobile phone number to an account, and so on; it may relate to orders, for example: how to cancel an order, how long a refund takes after an order is cancelled, and so on; or it may be some other type of text information to be processed.
In some embodiments, the user may also send the client information in formats other than text. After receiving such information, the client can extract the text information to be processed from it and then send the text to the intelligent robot customer service. For example, when the information is in a picture format, the text information to be processed may be recognized from the picture by OCR (Optical Character Recognition) technology. Further, stop words may be filtered out of the received or extracted text information to be processed, and the filtered text information is then sent to the intelligent robot customer service.
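A minimal preprocessing sketch along these lines, assuming the pytesseract OCR package is available and using an illustrative stop-word list (both are choices of this sketch, not mandated by the embodiments):

```python
import pytesseract          # one possible OCR engine; any OCR technology works
from PIL import Image

STOP_WORDS = {"the", "a", "an", "of", "to"}   # illustrative stop-word list

def extract_text(image_path: str) -> str:
    """Recognize the text information to be processed from a picture."""
    return pytesseract.image_to_string(Image.open(image_path))

def filter_stop_words(text: str) -> str:
    """Filter stop words before sending the text to the robot customer service."""
    return " ".join(w for w in text.split() if w.lower() not in STOP_WORDS)
```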
For step S104, a plurality of first text classification models, at least one of which is under-fitted, may be used to predict the category of the text information to be processed. Replacing the original single text classification model with a plurality of first text classification models that include an under-fitted model lets the under-fitted model constrain the final prediction result, which improves the generalization ability of the model and the accuracy of the classification result.
The under-fitted first text classification model may be a model of relatively small scale, where scale may refer to the number of model parameters, the number of model layers, or other features of the model. "Small" here is relative to an over-fitted model: if a first text classification model has fewer parameters or fewer layers than the over-fitted model, it is a smaller-scale text classification model. In practice, the scale of the under-fitted first text classification model may be at least one order of magnitude smaller than that of the over-fitted model; for example, its number of model parameters, or its number of model layers, is at least one order of magnitude smaller.
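For instance, measuring scale by trainable parameter count, a sketch with two hypothetical PyTorch classifiers (the layer sizes are arbitrary and for illustration only):

```python
import torch.nn as nn

def num_parameters(model: nn.Module) -> int:
    """Model scale measured as the number of trainable parameters."""
    return sum(p.numel() for p in model.parameters() if p.requires_grad)

# A deliberately small (easily under-fitted) classifier vs. a larger one.
small = nn.Sequential(nn.Linear(128, 32), nn.ReLU(), nn.Linear(32, 10))
large = nn.Sequential(nn.Linear(128, 512), nn.ReLU(),
                      nn.Linear(512, 512), nn.ReLU(), nn.Linear(512, 10))
# ~4.5k vs ~334k parameters: more than an order of magnitude apart.
assert num_parameters(small) * 10 <= num_parameters(large)
```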
Each first text classification model may be any of various types of text classification model, such as a neural network model, a decision tree model, or a Bayesian classifier; the present disclosure is not limited in this regard. The types of the plurality of first text classification models may be identical, partially identical, or all different; the present disclosure is not limited in this regard either.
The under-fitted first text classification model is weaker at extracting features from the first training text information than an over-fitted model. An over-fitted model may treat characteristics peculiar to the training samples as general properties of all potential samples, which causes over-fitting. In some embodiments, every first text classification model is an under-fitted model. Replacing a single model with a plurality of first text classification models with weak feature extraction capability effectively avoids mistaking peculiarities of the training text information for general properties of all potential samples during training, which improves the generalization ability of the model and the accuracy of the classification result. Further, each first text classification model may be an under-fitted deep learning model. Because deep learning models have many parameters and layers, they are prone to over-fitting; replacing a single deep learning model with a plurality of under-fitted deep learning models therefore improves the generalization ability of the deep learning model and the accuracy of classification performed with it.
The number of first text classification models can be preset, for example to 5, 10, or some other number. The more first text classification models there are, the more accurate the classification result, but the more resources the classification process occupies and the larger the time delay. The number of first text classification models may therefore be set according to preset accuracy and time delay constraints.
For step S106, the first prediction categories output by the first text classification models may be fused to obtain the true category of the text information to be processed.
In some embodiments, the step of obtaining the true category of the text information to be processed according to the first prediction categories output by the first text classification models includes: computing a weighted average of the first prediction categories output by the first text classification models to obtain the true category of the text information to be processed.
For example, suppose there are M first text classification models and the first prediction category is the probability that the text information to be processed belongs to some category A. Let the probability values output by the first text classification models be P1, P2, …, PM, and let the weights corresponding to the models be r1, r2, …, rM. The probability that the text information to be processed belongs to category A may then be written as:
P(A) = (r1·P1 + r2·P2 + … + rM·PM) / (r1 + r2 + … + rM)
where the values of r1, r2, …, rM may be set according to the feature extraction capability of each first text classification model: the stronger the capability, the smaller the weight may be set, and conversely, the weaker the capability, the larger the weight may be set. When every first text classification model is an under-fitted model, r1, r2, …, rM may all be set to 1, that is, the first prediction categories output by the first text classification models are simply averaged to obtain the true category.
It should be noted that the probability value may be replaced by a score: the higher the score of the text information to be processed for category A, the higher the probability that it belongs to category A.
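A minimal sketch of this weighted-average fusion, assuming each model has already produced its probability (or score) for category A; the function and variable names are illustrative:

```python
def fuse_weighted_average(probs: list, weights: list) -> float:
    """Weighted average of the per-model probabilities P1..PM for a
    category A, using weights r1..rM (all set to 1 when every first
    text classification model is an under-fitted model)."""
    assert len(probs) == len(weights) and weights
    return sum(r * p for r, p in zip(weights, probs)) / sum(weights)

# Three under-fitted models with equal weights: a plain average.
p_a = fuse_weighted_average([0.7, 0.6, 0.8], [1.0, 1.0, 1.0])  # -> 0.7
```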
In other embodiments, the step of obtaining the true category of the text information to be processed according to the first prediction categories output by the first text classification models includes: taking as the true category a target first prediction category that satisfies a preset condition, the preset condition being that the number of first text classification models outputting the target first prediction category is greater than the number of first text classification models outputting any other first prediction category.
For example, for a certain classification problem, suppose that 10 first text classification models are used in total: 4 of them output category A as the first prediction category, 3 output category B, and 3 output category C. Since category A is output by the largest number of first text classification models, category A is taken as the true category.
Further, the count may be a weighted count, with the weights set according to the feature extraction capability of each first text classification model: the stronger the capability, the smaller the weight may be set, and conversely, the weaker the capability, the larger the weight may be set. When every first text classification model is an under-fitted model, the weight corresponding to each first text classification model is set to 1.
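A minimal sketch of this (optionally weighted) voting fusion; the names are illustrative:

```python
from collections import Counter
from typing import List, Optional

def fuse_majority_vote(categories: List[str],
                       weights: Optional[List[float]] = None) -> str:
    """Return the target first prediction category output by the most
    (optionally weighted) first text classification models."""
    weights = weights or [1.0] * len(categories)
    tally: Counter = Counter()
    for category, weight in zip(categories, weights):
        tally[category] += weight
    return tally.most_common(1)[0][0]

# The example above: 4 votes for A, 3 for B, 3 for C -> category A wins.
assert fuse_majority_vote(["A"] * 4 + ["B"] * 3 + ["C"] * 3) == "A"
```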
Other ways of fusing the first prediction categories may also be used; the embodiments of this specification are not limited in this regard. When every first text classification model is an under-fitted model, a schematic diagram of the multi-model prediction result fusion is shown in fig. 2.
In some embodiments, the method further comprises: training each first text classification model with the training text information corresponding to that model. In this embodiment, the training text information used to train the different first text classification models may be the same or different; this specification does not limit this.
In the process of training each first text classification model, an ID number of the true class of the training text information can be obtained. The ID number uniquely identifies each true class of the training text information; for example, the serial number of the true class in a class database may be used as the ID number. The training text information is also converted into a vector, for example with word2vec, although other conversion methods may be used; this specification does not limit this. The vector is then used as the input of the first text classification model and the ID number as its output, and the first text classification model is trained.
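A minimal data-preparation sketch under these assumptions, using gensim's word2vec implementation and a tiny illustrative corpus (the texts, ID numbers, and vector size are all made up for illustration):

```python
import numpy as np
from gensim.models import Word2Vec   # one possible text-to-vector choice

corpus = [["how", "to", "cancel", "an", "order"],
          ["how", "to", "register", "an", "account"]]
category_ids = [1, 0]   # ID numbers uniquely identifying the true classes

# Train word vectors, then represent each text as the mean of its word vectors.
w2v = Word2Vec(sentences=corpus, vector_size=50, min_count=1)

def text_to_vector(tokens: list) -> np.ndarray:
    return np.mean([w2v.wv[t] for t in tokens if t in w2v.wv], axis=0)

X = np.stack([text_to_vector(t) for t in corpus])  # model input (vectors)
y = np.array(category_ids)                         # model output (ID numbers)
```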
In some embodiments, after each first text classification model is trained with its corresponding training text information, the method further includes: inputting the verification text information corresponding to the first text classification model into the first text classification model to obtain the second prediction category output by the model; judging whether the loss function corresponding to the second prediction category satisfies a training termination condition; and if so, terminating the training of the first text classification model. A schematic diagram of the model training and validation process of one embodiment is shown in fig. 3.
In this embodiment, a first text classification model that satisfies the training termination condition is finally obtained through multiple iterations. In some embodiments, the training termination condition is used to constrain the trained first text classification model to be an under-fitted model. Specifically, the training termination condition is: the loss function corresponding to the second prediction category reaches K times the loss function corresponding to the first prediction category, where K is a constant greater than 1. Allowing a larger loss in this way ensures that the trained first text classification model is an under-fitted model. When the first text classification models include a plurality of under-fitted models, each under-fitted first text classification model may be constrained during training with this training termination condition.
For example, for the i-th first text classification model, which is assumed to be an under-fitted model whose first prediction category is the category to which the training text information belongs, suppose the first prediction categories of K1 pieces of training text information in the training set differ from their true categories, and the second prediction categories of K2 pieces of verification text information in the verification set differ from their true categories. Define the following loss function: if the predicted result is the same as the true category of the text information, the loss is 0; if it is different, the loss is 1. Then the loss function corresponding to the first prediction category is K1, and the loss function corresponding to the second prediction category is K2. When K2/K1 = K is satisfied, training of the i-th first text classification model stops; otherwise training continues. This example is illustrative only and does not limit this specification; other ways of determining the loss function may also be used in the embodiments of this specification, and are not described here.
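A minimal sketch of this termination check, with a 0/1 loss as in the example (the K value and data layout are illustrative):

```python
from typing import Callable, List, Tuple

Dataset = List[Tuple[str, int]]   # (text, true category ID) pairs

def zero_one_loss(predict: Callable[[str], int], data: Dataset) -> int:
    """Summed 0/1 loss: the number of texts whose predicted category
    differs from the true category (K1 on the training set, K2 on the
    verification set)."""
    return sum(1 for text, label in data if predict(text) != label)

def should_stop(predict: Callable[[str], int], train_set: Dataset,
                valid_set: Dataset, k: float = 1.5) -> bool:
    """Terminate training once K2 reaches K * K1 (K > 1), deliberately
    leaving the first text classification model under-fitted. K = 1.5
    here is purely illustrative."""
    k1 = zero_one_loss(predict, train_set)   # loss of first prediction categories
    k2 = zero_one_loss(predict, valid_set)   # loss of second prediction categories
    return k1 > 0 and k2 >= k * k1
```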
In some embodiments, the learning rate used in training the first text classification model is greater than a preset learning rate threshold. Setting a larger learning rate helps prevent the trained first text classification model from over-fitting.
As shown in fig. 4, an embodiment of the present specification provides a category prediction apparatus for text information, the apparatus comprising:
a receiving module 402, configured to receive text information to be processed;
the first prediction module 404, configured to predict the text information to be processed with each of a plurality of pre-trained first text classification models, so as to obtain the first prediction category output by each first text classification model, wherein at least one of the first text classification models is an under-fitted model;
and the second prediction module 406, configured to obtain the true category of the text information to be processed according to the first prediction categories output by the first text classification models.
Specific details of how the functions and roles of each module in the text information category prediction device are implemented can be found in the implementation of the corresponding steps in the text information category prediction method, and are not repeated here.
Since the device embodiments essentially correspond to the method embodiments, reference is made to the description of the method embodiments for the relevant points. The device embodiments described above are merely illustrative: the modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules, i.e., they may be located in one place or distributed over a plurality of network modules. Some or all of the modules may be selected according to actual needs to achieve the purposes of the solution of this specification. Those of ordinary skill in the art can understand and implement this without inventive effort.
The device embodiments of this specification may be applied to a computer device, such as a server or a terminal device. The device embodiments may be implemented by software, by hardware, or by a combination of hardware and software. Taking software implementation as an example, a device in the logical sense is formed by the processor of the computer device in which it is located reading the corresponding computer program instructions from non-volatile memory into memory and running them. In terms of hardware, fig. 5 shows a hardware structure diagram of a computer device in which the device of this specification is located. In addition to the processor 502, memory 504, network interface 506, and non-volatile memory 508 shown in fig. 5, the server or electronic device in which the device of an embodiment is located may generally include other hardware according to the actual functions of the computer device, which will not be described here.
Accordingly, the present specification embodiment also provides a computer storage medium having a program stored therein, which when executed by a processor, implements the method in any of the above embodiments.
Accordingly, the embodiments of the present disclosure also provide a server, including a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of the embodiments above when executing the program.
Embodiments of this specification may take the form of a computer program product embodied on one or more storage media (including, but not limited to, disk storage, CD-ROM, optical storage, etc.) having program code embodied therein. Computer-usable storage media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to: phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device.
Other embodiments of the disclosure will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure herein. This disclosure is intended to cover any variations, uses, or adaptations of the disclosure following its general principles and including such departures from the present disclosure as come within known or customary practice in the art to which the disclosure pertains. It is intended that the specification and examples be considered as exemplary only, with the true scope and spirit of the disclosure being indicated by the following claims.
It is to be understood that the present disclosure is not limited to the precise arrangements and instrumentalities shown in the drawings, and that various modifications and changes may be effected without departing from the scope thereof. The scope of the present disclosure is limited only by the appended claims.
The foregoing description of the preferred embodiments of the present disclosure is not intended to limit the disclosure, but rather to cover all modifications, equivalents, improvements and alternatives falling within the spirit and principles of the present disclosure.

Claims (7)

1. A method of category prediction of text information, the method comprising:
receiving text information to be processed;
predicting the text information to be processed with each of a plurality of pre-trained first text classification models, and obtaining the first prediction category output by each first text classification model, wherein at least one of the first text classification models is an under-fitted model;
obtaining the true category of the text information to be processed according to the first prediction categories output by the first text classification models;
the first text classification model is trained based on:
after each first text classification model is trained with its corresponding training text information, inputting the verification text information corresponding to the first text classification model into the first text classification model to obtain the second prediction category output by the first text classification model;
and if the loss function corresponding to the second prediction category reaches K times the loss function corresponding to the first prediction category, where K is a constant greater than 1, terminating the training of the first text classification model.
2. The method of claim 1, wherein the step of obtaining the true category of the text information to be processed from the first prediction categories output by the first text classification models comprises:
computing a weighted average of the first prediction categories output by the first text classification models to obtain the true category of the text information to be processed; or
taking as the true category a target first prediction category that satisfies a preset condition, the preset condition being that: the number of first text classification models outputting the target first prediction category is greater than the number of first text classification models outputting any other first prediction category.
3. The method of claim 1, wherein the first text classification model is trained with a learning rate greater than a preset learning rate threshold.
4. The method according to any one of claims 1 to 3, wherein each first text classification model is an under-fitted deep learning model.
5. A category prediction apparatus of text information, the apparatus comprising:
the receiving module is used for receiving the text information to be processed;
the first prediction module is used for predicting the text information to be processed with each of a plurality of pre-trained first text classification models, and obtaining the first prediction category output by each first text classification model, wherein at least one of the first text classification models is an under-fitted model;
the second prediction module is used for obtaining the true category of the text information to be processed according to the first prediction categories output by the first text classification models;
the first text classification model is trained based on the following modules:
the input module is used for inputting, after each first text classification model is trained with its corresponding training text information, the verification text information corresponding to the first text classification model into the first text classification model, so as to obtain the second prediction category output by the first text classification model;
and the training module is used for terminating the training of the first text classification model if the loss function corresponding to the second prediction category reaches K times the loss function corresponding to the first prediction category, where K is a constant greater than 1.
6. A computer readable storage medium having stored thereon a computer program which when executed by a processor implements the method of any of claims 1 to 4.
7. A server comprising a memory, a processor and a computer program stored on the memory and executable on the processor, the processor implementing the method of any one of claims 1 to 4 when the program is executed.
CN201911236894.3A 2019-12-05 2019-12-05 Text information category prediction method and device and server Active CN111143552B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911236894.3A CN111143552B (en) 2019-12-05 2019-12-05 Text information category prediction method and device and server

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201911236894.3A CN111143552B (en) 2019-12-05 2019-12-05 Text information category prediction method and device and server

Publications (2)

Publication Number Publication Date
CN111143552A (en) 2020-05-12
CN111143552B (en) 2023-06-27

Family

ID=70517800

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911236894.3A Active CN111143552B (en) 2019-12-05 2019-12-05 Text information category prediction method and device and server

Country Status (1)

Country Link
CN (1) CN111143552B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110321427A (en) * 2018-03-28 2019-10-11 广东亿迅科技有限公司 The file classification method and device based on bagging algorithm towards unbalanced dataset

Family Cites Families (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR102492318B1 (en) * 2015-09-18 2023-01-26 삼성전자주식회사 Model training method and apparatus, and data recognizing method
US10896385B2 (en) * 2017-07-27 2021-01-19 Logmein, Inc. Real time learning of text classification models for fast and efficient labeling of training data and customization
CN107679557B (en) * 2017-09-19 2020-11-27 平安科技(深圳)有限公司 Driving model training method, driver identification method, device, equipment and medium
CN109670590A (en) * 2017-10-16 2019-04-23 优酷网络技术(北京)有限公司 Neural net prediction method and device
CN110633745B (en) * 2017-12-12 2022-11-29 腾讯科技(深圳)有限公司 Image classification training method and device based on artificial intelligence and storage medium
CN108090607A (en) * 2017-12-13 2018-05-29 中山大学 A kind of social media user's ascribed characteristics of population Forecasting Methodology based on the fusion of multi-model storehouse
CN108596616B (en) * 2018-04-20 2023-04-18 平安科技(深圳)有限公司 User data authenticity analysis method and device, storage medium and electronic equipment
CN109508789B (en) * 2018-06-01 2022-03-15 北京信息科技大学 Method, storage medium, processor and apparatus for predicting hand
CN109446369B (en) * 2018-09-28 2021-10-08 武汉中海庭数据技术有限公司 Interaction method and system for semi-automatic image annotation
CN109471942B (en) * 2018-11-07 2021-09-07 合肥工业大学 Chinese comment emotion classification method and device based on evidence reasoning rule
CN110070183B (en) * 2019-03-11 2021-08-20 中国科学院信息工程研究所 Neural network model training method and device for weakly labeled data
CN110347839B (en) * 2019-07-18 2021-07-16 湖南数定智能科技有限公司 Text classification method based on generative multi-task learning model
CN110505144A (en) * 2019-08-09 2019-11-26 世纪龙信息网络有限责任公司 Process for sorting mailings, device, equipment and storage medium

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110321427A (en) * 2018-03-28 2019-10-11 广东亿迅科技有限公司 The file classification method and device based on bagging algorithm towards unbalanced dataset

Also Published As

Publication number Publication date
CN111143552A (en) 2020-05-12

Similar Documents

Publication Publication Date Title
CN110069709B (en) Intention recognition method, device, computer readable medium and electronic equipment
CN111352965B (en) Training method of sequence mining model, and processing method and equipment of sequence data
JP2019511037A (en) Method and device for modeling machine learning model
CN110069545B (en) Behavior data evaluation method and device
CN110633989A (en) Method and device for determining risk behavior generation model
CN110674188A (en) Feature extraction method, device and equipment
CN111401766A (en) Model, service processing method, device and equipment
CN113449011A (en) Big data prediction-based information push updating method and big data prediction system
CN110930218A (en) Method and device for identifying fraudulent customer and electronic equipment
CN110689164A (en) Prediction method and system for user reduction behavior
CN112749737A (en) Image classification method and device, electronic equipment and storage medium
CN117409419A (en) Image detection method, device and storage medium
CN110705622A (en) Decision-making method and system and electronic equipment
CN113159213A (en) Service distribution method, device and equipment
CN112801784A (en) Bit currency address mining method and device for digital currency exchange
CN108446738A (en) A kind of clustering method, device and electronic equipment
CN111159397B (en) Text classification method and device and server
CN111143552B (en) Text information category prediction method and device and server
CN111078877B (en) Data processing method, training method of text classification model, and text classification method and device
CN113010562B (en) Information recommendation method and device
CN113837863B (en) Business prediction model creation method and device and computer readable storage medium
CN110990522B (en) Legal document determining method and system
JP7441107B2 (en) Learning device, representative image extraction device and program
CN111400604A (en) Data processing method, behavior recommendation device and behavior recommendation equipment for target object
CN115936104A (en) Method and apparatus for training machine learning models

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant