CN111090753B - Training method of classification model, classification method, device and computer storage medium


Info

Publication number
CN111090753B
Authority
CN
China
Prior art keywords
model
classification
corpus
training
classification model
Legal status
Active
Application number
CN201811244834.1A
Other languages
Chinese (zh)
Other versions
CN111090753A (en)
Inventor
靳丁南
权圣
Current Assignee
Mashang Xiaofei Finance Co Ltd
Original Assignee
Mashang Xiaofei Finance Co Ltd
Application filed by Mashang Xiaofei Finance Co Ltd
Priority to CN201811244834.1A
Publication of CN111090753A
Application granted
Publication of CN111090753B

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/049: Temporal neural networks, e.g. delay elements, oscillating neurons or pulsed inputs

Abstract

The application discloses a training method, a training device, a training terminal and a computer storage medium for a classification model, wherein the training method comprises the following steps: randomly acquiring a set number of first unlabeled corpora from a network or from history records; processing the first unlabeled corpus to obtain word vectors of the first unlabeled corpus; training a preset LSTM language model with the word vectors of the first unlabeled corpus to establish a first model; and training a second model with the labeled corpus of the classification model to obtain the classification model, wherein the second model is obtained by adding a classification output model structure to the first model. In this way, a classification model with high accuracy can be obtained without a large amount of manual labeling.

Description

Training method of classification model, classification method, device and computer storage medium
Technical Field
The present application relates to the field of model application technologies, and in particular, to a training method, a classification method, an apparatus, and a computer storage medium for a classification model.
Background
In real life, models such as classification models are usually established as needed to solve practical problems. When a model is first established, it is trained on test data to obtain its index parameters, and it is put into commercial application after passing the tests.
In an intelligent customer service system, the classification model is a very effective and general technique: most common tasks, such as intent recognition, emotion recognition and entity recognition, are built on it. A classification model is a kind of supervised learning, and current supervised learning generally requires a large amount of manually labeled corpus before the performance of the classification model reaches a commercially usable level.
However, for most intelligent systems, collecting and manually labeling corpora is very labor-intensive and inefficient.
Disclosure of Invention
The technical problem mainly solved by the present application is to provide a training method, a classification method, a device and a computer storage medium for a classification model, so that a classification model with high accuracy can be obtained without a large amount of manual labeling.
In order to solve the above technical problem, the first technical solution adopted by the present application is to provide a training method for a classification model, comprising the following steps: randomly acquiring a set number of first unlabeled corpora from a network or from history records;
processing the first unlabeled corpus to obtain word vectors of the first unlabeled corpus;
training a preset LSTM language model with the word vectors of the first unlabeled corpus to establish a first model;
and training a second model with the labeled corpus of the classification model to obtain the classification model, wherein the second model is obtained by adding a classification output model structure to the first model.
In order to solve the above technical problem, the second technical solution adopted by the present application is to provide a classification method based on a classification model, the classification model being obtained by adding a classification output model structure to a trained first model and then retraining. The classification method comprises the following steps:
receiving a corpus to be classified;
inputting the corpus into the classification model, processing the corpus through the classification model to obtain a feature vector of the corpus, and inquiring a prediction result from a fully-connected neural network based on the feature vector;
and outputting the prediction result.
In order to solve the above technical problem, the third technical solution adopted by the present application is to provide a training device for a classification model, comprising a corpus acquisition module, a preprocessing module, a first model building module and a second model training module, wherein:
the corpus acquisition module is used for randomly acquiring a set number of first unmarked corpuses from a network or a historical record;
the preprocessing module is used for processing the first unmarked corpus to obtain a word vector of the first unmarked corpus;
the first model building module is used for training a preset LSTM language model with the word vectors of the first unlabeled corpus to build a first model;
the second model training module is used for training the second model through the labeled corpora of the classification model to obtain the classification model; wherein the second model comprises the first model and a classification output model structure added on the first model.
In order to solve the above technical problem, the fourth technical solution adopted by the present application is to provide an intelligent customer service system comprising a classification model, the classification model being obtained by adding a classification output model structure to a trained first model and then retraining. The system comprises a receiving module, a classification module and an output module, wherein:
the receiving module is used for receiving the linguistic data to be classified;
the classification module is used for inputting the corpus into the classification model, processing the corpus through the classification model to obtain a feature vector of the corpus, and inquiring a prediction result from the fully-connected neural network based on the feature vector;
the output module is used for outputting the prediction result.
In order to solve the above technical problem, a fifth technical solution adopted by the present application is: an intelligent system is provided, comprising a communication circuit, a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the training method of the classification model or the steps of the classification method based on the classification model when executing the computer program.
In order to solve the above technical problem, the sixth technical solution adopted by the present application is to provide a computer storage medium having stored thereon program data which, when executed by a processor, implements the above-described training method of a classification model or the above-described classification method based on a classification model.
Compared with the prior art, the beneficial effects of the present application are as follows. In this embodiment, an existing LSTM language model is trained with the non-standardized first unlabeled corpus obtained from a network or from history records, yielding a first model that serves as the base model of the classification model. A second model, formed by adding a classification output to the first model, is then trained with the labeled corpus of the classification model to obtain a classification model with excellent performance. Compared with the conventional approach of training a model directly on a large amount of labeled corpus, the non-standardized corpus obtained from a network or from history records is not restricted to the domain of the classification task and does not need to be labeled, so a large amount of manpower, financial resources and time can be saved, and the method is more widely applicable. Moreover, because the corpus used to train the classification model is abundant, the resulting classification model generalizes well and can perform different classification tasks in different fields, such as intelligent customer service, reducing the cost of building and launching different types of intelligent customer service systems.
Drawings
FIG. 1 is a schematic flow chart diagram illustrating an embodiment of a method for training a classification model according to the present application;
FIG. 2 is a flow diagram illustrating an embodiment of processing the unlabeled corpus in FIG. 1;
FIG. 3 is a schematic structural diagram of an embodiment of a second model of the present application;
FIG. 4 is a schematic flow diagram of one embodiment of the training of the second model of FIG. 1;
FIG. 5 is a schematic flow chart diagram illustrating another embodiment of a method for training a classification model according to the present application;
FIG. 6 is a schematic flow chart diagram of an embodiment of the classification method of the present application;
FIG. 7 is a schematic structural diagram of an embodiment of a training apparatus for classification models according to the present application;
FIG. 8 is a schematic structural diagram of another embodiment of the training apparatus for classification models of the present application;
FIG. 9 is a schematic block diagram of an embodiment of an intelligent customer service device according to the present application;
FIG. 10 is a schematic block diagram of an embodiment of the intelligence system of the present application;
FIG. 11 is a schematic structural diagram of an embodiment of a computer storage medium according to the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
A model is a concept in machine learning and refers to an algorithm that processes multivariate feature inputs. In supervised machine learning, a model can be understood as a multivariate function mapping: based on a large set of known input and output samples, training determines the coefficients of this functional relation, and the model is finally applied in an actual scenario to predict results.
A classification model takes a set of input feature values, such as data to be classified, and selects the result with the highest probability from a limited result set, such as a set of classification categories. The classification model provided by this embodiment is suitable for an intelligent customer service system and can realize classification tasks such as intent recognition, emotion recognition and entity recognition for a user.
LSTM (Long Short-Term Memory) denotes a long short-term memory network, a recurrent neural network suitable for processing and predicting events separated by relatively long intervals and delays in a time series.
The classification model of the present application is converted from an LSTM language model based on the idea of transfer learning, where the corpus used to train the LSTM language model is unlabeled corpus.
As shown in fig. 1, fig. 1 is a schematic flow chart of an embodiment of a training method of a classification model according to the present application. The method comprises the following steps:
step 101: and randomly acquiring a set number of first unmarked linguistic data from a network or a historical record.
Specifically, the first unlabeled corpus is corpus information crawled from the network by a crawler. In some embodiments it is a Chinese corpus, but it is not limited to Chinese; in other embodiments it may be extended to other commonly used languages, such as English, without limitation.
To make the classification model more universal, the first unlabeled corpus is unlabeled spoken corpus information. To facilitate subsequent data preprocessing, the crawled texts are short, with fewer characters than a preset value, for example corpus information of no more than 20 characters. In addition, the first unlabeled corpus may also be corpus accumulated by the training terminal, such as an intelligent customer service system, for example corpus stored in its history records.
The set number may be as large as possible and is determined by the memory and processing capability of the training terminal's processor.
Step 102: processing the first unlabeled corpus to obtain the word vectors of the first unlabeled corpus.
Because the unlabeled corpus obtained directly from the network or from the history records is spoken corpus information, in order to enable a training terminal such as an intelligent customer service system to recognize and process it, after the first unlabeled corpus is obtained, it is processed to obtain the word vectors of the first unlabeled corpus. This comprises the following steps:
Step 1021: preprocessing the first unlabeled corpus to obtain the word-cutting characters of the first unlabeled corpus and the feature values of the word-cutting characters.
Specifically, the training terminal, such as an intelligent customer service system, first removes set characters, such as emoticons, modal particles and punctuation characters, from the first unlabeled corpus.
The first unlabeled corpus is generally question-answer corpus and generally sentence corpus, and sentence corpus is inconvenient and of poor generality when training a model. To make processing more convenient for the training terminal, reduce the amount of calculation and increase the generality of the corpus, after the set characters are removed, each sentence is cut into words to obtain its word-cutting characters. The standard for word cutting is the smallest unit that expresses a precise meaning: for example, verbs can be cut directly into words, while indivisible items such as nouns, for example place names and proper nouns, are kept as single words.
After the sentence corpus is cut into words, because each sentence has been scattered into word-cutting characters, a beginning character is added at the start of each sentence and an ending character at its end to facilitate sentence recognition. The word frequency of each word-cutting character is counted, and the weighted value of each word-cutting character is calculated from its word frequency according to a set algorithm, for convenient later use.
To improve the generality and data accuracy of the corpus and reduce the amount of calculation, in this embodiment word-cutting characters whose word frequency is lower than a set value are deleted before the weighted values are calculated; the set value may be 20 or 30, or may be determined as a proportion of the set number of first unlabeled corpora.
Step 1022: performing unsupervised learning on the preprocessed first unlabeled corpus by a word embedding method, based on the word-cutting characters whose word frequency exceeds the set value and their weighted values, to obtain the word vectors of the first unlabeled corpus.
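To make steps 1021 and 1022 concrete, the following is a minimal Python sketch of this preprocessing pipeline. The patent does not name specific tools, so the tokenizer (jieba), the embedding trainer (gensim's Word2Vec), the character filter, the frequency threshold and the vector size are all illustrative assumptions:

```python
import re
from collections import Counter

import jieba                        # assumed Chinese word-segmentation library
from gensim.models import Word2Vec  # assumed unsupervised embedding trainer

MIN_FREQ = 20    # set value: word-cutting characters rarer than this are deleted
MAX_CHARS = 20   # preset value: crawled texts longer than this are discarded

def preprocess(raw_corpus):
    """Clean and segment the first unlabeled corpus into word-cutting characters."""
    sentences = []
    for text in raw_corpus:
        if len(text) > MAX_CHARS:                      # keep only short texts
            continue
        # remove set characters such as emoticons and punctuation
        text = re.sub(r"[^\w\u4e00-\u9fff]", "", text)
        tokens = jieba.lcut(text)                      # cut the sentence into words
        sentences.append(["<s>"] + tokens + ["</s>"])  # beginning/ending characters

    freq = Counter(tok for sent in sentences for tok in sent)  # word frequency
    return [[t for t in sent if freq[t] >= MIN_FREQ] for sent in sentences]

def train_word_vectors(sentences, dim=128):
    """Unsupervised word embedding over the preprocessed corpus."""
    model = Word2Vec(sentences, vector_size=dim, min_count=1, sg=1)
    return model.wv   # maps each word-cutting character to its word vector
```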
Step 103: training a preset LSTM language model with the word vectors of the first unlabeled corpus to establish a first model.
As described above, the present application builds the classification model on the basis of an LSTM language model; specifically, the LSTM language model is a language model with a two-layer LSTM structure. The LSTM language model is trained with the word vectors processed in step 102 to adjust its model parameters.
In an alternative embodiment, the prototype of the LSTM language model is a model whose output is the word vector of the word-cutting character that follows the current word vector in its sentence. The model parameters of the LSTM language model are then adjusted in real time according to the comparison between the output result and the actual result. Because the number of word vectors used to train the LSTM language model is huge, a large-capacity neural network language model can be obtained.
Furthermore, a word embedding layer is added at the input of the LSTM language model to obtain the first model, so that the first model can directly process the first unlabeled corpus.
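A minimal sketch of such a first model, assuming PyTorch (the patent does not name a framework): a word embedding layer feeding a two-layer LSTM, with an output layer that scores the next word-cutting character; all dimensions are hypothetical:

```python
import torch
import torch.nn as nn

class FirstModel(nn.Module):
    """Word embedding layer + two-layer LSTM language model (assumed sizes)."""
    def __init__(self, vocab_size, embed_dim=128, hidden_dim=256):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim)  # word embedding layer
        self.lstm = nn.LSTM(embed_dim, hidden_dim,
                            num_layers=2, batch_first=True)   # two-layer LSTM
        self.next_word = nn.Linear(hidden_dim, vocab_size)    # next-word output

    def forward(self, token_ids):
        word_vecs = self.embedding(token_ids)   # (batch, seq, embed_dim)
        hidden, _ = self.lstm(word_vecs)        # (batch, seq, hidden_dim)
        # position t scores the word-cutting character at position t + 1
        return self.next_word(hidden), word_vecs, hidden
```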
Step 104: training a second model with the labeled corpus of the classification model to obtain the classification model, wherein the second model is obtained by adding a classification output model structure to the first model.
For clarity about the structure and training of the second model, refer to fig. 3, which is a schematic structural diagram of an embodiment of the second model of the present application. The second model is obtained by adding a classification output model structure 302 on the basis of the first model 301. Specifically, the first model 301 includes a word embedding layer 3011 and an LSTM language model layer 3012. The classification output model structure 302 includes a feature splicing layer 3021, a fully connected network layer 3022 and an output layer 3023.
The word embedding layer 3011 is configured to process the labeled corpus used to train the second model and convert it into word vectors, and the LSTM language model layer 3012 is configured to further process the word vectors. The feature splicing layer 3021 is configured to splice the word vectors output by the word embedding layer 3011 and the output vectors produced by the LSTM language model layer into the feature vector of a sentence of the corpus. Classification prediction is then performed through the fully connected network layer 3022, and the result is finally output through the output layer 3023.
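Continuing the previous sketch under the same PyTorch assumption, the second model can be outlined as the trained first model plus the classification output structure; the pooling follows the averaging embodiment described below in step 1042, and the class count and dimensions are hypothetical:

```python
class SecondModel(nn.Module):
    """Trained first model + classification output model structure."""
    def __init__(self, first_model, embed_dim=128, hidden_dim=256, num_classes=10):
        super().__init__()
        self.first = first_model                                  # embedding + LSTM
        self.fc = nn.Linear(embed_dim + hidden_dim, num_classes)  # fully connected layer

    def forward(self, token_ids):
        _, word_vecs, hidden = self.first(token_ids)
        # feature splicing layer: average the word vectors and the hidden-node
        # output vectors, then splice the two averages head to tail
        sentence_vec = torch.cat([word_vecs.mean(dim=1),
                                  hidden.mean(dim=1)], dim=-1)
        return self.fc(sentence_vec)   # class scores passed to the output layer
```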
To clearly explain the above training of the second model, refer further to fig. 4; the training includes the following steps:
step 1041: and inputting the labeled linguistic data of the classification model into the first model for processing to obtain a word vector of the labeled linguistic data of the classification model and an output vector of the word vector.
Specifically, the labeled corpus used for training the second model is input into the word embedding layer 3011, and words are cut after setting characters such as expressions and language words are removed by the word embedding layer 3011, so that word-cut characters of the labeled corpus are obtained. The standard of word segmentation is the minimum unit capable of expressing accurate meaning, such as that verbs can be directly segmented into words, and indivisible linguistic data such as nouns can be segmented into words, such as place names and special nouns.
After the sentence marking corpus is cut into words through the word embedding layer 3011, a start symbol is added to the beginning of each sentence after word cutting, and an end symbol is added to the end of the sentence, so as to facilitate sentence recognition. And counting the word frequency of each word cutting character, calculating the weighted value of the corresponding word cutting character according to a set algorithm and the word frequency of each word cutting character, and obtaining the word vector of the labeled corpus sentence of the classification model based on the word cutting character with the word frequency larger than a set value and the weighted value thereof.
After the word vectors of the sentences marked with the linguistic data are obtained, the word vectors are identified through the LSTM language model layer 3012, and output vectors of the word vectors are obtained.
Step 1042: splicing the word vectors and their output vectors to obtain the feature vector of the sentence corresponding to the labeled corpus of the classification model.
The LSTM language model layer 3012 includes a plurality of hidden nodes, and from the first hidden node onward each node passes the output vector it receives to the next, so in theory each hidden node contains feature information of the word vector currently being processed. In an optional embodiment, the feature splicing layer 3021 may therefore directly splice, end to end, the output vector produced by the last hidden node with the word vectors of the same sentence output by the word embedding layer 3011, forming the feature vector of the sentence.
However, a hidden node may lose or corrupt data while propagating output vectors, making the output vector of the last hidden node inaccurate. In another embodiment, the average of the output vectors of all hidden nodes of the LSTM language model layer 3012 is therefore taken instead. To further improve the accuracy of the feature vector, the average of the word vectors of the same sentence output by the word embedding layer 3011 is also computed, and the average output vector and the average word vector are spliced head to tail by the feature splicing layer 3021 to obtain the feature vector of the sentence.
Step 1043: performing classification prediction on the labeled corpus of the classification model through the feature vector of the sentence to obtain a prediction result.
Specifically, the feature vectors of the sentences are sequentially input into the fully connected neural network layer 3022 to query the prediction results, which are then output through the output layer 3023.
Step 1044: adjusting the model parameters of the second model based on the prediction result to obtain a new classification model.
Specifically, the training terminal further obtains the prediction results output by the second model and calculates the loss function of the second model from them. In this embodiment, the loss function is a function that maps an event (an element of a sample space) to a real number expressing the cost associated with that event; more generally, it is a function that measures the degree of loss and error. The larger the loss function value, the higher the error rate; the smaller the value, the lower the error rate. In this embodiment, the model parameters of the second model are adjusted through a backpropagation algorithm based on the loss function, yielding a new classification model.
It should be noted that in this embodiment the model parameters are not adjusted once after all labeled corpora have been processed, but dynamically at a certain frequency. For example, a small batch of prediction results is sampled to calculate the loss function for that period, the parameters are adjusted accordingly to obtain a new classification model, and the new classification model serves as the basis for the next adjustment. In this way, as many labeled corpora as possible contribute to the adjustment, so that the performance of the classification model is optimized.
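A hedged sketch of this training procedure under the same assumptions, using cross-entropy as the loss function and per-mini-batch backpropagation as described above; the optimizer choice and learning rate are illustrative:

```python
def train_classifier(second_model, batches, epochs=3, lr=1e-3):
    """batches yields (token_ids, labels) mini-batches of the labeled corpus."""
    loss_fn = nn.CrossEntropyLoss()    # loss function: larger value, higher error
    optim = torch.optim.Adam(second_model.parameters(), lr=lr)
    for _ in range(epochs):
        for token_ids, labels in batches:
            logits = second_model(token_ids)   # prediction results
            loss = loss_fn(logits, labels)
            optim.zero_grad()
            loss.backward()                    # backpropagation algorithm
            optim.step()                       # per-batch dynamic adjustment
    return second_model
```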
Unlike the prior art, in this embodiment an existing LSTM language model is trained with the non-standardized first unlabeled corpus obtained from a network or from history records, yielding a first model that serves as the base model of the classification model. A second model, formed by adding a classification output to the first model, is then trained with the labeled corpus of the classification model to obtain a classification model with excellent performance. Compared with the conventional approach of training a model directly on a large amount of labeled corpus, the non-standardized corpus obtained from a network or from history records is not restricted to the domain of the classification task and does not need to be labeled, so a large amount of manpower, financial resources and time can be saved, and the method is more widely applicable. Moreover, because the corpus used to train the classification model is abundant, the resulting classification model generalizes well and can perform different classification tasks in different fields, such as intelligent customer service, reducing the cost of building and launching different types of intelligent customer service systems.
Further, to improve the performance of the classification model, such as its accuracy and precision, refer to fig. 5, a schematic flowchart of another embodiment of the training method of the classification model of the present application, comprising the following steps:
step 501: and randomly acquiring a set number of first unmarked linguistic data from a network or a historical record.
The step is the same as step 101, and please refer to the related description of step 101, which is not described herein again.
Step 502: and processing the first unmarked corpus to obtain a word vector of the first unmarked corpus.
The step is the same as the step 102, and please refer to the related description of the step 102, which is not described herein again.
Step 503: and training a preset LSTM language model through the word vector of the first unmarked corpus to establish a first model.
The step is the same as step 103, and please refer to the related description of step 103, which is not described herein again.
Step 504: and training the first language model through the unmarked corpus of the classification model to obtain the trained first model.
In an alternative embodiment, the unlabeled corpus of the classification model is different from the unlabeled corpus crawled from the network or taken from the history records: it is unlabeled corpus prepared for training the classification model to complete its classification task and corresponds to the labeled corpus of the classification model. It can be understood that the number of unlabeled corpora of the classification model is much larger than the number of its labeled corpora, which reduces the cost of training the classification model.
Specifically, refer further to the structural schematic diagram of the second model in fig. 3. The first model 301 also includes an output layer 3013. The training terminal inputs the unlabeled corpus of the classification model into the word embedding layer 3011, which removes set characters such as emoticons and modal particles and then cuts the sentences into words, yielding the word-cutting characters of the unlabeled corpus of the classification model. The standard for word cutting is the smallest unit that expresses a precise meaning: for example, verbs can be cut directly into words, while indivisible items such as nouns, for example place names and proper nouns, are kept as single words.
After the sentence corpus is cut into words by the word embedding layer 3011, a start symbol is added at the beginning of each sentence and an end symbol at its end to facilitate sentence recognition. The word frequency of each word-cutting character is counted, the weighted value of each word-cutting character is calculated from its word frequency according to a set algorithm, and unsupervised learning is performed on the preprocessed unlabeled corpus of the classification model by a word embedding method, based on the word-cutting characters whose word frequency exceeds a set value and their weighted values, to obtain the word vectors of the unlabeled corpus sentences of the classification model.
After the word vectors of the sentences of the unlabeled corpus of the classification model are obtained, the word vectors are processed by the LSTM language model layer 3012 to obtain their output vectors.
The LSTM language model layer 3012 includes a plurality of hidden nodes, and from the first hidden node onward each node passes the output vector it receives to the next, so each hidden node contains feature information of the word vector currently being processed.
The prediction result is output through the output layer 3013, and the model parameters of the first model are adjusted based on the prediction result to obtain the trained first model.
Specifically, the training terminal further obtains the prediction result output by the output layer 3013 of the first model and calculates the loss function of the first model from it. The model parameters of the first model are adjusted through a backpropagation algorithm based on the loss function to obtain a new first model.
It should be noted that in this embodiment the model parameters are not adjusted once after all the unlabeled corpora of the classification model have been processed, but dynamically at a certain frequency. For example, a small batch of prediction results is sampled to calculate the loss function for that period, the parameters are adjusted accordingly to obtain a new first model, and the new first model serves as the basis for the next adjustment. In this way, as many unlabeled corpora of the classification model as possible contribute to the adjustment, so that the performance of the eventual classification model is optimized.
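Step 504 amounts to continuing the language-model training on the classification task's own unlabeled corpus. A sketch under the same PyTorch assumptions, where each position predicts the next token and `batches` is a hypothetical iterator over the domain corpus:

```python
def finetune_first_model(first_model, batches, epochs=1, lr=1e-4):
    """batches yields token_ids drawn from the task's unlabeled corpus."""
    loss_fn = nn.CrossEntropyLoss()
    optim = torch.optim.Adam(first_model.parameters(), lr=lr)
    for _ in range(epochs):
        for token_ids in batches:
            logits, _, _ = first_model(token_ids[:, :-1])  # predict next token
            targets = token_ids[:, 1:]                     # sequence shifted by one
            loss = loss_fn(logits.reshape(-1, logits.size(-1)),
                           targets.reshape(-1))
            optim.zero_grad()
            loss.backward()
            optim.step()
    return first_model
```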
Step 505: training a second model with the labeled corpus of the classification model to obtain the classification model, wherein the second model is obtained by adding a classification output model structure to the trained first model.
This step is performed in the same manner as step 104, except that the first model of this embodiment is the new first model trained with the unlabeled corpus of the classification model; details are not repeated here.
Unlike the foregoing embodiment, in this embodiment the first model is first trained with the unlabeled corpus of the classification model; the second model is then built from the trained first model and trained with the labeled corpus of the classification model. This further improves the performance of the classification model, and because the large amount of corpus used to train the first model is unlabeled corpus of the classification model, no additional corpus or labor cost is incurred. Through this semi-supervised method of first training the first model with the unlabeled corpus of the classification model and then training the second model with its labeled corpus, a high-performance classification model can be obtained at low cost and with high efficiency.
In this embodiment, an existing LSTM language model is trained with the first unlabeled corpus obtained from a network or from history records, yielding a first model that serves as the base model of the classification model. A second model, formed by adding a classification output to the first model, is then trained with the labeled corpus of the classification model to obtain a classification model with excellent performance. Compared with the conventional approach of training a model directly on a large amount of labeled corpus, the non-standardized first corpus obtained from a network or from history records is not restricted to the domain of the classification task and does not need to be labeled, so a large amount of manpower, financial resources and time can be saved, and the method is more widely applicable. Moreover, because the corpus used to train the classification model is abundant, the resulting classification model generalizes well and can perform different classification tasks in different fields, such as intelligent customer service, reducing the cost of building and launching different types of intelligent customer service systems.
Referring to fig. 6, fig. 6 is a schematic flowchart of an embodiment of the classification method based on the classification model according to the present application.
The classification model of this embodiment is obtained by adding a classification output model structure to a trained first model and then retraining. The first model is obtained by training a preset LSTM language model, which is a language model with a two-layer LSTM structure. The first model includes a word embedding layer and an LSTM language model layer. The classification output model structure includes a feature splicing layer, a fully connected network layer and an output layer.
Specifically, the classification model is obtained by training the classification model according to any one of the embodiments shown in fig. 1 to 5 and the associated text descriptions. Please refer to fig. 1 to 5 and the related text descriptions, which are not repeated herein.
The classification method of the present embodiment includes:
step 601: and receiving the corpus to be classified.
When a user asks a question or classifies data through an intelligent device such as an intelligent customer service system, the user inputs the corpus to be classified in text or voice form through the interactive interface of the intelligent customer service system. Correspondingly, the corpus is received by the intelligent device, such as the intelligent customer service system.
Step 602: inputting the corpus into the classification model, processing the corpus through the classification model to obtain the feature vector of the corpus, and querying the classification prediction result from the fully connected neural network based on the feature vector.
After receiving the corpus, the intelligent device inputs it into the classification model. The classification model first cuts the corpus into words through its word embedding layer, adding a start symbol at the beginning of each sentence and an end symbol at its end to facilitate sentence recognition. The word frequency of each word-cutting character is counted, the weighted value of each word-cutting character is calculated from its word frequency according to a set algorithm, and the word vectors of the corpus sentences are obtained based on the word-cutting characters whose word frequency exceeds a set value and their weighted values.
The word vectors are then processed by the LSTM language model layer to obtain their output vectors, and the word vectors and their output vectors are spliced by the feature splicing layer of the classification model to obtain the feature vector of the corresponding sentence.
The LSTM language model layer includes a plurality of hidden nodes, and from the first hidden node onward each node passes the output vector it receives to the next, so in theory each hidden node contains feature information of the word vector currently being processed. In an optional implementation, the feature splicing layer can therefore directly splice, end to end, the output vector produced by the last hidden node with the word vectors of the same sentence output by the word embedding layer, forming the feature vector of the sentence.
However, a hidden node may lose or corrupt data while propagating output vectors, making the output vector of the last hidden node inaccurate. In this embodiment, the feature splicing layer therefore takes the average of the output vectors of all hidden nodes of the LSTM language model layer as the basis of the feature vector of the sentence. To further improve accuracy, the average of the word vectors of the same sentence output by the word embedding layer is also computed, and the average output vector and the average word vector are spliced head to tail to obtain the feature vector of the sentence.
Finally, the feature vector of the sentence is input into the fully connected neural network layer of the classification model to query the prediction result.
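Under the same assumptions, classification at inference time reduces to a forward pass through the second model followed by selecting the most probable class; `tokenizer` here is a hypothetical helper that applies the preprocessing described above and returns token ids:

```python
def classify(second_model, tokenizer, text):
    """Return the predicted class index for one corpus sentence."""
    token_ids = tokenizer(text)            # hypothetical preprocessing helper
    with torch.no_grad():
        logits = second_model(token_ids)   # feature vector -> fully connected query
        probs = torch.softmax(logits, dim=-1)
    return probs.argmax(dim=-1).item()     # result with the highest probability
```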
Step 603: outputting the classification prediction result.
The output layer of the classification model obtains the classification prediction result queried by the fully connected network layer, and the intelligent device, such as an intelligent customer service system, outputs the classification prediction result through its interactive interface in formats such as text, voice broadcast or pictures, without limitation.
Unlike the prior art, the intelligent device of this embodiment inputs the corpus to be classified into the classification model, classifies it through the classification model and outputs the prediction result. The classification model is built by training an existing LSTM language model with the non-standardized first unlabeled corpus obtained from a network or from history records, yielding a first model that serves as the base model of the classification model, and then training a second model, formed by adding a classification output to the first model, with the labeled corpus of the classification model, to obtain a classification model with excellent performance. Compared with the conventional approach of training a model directly on a large amount of labeled corpus, the first unlabeled corpus obtained from a network or from history records is not restricted to the domain of the classification task and does not need to be labeled, so a large amount of manpower, financial resources and time can be saved, and the method is more widely applicable. Moreover, because the corpus used to train the classification model is abundant, the resulting classification model generalizes well and can perform different classification tasks in different fields, such as intelligent customer service, reducing the cost of building and launching different types of intelligent customer service systems.
Referring to fig. 7, fig. 7 is a schematic structural diagram of an embodiment of the training device of the present application. The training device comprises a corpus acquisition module 701, a preprocessing module 702, a first model building module 703 and a second model training module 704.
Specifically, the corpus acquisition module 701 is configured to randomly acquire a set number of first unlabeled corpora from a network or from history records.
The first unlabeled corpus is corpus information crawled from the network by a crawler through the corpus acquisition module 701. In some embodiments it is a Chinese corpus, but it is not limited to Chinese; in other embodiments it may be extended to other commonly used languages, such as English, without limitation.
To make the classification model more universal, the first unlabeled corpus is non-standard spoken corpus information. To facilitate subsequent data preprocessing, the crawled texts are short, with fewer characters than a preset value, for example corpus information of no more than 20 characters. In addition, the first unlabeled corpus may also be corpus accumulated by a training terminal such as an intelligent customer service system, for example corpus stored in its history records. The set number may be as large as possible and is determined by the memory and the processing capability of the processor.
The preprocessing module 702 is configured to process the first unlabeled corpus to obtain the word vectors of the first unlabeled corpus.
Because the first unlabeled corpus crawled directly from the network or obtained from the history records is spoken corpus information, in order to enable a training terminal such as an intelligent customer service system to recognize and process it, after the first unlabeled corpus is obtained, it is processed to obtain the word vectors of the first unlabeled corpus.
Specifically, the preprocessing module 702 first preprocesses the first unlabeled corpus to obtain the word-cutting characters of the first unlabeled corpus and their feature values.
To do so, the preprocessing module 702 first removes set characters, such as emoticons, modal particles and punctuation characters, from the first unlabeled corpus.
The unlabeled corpus is generally question-answer corpus and generally sentence corpus, and sentence corpus is inconvenient and of poor generality when training a model. To make processing more convenient for the training device, reduce the amount of calculation and increase the generality of the corpus, the preprocessing module 702 cuts the sentence corpus into words after removing the set characters, obtaining the word-cutting characters of each sentence. The standard for word cutting is the smallest unit that expresses a precise meaning: for example, verbs can be cut directly into words, while indivisible items such as nouns, for example place names and proper nouns, are kept as single words.
After the sentence corpus is cut into words, because each sentence has been scattered into word-cutting characters, a beginning character is added at the start of each sentence and an ending character at its end to facilitate sentence recognition. The word frequency of each word-cutting character is counted, and the weighted value of each word-cutting character is calculated from its word frequency according to a set algorithm, for convenient later use.
To improve the generality and data accuracy of the corpus and reduce the amount of calculation, in this embodiment word-cutting characters whose word frequency is lower than a set value are deleted before the weighted values are calculated; the set value may be 20 or 30, or may be determined as a proportion of the set number of first unlabeled corpora.
The preprocessing module 702 then performs unsupervised learning on the preprocessed first unlabeled corpus by a word embedding method, based on the word-cutting characters whose word frequency exceeds the set value and their weighted values, to obtain the word vectors of the first unlabeled corpus.
The first model building module 703 is configured to train a preset LSTM language model with the word vectors of the first unlabeled corpus to build the first model.
As described above, the present application builds the classification model on the basis of an LSTM language model; specifically, this embodiment uses a language model with a two-layer LSTM structure. In an optional embodiment, the prototype of the LSTM language model is a model whose output is the word vector of the word that follows the current word vector in its sentence, and the model parameters of the LSTM language model are adjusted in real time according to the comparison between the output result and the actual result. Because the number of word vectors used to train the LSTM language model is huge, a large-capacity neural network language model can be obtained.
Furthermore, a word embedding layer is added at the input of the LSTM language model to obtain the first model, so that the first model can directly process the corpus.
The second model training module 704 is configured to train a second model with the labeled corpus of the classification model to obtain the classification model, wherein the second model comprises the first model and a classification output model structure added to the first model. The classification output model structure comprises a feature splicing layer, a fully connected network layer and an output layer.
The second model training module 704 inputs the labeled corpus of the classification model into the first model for processing to obtain the word vectors of the labeled corpus of the classification model and the output vectors of those word vectors.
Specifically, the labeled corpus of the classification model used to train the second model is input into the word embedding layer, which removes set characters such as emoticons and modal particles and then cuts the sentences into words, yielding the word-cutting characters of the labeled corpus of the classification model. The standard for word cutting is the smallest unit that expresses a precise meaning: for example, verbs can be cut directly into words, while indivisible items such as nouns, for example place names and proper nouns, are kept as single words.
After the sentence corpus is cut into words by the word embedding layer, a start symbol is added at the beginning of each sentence and an end symbol at its end to facilitate sentence recognition. The word frequency of each word-cutting character is counted, the weighted value of each word-cutting character is calculated from its word frequency according to a set algorithm, and the word vectors of the labeled corpus sentences of the classification model are obtained based on the word-cutting characters whose word frequency exceeds a set value and their weighted values.
After the word vectors of the labeled corpus sentences are obtained, the word vectors are processed by the LSTM language model layer to obtain their output vectors.
The second model training module 704 further splices the word vectors and their output vectors to obtain the feature vectors of the sentences corresponding to the labeled corpus of the classification model.
The LSTM language model layer includes a plurality of hidden nodes, and from the first hidden node onward each node passes the output vector it receives to the next, so in theory each hidden node contains feature information of the word vector currently being processed. To prevent hidden nodes from losing or corrupting data while propagating output vectors, which would make the output vector of the last hidden node inaccurate, in this embodiment the second model training module 704 takes the average of the output vectors of all hidden nodes of the LSTM language model layer as the basis of the feature vector of the sentence. To further improve the accuracy of the feature vector, the average of the word vectors of the same sentence output by the word embedding layer is also computed, and the average output vector and the average word vector are spliced head to tail by the feature splicing layer to obtain the feature vector of the sentence.
The second model training module 704 performs classification prediction on the labeled corpus of the classification model through the feature vector of the sentence to obtain a prediction result.
Specifically, the second model training module 704 sequentially inputs the feature vectors of the sentences to the fully-connected neural network layer for querying the prediction result, and outputs the prediction result through the output layer.
Finally, the second model training module 704 adjusts the model parameters of the second model based on the prediction result to obtain a new classification model.
Specifically, the second model training module 704 obtains the prediction results output by the second model and calculates the loss function of the second model from them. In this embodiment, the loss function is a function that maps an event (an element of a sample space) to a real number expressing the cost associated with that event; more generally, it is a function that measures the degree of loss and error. The larger the loss function value, the higher the error rate; the smaller the value, the lower the error rate. In this embodiment, the model parameters of the second model are adjusted through a backpropagation algorithm based on the loss function, yielding a new classification model.
Unlike the prior art, in this embodiment the first model building module trains an existing LSTM language model with the non-standardized first unlabeled corpus obtained by the corpus acquisition module from a network or from history records, yielding a first model that serves as the base model of the classification model. The second model training module then trains a second model, formed by adding a classification output structure to the first model, with the labeled corpus of the classification model, obtaining a classification model with excellent performance. Compared with the conventional approach of training a model directly on a large amount of labeled corpus, the non-standardized corpus obtained from a network or from history records is not restricted to the domain of the classification task and does not need to be labeled, so a large amount of manpower, financial resources and time can be saved, and the method is more widely applicable. Moreover, because the corpus used to train the classification model is abundant, the resulting classification model generalizes well and can perform different classification tasks in different fields, such as intelligent customer service, reducing the cost of building and launching different types of intelligent customer service systems.
In another embodiment, as shown in fig. 8, fig. 8 is a schematic structural diagram of another embodiment of the training device for classification models of the present application. Unlike the training device of the above embodiment, this training device further includes a first model training module 805, configured to train the first model with the unlabeled corpus of the classification model to obtain the trained first model.
The unlabeled corpus of the classification model differs from the unlabeled corpus crawled from the network or taken from the history records: it is unlabeled corpus corresponding to the classification task. It can be understood that its quantity is far greater than that of the labeled corpus of the classification model, which reduces the cost of training the classification model.
After the sentence corpus corresponding to the unlabeled corpus of the classification model is cut into words by the word embedding layer, the first model training module 805 adds a start symbol at the beginning of each sentence and an end symbol at its end to facilitate sentence recognition. The word frequency of each word-cutting character is counted, the weighted value of each word-cutting character is calculated from its word frequency according to a set algorithm, and unsupervised learning is performed on the preprocessed unlabeled corpus of the classification model by a word embedding method, based on the word-cutting characters whose word frequency exceeds a set value and their weighted values, to obtain the word vectors of the unlabeled corpus sentences of the classification model.
After the word vectors of the sentences of the unlabeled corpus are obtained, the first model training module 805 processes the word vectors through the LSTM language model layer to obtain their output vectors.
In this embodiment, because the data volume is large, in order to reduce the amount of calculation and the training time, the output vector produced by the last hidden node is used directly as the output vector of the word vectors of the unlabeled corpus of the classification model.
The first model training module 805 outputs the prediction result by using the output layer, and adjusts the model parameters of the first model based on the prediction result to obtain the trained first model.
Specifically, the training terminal further obtains a prediction result output by the first model, and calculates a loss function of the first model according to the prediction result. And adjusting the model parameters of the first model through a back propagation algorithm based on the loss function to obtain a new first model.
Unlike the foregoing embodiment, in this embodiment the first model training module first trains the first model with the unlabeled corpus of the classification model; the second model is then built from the trained first model and trained with the labeled corpus of the classification model. This further improves the performance of the classification model, and because the large amount of corpus used to train the first model is unlabeled corpus of the classification model, no additional corpus or labor cost is incurred. Through this semi-supervised method of first training the first model with the unlabeled corpus of the classification model and then training the second model with its labeled corpus, a high-performance classification model can be obtained at low cost and with high efficiency.
Referring to fig. 9, a schematic structural diagram of an embodiment of the intelligent customer service device of the present application is shown. The intelligent customer service device of this embodiment comprises a classification model obtained by adding a classification output model structure to a trained first model and then retraining. The first model is obtained by training a preset LSTM language model with a two-layer structure and includes a word embedding layer and an LSTM language model layer. The classification output model structure comprises a feature splicing layer, a fully-connected network layer and an output layer.
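Continuing the FirstModel sketch above, the overall classification model could be assembled as below. This is a hedged illustration: names and dimensions are assumptions, and the feature splicing is realized as the averaging-and-concatenation strategy detailed later in this embodiment.

```python
# Hypothetical assembly of the classification model: the trained first model
# plus the added classification output structure (feature splicing + fully-
# connected network layer); names and dimensions are assumptions.
import torch
import torch.nn as nn

class ClassificationModel(nn.Module):
    def __init__(self, first_model, num_classes, emb_dim=128, hidden_dim=256):
        super().__init__()
        self.embed = first_model.embed    # reuse trained word embedding layer
        self.lstm = first_model.lstm      # reuse trained LSTM language model layer
        self.fc = nn.Linear(hidden_dim + emb_dim, num_classes)  # fully-connected layer

    def forward(self, token_ids):
        vecs = self.embed(token_ids)              # (batch, seq, emb)
        outputs, _ = self.lstm(vecs)              # (batch, seq, hidden)
        # Feature splicing layer: average the hidden-node outputs and the word
        # vectors, then concatenate head to tail into the sentence feature.
        feature = torch.cat([outputs.mean(dim=1), vecs.mean(dim=1)], dim=1)
        return self.fc(feature)                   # class scores for the output layer
```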
Specifically, the classification model is obtained through the training method of any one of the embodiments shown in figs. 1 to 5 and their associated descriptions, which are not repeated here.
The device specifically comprises a receiving module 901, a classification module 902 and an output module 903.
The receiving module 901 is configured to receive a corpus to be classified.
When a user asks a question or classifies data through an intelligent device such as an intelligent customer service system, the user inputs the corpus to be classified, in text or voice form, through the human-machine interface of the system; correspondingly, the receiving module 901 receives the corpus.
The classification module 902 is configured to input the corpus into the classification model, process the corpus through the classification model to obtain a feature vector of the corpus, and query a classification prediction result from a fully-connected neural network based on the feature vector.
After the receiving module 901 receives the corpus, the classification module 902 segments it with the word embedding layer, adding a start symbol at the beginning and an end symbol at the end of each segmented sentence to facilitate sentence recognition. It then counts the frequency of each token, calculates the weight of each token from its frequency according to the set algorithm, and obtains the word vectors of the corpus sentences from the tokens whose frequency exceeds the set value and their weights.
The classification module 902 processes the word vectors through the LSTM language model layer of the classification model to obtain the output vectors of the word vectors, and splices the word vectors with their output vectors through the feature splicing layer of the classification model to obtain the feature vectors of the sentences corresponding to the corpus to be classified.
Because the LSTM language model layer includes a plurality of hidden nodes, and from the first hidden node onward each node passes the output vector it receives to the next, each hidden node in theory carries the feature information of the word vectors processed so far. In an optional implementation, the feature splicing layer can therefore directly concatenate, end to end, the output vector emitted by the last hidden node with the word vectors of the same sentence output by the word embedding layer to form the feature vector of the sentence.
However, hidden nodes may lose or corrupt data as the output vectors propagate, leaving the output vector of the last hidden node inaccurate. In this embodiment, the classification module 902 therefore obtains, through the feature splicing layer, the average of the output vectors of all hidden nodes of the LSTM language model layer, and, to further improve the accuracy of the feature vector, also calculates the average of the word vectors of the same sentence output by the word embedding layer; the two averages are then spliced head to tail to obtain the feature vector of the sentence.
Finally, the classification module 902 inputs the feature vector of the sentence into the fully-connected neural network layer of the classification model to query the classification prediction result.
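Put together, inference through the device might read as below, again assuming the PyTorch sketches above; `first_model` stands for the pretrained first model and `token_ids` for the preprocessed corpus to be classified, both assumed to exist.

```python
# Illustrative inference call over the sketches above; `model` is the
# pretrained first model and `token_ids` the preprocessed user corpus (assumed).
clf = ClassificationModel(first_model=model, num_classes=10)
clf.eval()
with torch.no_grad():
    scores = clf(token_ids)              # feature splicing + FC query inside
    prediction = scores.argmax(dim=1)    # classification prediction result
```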
The output module 903 is used for outputting the prediction result.
The output layer of the classification model obtains the classification prediction result queried by the fully-connected network layer, and the output module 903 outputs the prediction result through the human-machine interface as text, voice broadcast, pictures or other formats, which are not limited here.
Unlike the prior art, the classification module of this embodiment inputs the corpus to be classified, received by the receiving module, into the classification model, classifies it through the classification model, and outputs the prediction result through the output module. The classification model is built by first training an existing LSTM language model on non-standardized first unlabeled corpus obtained from the network or historical records to obtain the first model, the base model of the classification model, and then training the second model, formed by adding a classification output structure to the first model, on the labeled corpus of the classification model, yielding a classification model with excellent performance. Compared with the common approach of training a model directly on a large labeled corpus, the non-standardized corpus obtained from the network or historical records is not limited to the domain of the classification task and needs no annotation, which saves substantial labor, financial and time costs and makes the method more widely applicable. Moreover, because the training corpus is abundant, the resulting classification model generalizes well and can perform different classification tasks in different fields, such as intelligent customer service, reducing the cost of building and launching different types of intelligent customer service systems.
Referring to fig. 10, a schematic structural diagram of an embodiment of the intelligent system of the present application is shown. The intelligent system of this embodiment may be an intelligent mobile terminal, such as a computer, or an intelligent customer service terminal, such as a network customer service terminal. The intelligent system 100 comprises a communication circuit 1001, a memory 1002 and a processor 1003 coupled by a bus, and a computer program stored in the memory and executable on the processor. When executing the computer program, the processor 1003 can implement the training method of the classification model of any one of the embodiments shown in figs. 1 to 5 and their associated descriptions, and can also implement the classification method of the embodiment shown in fig. 6 and its associated description.
Referring to fig. 11, the present application further provides an embodiment of a computer storage medium. In this embodiment, the computer storage medium 110 stores processor-executable computer instructions 1101, where the computer instructions 1101 are configured to execute the training method of the classification model of any one of the embodiments shown in figs. 1 to 5 and their associated descriptions, and also to implement the classification method of the embodiment shown in fig. 6 and its associated description.
The computer storage medium 110 may be any medium capable of storing computer instructions, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk, or it may be a server that stores computer instructions and can either send them to other devices for execution or run them itself.
In the several embodiments provided in the present application, it should be understood that the disclosed method and apparatus may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into units is only a logical division, and other divisions are possible in practice; for instance, multiple units or components may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the mutual or direct coupling or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices or units, and may be electrical, mechanical or other forms.
Units described as separate parts may or may not be physically separate, and parts shown as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of this embodiment.
In addition, the functional units in the embodiments of the present application may be integrated into one processing unit, each unit may exist alone physically, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
The integrated unit, if implemented in the form of a software functional unit and sold or used as a stand-alone product, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present application, in essence or in the part contributing over the prior art, or in whole or in part, may be embodied in a software product stored in a storage medium, including instructions for causing a computer device (which may be a personal computer, a server, a network device or the like) or a processor to execute all or part of the steps of the methods of the embodiments of the present application. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disk.
The above description is only for the purpose of illustrating embodiments of the present application and is not intended to limit the scope of the present application, and all modifications of equivalent structures and equivalent processes, which are made by the contents of the specification and the drawings of the present application or are directly or indirectly applied to other related technical fields, are also included in the scope of the present application.

Claims (10)

1. A training method of a classification model is characterized by comprising the following steps:
randomly acquiring a set number of first unmarked corpora from a network or a historical record;
processing the first unmarked corpus to obtain a word vector of the first unmarked corpus;
training a preset LSTM language model through the word vectors of the first unmarked corpus to establish a first model; the first model sequentially comprises a word embedding layer and an LSTM language model layer;
training a second model through the labeled corpus of the classification model to obtain the classification model; the second model is obtained by adding a classification output model structure to the first model, wherein the classification output model structure comprises a feature splicing layer, a full-connection network layer and an output layer;
training a second model through the labeled corpus of the classification model to obtain the classification model comprises the following steps:
inputting the labeling linguistic data of the classification model into the first model for processing to obtain word vectors of the labeling linguistic data of the classification model and output vectors of the word vectors;
splicing the word vector and the output vector thereof to obtain a feature vector of a sentence corresponding to the labeled corpus of the classification model;
classifying and predicting the labeled linguistic data of the classification model through the feature vector of the sentence to obtain a prediction result;
and adjusting the model parameters of the second model based on the prediction result to obtain the classification model.
2. Training method according to claim 1,
the step of inputting the markup corpus of the classification model into the first model for processing to obtain the word vectors of the markup corpus of the classification model and the output vectors of the word vectors includes:
inputting the labeling linguistic data of the classification model into the word embedding layer for processing to obtain word vectors of the labeling linguistic data of the classification model;
identifying the word vectors through the LSTM language model layer to obtain output vectors of the word vectors;
the step of splicing the word vector and the output vector thereof to obtain the feature vector of the sentence corresponding to the labeled corpus of the classification model comprises the following steps:
inputting the output vector of the word vector and the word vector into the feature splicing layer for splicing to obtain the feature vector of the sentence corresponding to the labeled corpus of the classification model;
the step of performing classification prediction on the labeled corpus of the classification model through the feature vector of the sentence to obtain a prediction result specifically includes:
and inputting the feature vector of the sentence into the fully-connected neural network to inquire a prediction result, and outputting the prediction result through an output layer.
3. The training method according to claim 2, wherein the step of splicing the word vector and the output vector thereof to obtain the feature vector of the sentence corresponding to the labeled corpus of the classification model specifically comprises:
and performing ending splicing on the average value of the output vectors of all the hidden nodes of the LSTM language model layer and the average value of the word vectors to obtain the characteristic vector of the sentence corresponding to the labeled corpus of the classification model.
4. The training method of claim 2, wherein the first model further comprises an output layer,
after the step of establishing the first model by training a preset LSTM language model through the word vector of the first unmarked corpus, the step of obtaining the classification model by training a second model through the marked corpus of the classification model further comprises the following steps:
training the first model through the unmarked corpus of the classification model to obtain a trained first model; the second model is obtained by adding a classification output model structure to the trained first model.
5. A classification method based on a classification model is characterized in that the classification model is obtained by adding a classification output model structure to a trained first model and then retraining;
the classification method comprises the following steps:
receiving a corpus to be classified;
inputting the corpus into the classification model, processing the corpus through the classification model to obtain a feature vector of the corpus, and inquiring a classification prediction result from a fully-connected neural network based on the feature vector;
and outputting the classification prediction result.
6. The classification method according to claim 5, wherein the classification model is obtained through the training method of the classification model according to any one of claims 1 to 4.
7. A training device for classification models is characterized by comprising a corpus acquisition module, a preprocessing module, a first model building module and a second model training module,
the corpus acquiring module is used for randomly acquiring a set number of first unmarked corpuses from a network or a historical record;
the preprocessing module is used for processing the first unmarked corpus to obtain a word vector of the first unmarked corpus;
the first model building module is used for training a preset LSTM language model through the word vector of the first unmarked corpus and building a first model, and the first model sequentially comprises a word embedding layer and an LSTM language model layer;
the second model training module is used for training a second model through the labeled corpus of the classification model to obtain the classification model; the second model comprises the first model and a classification output model structure added on the first model, and the classification output model structure comprises a feature splicing layer, a fully-connected network layer and an output layer;
the second model training module is used for training the second model through the labeled corpus of the classification model to obtain the classification model, and comprises:
inputting the labeling linguistic data of the classification model into the first model for processing to obtain word vectors of the labeling linguistic data of the classification model and output vectors of the word vectors;
splicing the word vector and the output vector thereof to obtain a feature vector of a sentence corresponding to the labeled corpus of the classification model;
classifying and predicting the labeled linguistic data of the classification model through the feature vector of the sentence to obtain a prediction result;
and adjusting the model parameters of the second model based on the prediction result to obtain the classification model.
8. An intelligent customer service device, characterized in that the intelligent customer service device comprises a classification model obtained by adding a classification output model structure to a trained first model and then retraining; specifically, the classification model is obtained through the training method of the classification model according to any one of claims 1 to 4; the device comprises a receiving module, a classification module and an output module,
the receiving module is used for receiving the linguistic data to be classified;
the classification module is used for inputting the corpus into the classification model, processing the corpus through the classification model to obtain a feature vector of the corpus, and inquiring a prediction result from the fully-connected neural network based on the feature vector;
the output module is used for outputting the prediction result.
9. An intelligent system, comprising:
communication circuitry, a memory, a processor, and a computer program stored on the memory and executable on the processor, the processor implementing the steps of the training method of the classification model according to any one of claims 1 to 4 or the steps of the classification method according to any one of claims 5 to 6 when executing the computer program.
10. A computer storage medium having stored thereon program data which, when executed by a processor, implements a method of training a classification model according to any one of claims 1 to 4 or a method of classification according to any one of claims 5 to 6.
CN201811244834.1A 2018-10-24 2018-10-24 Training method of classification model, classification method, device and computer storage medium Active CN111090753B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811244834.1A CN111090753B (en) 2018-10-24 2018-10-24 Training method of classification model, classification method, device and computer storage medium


Publications (2)

Publication Number Publication Date
CN111090753A CN111090753A (en) 2020-05-01
CN111090753B (en) 2020-11-20

Family

ID=70392723






Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant