CN116861302B - Automatic case classifying and distributing method - Google Patents

Automatic case classifying and distributing method

Info

Publication number
CN116861302B
CN116861302B CN202311133641.XA
Authority
CN
China
Prior art keywords
category
case
model
department
prediction vector
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202311133641.XA
Other languages
Chinese (zh)
Other versions
CN116861302A (en)
Inventor
杨伊态
许继伟
段春先
黄亚林
张兆文
李成涛
陈胜鹏
刘惠娟
王敬佩
李颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Geospace Information Technology Co ltd
Original Assignee
Geospace Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Geospace Information Technology Co ltd filed Critical Geospace Information Technology Co ltd
Priority to CN202311133641.XA priority Critical patent/CN116861302B/en
Publication of CN116861302A publication Critical patent/CN116861302A/en
Application granted granted Critical
Publication of CN116861302B publication Critical patent/CN116861302B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/24: Classification techniques
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00: Pattern recognition
    • G06F18/20: Analysing
    • G06F18/21: Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214: Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/04: Architecture, e.g. interconnection topology
    • G06N3/045: Combinations of networks
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00: Computing arrangements based on biological models
    • G06N3/02: Neural networks
    • G06N3/08: Learning methods
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00: Administration; Management
    • G06Q10/10: Office automation; Time management
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06Q: INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00: Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10: Services
    • G06Q50/26: Government or public services
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Economics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Primary Health Care (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides an automatic case classifying and distributing method, which comprises: designing a neural network model with fewer parameters and a simpler structure as a student model; taking a trained, complex case classifying and distributing model as a teacher model; distilling the knowledge of the teacher model into the student model through a knowledge distillation technique; and finally classifying and distributing cases by using the student model as the inference model. Compared with existing neural network methods, the method greatly improves inference speed; compared with training the student model directly on the data, the model obtained through knowledge distillation generalizes better and is more accurate.

Description

Automatic case classifying and distributing method
Technical Field
The invention relates to the field of information processing, in particular to an automatic case classification and allocation method.
Background
In municipal government services, citizens report their appeals through many channels, such as telephone, websites, and WeChat applets. Government hotline staff must classify each complaint case and dispatch it to the competent authority according to its type and other information. As government hotlines grow busier, manual classification and dispatch of complaint cases can hardly meet business demands. Methods that use machine learning, artificial intelligence, and related technologies to classify and dispatch complaint cases automatically have gradually come into wide use in government hotlines.
However, although existing government hotline classification and dispatch methods achieve high accuracy, their large parameter counts and complex model structures make processing time long, so service response efficiency is low and business requirements are hard to meet on cost-constrained hardware.
The existing methods for classifying and distributing cases are mainly divided into three types:
The second category comprises machine-learning-based methods. These methods use historical sample data to train machine learning algorithms or models such as TF-IDF and support vector machines, then use the trained algorithms or models to classify and dispatch case text automatically. Compared with neural-network-based methods, they infer quickly. However, when the case classification is unclear or the responsible department's type is ambiguous, their classification and dispatch accuracy is low.
The third category comprises neural-network-based methods. These methods construct a neural network, train it on historical sample data, and then use the trained model to classify and dispatch cases automatically. Compared with rule-matching and machine-learning methods, they are more accurate. However, existing neural-network-based case classification and dispatch models infer slowly, making it difficult to meet the timeliness requirements of government hotline business.
Disclosure of Invention
In view of the technical problems in the prior art, the invention provides an automatic case classifying and distributing method, which comprises the following steps:
acquiring a training sample set, wherein the training sample set comprises a plurality of samples, and each sample comprises case information, case categories and department categories;
inputting each sample into the first-stage linear layers of a teacher model, and outputting a first primary case category prediction vector and a first primary department category prediction vector; inputting each sample into the first-stage linear layers and second-stage linear layers of a student model, and respectively outputting a second primary case category prediction vector, a second primary department category prediction vector, a secondary case category prediction vector, and a secondary department category prediction vector, wherein the student model has fewer hierarchical modules than the teacher model, each layer has smaller dimensions, and the teacher model is trained in advance;
calculating a loss of the student model based on the first primary case category prediction vector, the first primary department category prediction vector, the second primary case category prediction vector, the second primary department category prediction vector, the secondary case category prediction vector, and the secondary department category prediction vector;
based on the loss, adjusting and updating parameters of the student model to obtain a trained student model;
and inputting the case information to be classified into the trained student model to obtain the case category and department category output by the student model.
According to the automatic case classifying and distributing method provided by the invention, a neural network model with fewer parameters and a simpler structure is designed as the student model, a trained, complex case classifying and distributing model serves as the teacher model, and the knowledge of the teacher model is distilled into the student model through a knowledge distillation technique. Finally, the student model is used as the inference model to classify and distribute cases. Compared with existing neural network methods, this greatly improves inference speed; compared with training the student model directly on the data, the model obtained through knowledge distillation generalizes better and is more accurate.
Drawings
FIG. 1 is a flow chart of the automatic case classification and allocation method provided by the invention;
FIG. 2 is a schematic diagram of teacher model training and inference;
FIG. 3 is a schematic diagram of student model training;
FIG. 4 is a schematic diagram of teacher model and student model training.
Detailed Description
For the purpose of making the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, but not all, embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without inventive effort fall within the scope of the invention. In addition, the technical features of the embodiments provided by the invention may be combined with one another at will to form feasible technical solutions; such combinations are not limited by the order of steps or by structural composition, but must be realizable by a person of ordinary skill in the art. When a combination of technical solutions is contradictory or unrealizable, it shall be deemed not to exist and to fall outside the claimed scope of the invention.
The invention provides an automatic case classification and allocation method, which mainly comprises two parts. Training stage: train a student model using the historical data and the teacher model. Inference stage: classify and dispatch government hotline cases using the student model. Referring to fig. 1, the automatic case classification and allocation method provided by the invention comprises the following steps:
step 1, a training sample set is obtained, wherein the training sample set comprises a plurality of samples, and each sample comprises case information, case categories and department categories.
It is understood that each record in the sample set has the format [case information, area information, case category, department category], where the area information is optional. The sample set is divided proportionally into a training sample set and a validation sample set.
Such as: sample a ["Noise every day keeps people from sleeping; a complaint was filed, an hour has passed and no one has dealt with it; too slow, too slow", "pyridazine", "440", "105"], where 440 and 105 are the case category code and the department category code, respectively.
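For illustration only, a minimal sketch of such records and the proportional split in Python (the list-of-lists representation and the 90/10 ratio are assumptions; the patent only says the set is divided proportionally):

```python
import random

# Each record: [case information, area information (optional), case category, department category]
samples = [
    ["Noise every day keeps people from sleeping; complaint filed an hour ago; too slow",
     "pyridazine", "440", "105"],
    # ... more historical records
]

random.shuffle(samples)
split = int(len(samples) * 0.9)   # assumed 90/10 split
train_set, val_set = samples[:split], samples[split:]
```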
Step 2, inputting each sample into the first-stage linear layers of a teacher model, and outputting a first primary case category prediction vector and a first primary department category prediction vector; and inputting each sample into the first-stage linear layers and second-stage linear layers of a student model, and respectively outputting a second primary case category prediction vector, a second primary department category prediction vector, a secondary case category prediction vector, and a secondary department category prediction vector, wherein the student model has fewer hierarchical modules than the teacher model, each layer has smaller dimensions, and the teacher model is trained in advance.
It will be appreciated that each sample in the training sample set is input into the teacher model and the student model respectively, where the teacher model has been trained and its parameters are fixed.
Fig. 2 is a schematic diagram of the teacher model. The teacher model includes a text embedding model, two first-stage linear layers (linear layer 11 and linear layer 12 in fig. 2), four second-stage linear layers (linear layer 13, linear layer 14, linear layer 15, and linear layer 16 in fig. 2), and two category embedding matrices. The text embedding model in the teacher model is a Bert model comprising 12 Transformer modules.
Fig. 3 is a schematic diagram of the structure of the student model, which includes a text embedding model, two first-stage linear layers (linear layer 21 and linear layer 22 in fig. 3), four second-stage linear layers (linear layer 23, linear layer 24, linear layer 25, and linear layer 26 in fig. 3), and two category embedding matrices. The text embedding model in the student model is a TinyBert model comprising 4 Transformer modules. The two first-stage linear layers, four second-stage linear layers, and two category embedding matrices in the student model have smaller dimensions than their counterparts in the teacher model.
Specifically, the text embedding model of the teacher model is a Bert model comprising 12 Transformer modules, and the text embedding vector has 768 dimensions. The teacher model also contains 6 linear layers and 2 category embedding matrices. Linear layer 11 is the primary case prediction module, with input dimension 768 and output dimension Nclass, where Nclass is the number of case categories. Linear layer 13 and linear layer 14 form the secondary case prediction module: linear layer 13 has input and output dimensions of 768; linear layer 14 has input dimension 768 and output dimension 1. Linear layer 12 is the primary department prediction module, with input dimension 768 and output dimension Ndepa, where Ndepa is the number of department categories. Linear layer 15 and linear layer 16 form the secondary department prediction module: linear layer 15 has input and output dimensions of 768; linear layer 16 has input dimension 768 and output dimension 1. The 2 category matrices are case category embedding matrix 1 and department category embedding matrix 1, with dimensions [Nclass, 768] and [Ndepa, 768], respectively.
The text embedding model of the student model is a TinyBert model comprising 4 Transformer modules, and the text embedding vector has 312 dimensions. The student model also contains 6 linear layers and 2 category embedding matrices. Linear layer 21 is the primary case prediction module, with input dimension 312 and output dimension Nclass, where Nclass is the number of case categories. Linear layer 23 and linear layer 24 form the secondary case prediction module: linear layer 23 has input and output dimensions of 312; linear layer 24 has input dimension 312 and output dimension 1. Linear layer 22 is the primary department prediction module, with input dimension 312 and output dimension Ndepa, where Ndepa is the number of department categories. Linear layer 25 and linear layer 26 form the secondary department prediction module: linear layer 25 has input and output dimensions of 312; linear layer 26 has input dimension 312 and output dimension 1. The 2 category matrices are case category embedding matrix 2 and department category embedding matrix 2, with dimensions [Nclass, 312] and [Ndepa, 312], respectively.
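A minimal PyTorch sketch of the student model structure described above (the attribute names and the TinyBERT checkpoint identifier are illustrative assumptions, not taken from the patent):

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class StudentModel(nn.Module):
    def __init__(self, n_class, n_depa, dim=312):
        super().__init__()
        # TinyBert text embedding model: 4 Transformer modules, 312-dim embeddings
        self.encoder = AutoModel.from_pretrained("huawei-noah/TinyBERT_General_4L_312D")
        self.linear21 = nn.Linear(dim, n_class)  # primary case prediction module
        self.linear22 = nn.Linear(dim, n_depa)   # primary department prediction module
        self.linear23 = nn.Linear(dim, dim)      # secondary case prediction module
        self.linear24 = nn.Linear(dim, 1)
        self.linear25 = nn.Linear(dim, dim)      # secondary department prediction module
        self.linear26 = nn.Linear(dim, 1)
        self.case_embed = nn.Parameter(torch.randn(n_class, dim))  # case category embedding matrix 2
        self.depa_embed = nn.Parameter(torch.randn(n_depa, dim))   # department category embedding matrix 2

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        e_cls = out.last_hidden_state[:, 0]      # embedding vector of the special character CLS
        return e_cls, self.linear21(e_cls), self.linear22(e_cls)
```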
The teacher model is a neural network model that has already been trained; that is, the Bert model, linear layer 11, linear layer 12, linear layer 13, linear layer 14, linear layer 15, linear layer 16, case category embedding matrix 1, and department category embedding matrix 1 remain fixed and do not change during training.
As an embodiment, inputting each sample into the first-stage linear layers of the teacher model and outputting the first primary case category prediction vector and the first primary department category prediction vector comprises: inputting each sample into the Bert model of the teacher model to obtain a first text embedding vector, and inputting the first text embedding vector into the two first-stage linear layers of the teacher model respectively, to obtain the first primary case category prediction vector output by one first-stage linear layer and the first primary department category prediction vector output by the other.
Inputting each sample into the first-stage linear layers and second-stage linear layers of the student model and respectively outputting the second primary case category prediction vector, the second primary department category prediction vector, the secondary case category prediction vector, and the secondary department category prediction vector comprises: inputting each sample into the TinyBert model of the student model to obtain a second text embedding vector, and inputting the second text embedding vector into the two first-stage linear layers of the student model respectively, to obtain the second primary case category prediction vector output by one first-stage linear layer and the second primary department category prediction vector output by the other. Based on the second primary case category prediction vector output by the student model, the related case category embedding matrix is obtained from the case category related linked list; the second primary case category prediction vector and the related case category embedding matrix are input into the second-stage linear layers of the student model, which output the secondary case category prediction vector. Based on the second primary department category prediction vector output by the student model, the related department category embedding matrix is obtained from the department category related linked list; the second primary department category prediction vector and the related department category embedding matrix are input into the second-stage linear layers of the student model, which output the secondary department category prediction vector. The case category related linked list and the department category related linked list are generated after the teacher model is trained.
Based on the training sample set and the trained teacher model, a schematic diagram of student model training is shown in fig. 4. The training process is as follows:
Each sample in the training sample set is input into the Bert model of the teacher model and the TinyBert model of the student model respectively and converted into text embedding vectors, as follows:
If region information exists in the sample, the region information and the case information are merged to obtain the merged case information.
Such as: the merged case information of sample a is "Noise in the pyridazine area every day keeps people from sleeping; a complaint was filed, an hour has passed and no one has dealt with it; too slow, too slow".
The merged case information is converted into corresponding token codes by the Bert model tokenizer in the teacher model, and the special character codes [CLS] and [SEP] are added at the head and tail to form the token codes of the case information. The Bert model of the teacher model in the invention is the Chinese-BERT-wwm-ext pre-trained BERT (Bidirectional Encoder Representations from Transformers) model.
Such as: the merged case information of sample a, converted into token codes by the Bert tokenizer of the teacher model, is: [101, 1515, 1515, 1277, 1921, 1692, 7509, 6375, 782, 3187, 3791, 4717, 6230, 8024, 2347, 2832, 6401, 8024, 6814, 749, 671, 702, 2207, 3198, 738, 3766, 782, 5052, 8024, 1922, 2714, 8024, 1922, 2714, 102], where 101 is the code of the special character 'CLS' and 102 is the code of the special character 'SEP'. Every token sequence starts with code 101 and ends with code 102.
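A minimal sketch of this tokenization step with the transformers library; hfl/chinese-bert-wwm-ext is the public checkpoint of Chinese-BERT-wwm-ext, and the Chinese string below is only a hypothetical reconstruction of sample a's merged text:

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("hfl/chinese-bert-wwm-ext")

merged_text = "哒嗪区天天噪音让人无法睡觉，已投诉，过了一个小时也没人管，太慢，太慢"  # hypothetical
enc = tokenizer(merged_text)
print(enc["input_ids"])  # begins with 101 ([CLS]) and ends with 102 ([SEP])
```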
The token codes of the case information are input into the Bert model of the teacher model and the TinyBert model of the student model respectively. The teacher model and the student model each produce an embedding vector of the special character CLS, denoted E_CLS-T and E_CLS-S respectively, where E_CLS-T represents the semantic features of the whole input text after conversion by the Bert model, and E_CLS-S represents the semantic features of the whole input text after conversion by the TinyBert model.
The embedding vector E_CLS-T is input into linear layer 11 of the teacher model to obtain the primary case category prediction vector ClassDist1_T, a one-dimensional vector of Nclass dimensions; E_CLS-T is input into linear layer 12 of the teacher model to obtain the primary department category prediction vector DepaDist1_T, a one-dimensional vector of Ndepa dimensions.
The embedding vector E_CLS-S is input into linear layer 21 of the student model to obtain the primary case category prediction vector ClassDist1_S, a one-dimensional vector of Nclass dimensions; E_CLS-S is input into linear layer 22 of the student model to obtain the primary department category prediction vector DepaDist1_S, a one-dimensional vector of Ndepa dimensions.
After the student model's primary case category prediction vector ClassDist1_S passes through a Softmax layer to give the primary case category prediction result, the case-related categories are found in the case category related linked list according to that result, and the related case category embedding matrix RelateEncoder_class is obtained from the case category embedding matrix according to the found case-related categories. There may be several case-related categories, and they include the category itself; RelateEncoder_class is an [M_class, 312] matrix, where M_class is the number of related categories. The related department category embedding matrix RelateEncoder_depa is obtained in the same way. The case category related linked list and the department category related linked list of the student model are generated after the teacher model is trained, and record the related case categories corresponding to each case category.
The case category related linked list and the department category related linked list of the student model are copied directly from the linked lists trained in the teacher model (but not the case category embedding matrix and the department category embedding matrix, because the teacher model's are 768-dimensional while the student model's are only 312-dimensional).
Taking the case category related linked list as an example: the linked list consists of several key-value pairs; each key is a case category, and the value of each key comprises the case categories related to that category, the related categories being computed per category when the teacher model is trained.
If the case category predicted at the first stage is 449, key 449 is found in the case category related linked list and its values 362, 1, and 449 are obtained; that is, category 362, category 1, and category 449 are all categories related to 449. The vectors of categories 362, 1, and 449 are then extracted from the case category embedding matrix, giving a matrix of dimensions [3, 312], as sketched below.
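A minimal sketch of this lookup (representing the "related linked list" as a Python dict is an assumption):

```python
import torch

# Case category related linked list: key = case category, value = related categories (includes itself)
related_classes = {449: [362, 1, 449]}    # illustrative entry from the example above

case_embed = torch.randn(500, 312)        # case category embedding matrix 2, [Nclass, 312]; Nclass assumed 500

predicted = 449                           # primary case category prediction result
related_ids = related_classes[predicted]
relate_encoder_class = case_embed[torch.tensor(related_ids)]  # RelateEncoder_class, [3, 312]
```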
The embedding vector E_CLS-S is dot-multiplied dimension-wise with each row of the related case category embedding matrix, and the result is used as the input of linear layer 23; the student model's secondary case category prediction result Res2_class is then obtained through linear layer 23 and linear layer 24.
The embedding vector E_CLS-S is a vector of dimensions [1, 312]. Assuming there are 4 related case categories, the related case category embedding matrix has dimensions [4, 312]. Dot-multiplying E_CLS-S dimension-wise with each row of the related case category embedding matrix gives a result of dimensions [4, 312]. This is input into linear layer 23, and the intermediate result has dimensions [4, 312]; the intermediate result is input into linear layer 24, and the final result has dimensions [4, 1].
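Continuing the sketches above, one reading of this secondary prediction head in PyTorch (the broadcast element-wise product and the argmax decoding are our interpretation of the dot-multiplication, not wording from the patent):

```python
# e_cls_s: [1, 312] CLS embedding from the student encoder;
# relate_encoder_class: [M_class, 312] related case category embedding matrix.
scores = e_cls_s * relate_encoder_class              # dimension-wise product, [M_class, 312]
hidden = student.linear23(scores)                    # intermediate result, [M_class, 312]
res2_class = student.linear24(hidden).squeeze(-1)    # final result, [M_class]
best_related = related_ids[int(res2_class.argmax())] # highest-scoring related category
```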
Step 3, calculating the loss of the student model based on the first primary case category prediction vector, the first primary department category prediction vector, the second primary case category prediction vector, the second primary department category prediction vector, the secondary case category prediction vector, and the secondary department category prediction vector.
The loss of the student model is calculated as follows: calculate the case category KL divergence between the first primary case category prediction vector and the second primary case category prediction vector, and the department category KL divergence between the first primary department category prediction vector and the second primary department category prediction vector; calculate a first cross entropy loss between the sample's actual case category and the second primary case category prediction vector, and a second cross entropy loss between the sample's actual department category and the second primary department category prediction vector; calculate a third cross entropy loss between the sample's actual case category and the secondary case category prediction vector, and a fourth cross entropy loss between the sample's actual department category and the secondary department category prediction vector; then calculate the loss of the student model based on the case category KL divergence, the department category KL divergence, the first cross entropy loss, the second cross entropy loss, the third cross entropy loss, and the fourth cross entropy loss.
It will be appreciated that the embedding vector E_CLS-T is input into linear layer 11 of the teacher model to obtain the primary case category prediction vector ClassDist1_T, and E_CLS-T is input into linear layer 12 of the teacher model to obtain the primary department category prediction vector DepaDist1_T.
The embedding vector E_CLS-S is input into linear layer 21 of the student model to obtain the primary case category prediction vector ClassDist1_S, and E_CLS-S is input into linear layer 22 of the student model to obtain the primary department category prediction vector DepaDist1_S.
The KL divergence KL(ClassDist1_T, ClassDist1_S) between ClassDist1_T and ClassDist1_S, and the KL divergence KL(DepaDist1_T, DepaDist1_S) between DepaDist1_T and DepaDist1_S, are calculated. The KL divergence measures the similarity between two probability distributions. KL(ClassDist1_T, ClassDist1_S) is calculated as:

$$KL(ClassDist1_T, ClassDist1_S) = \sum_i p^T_i \log p^T_i - \sum_i p^T_i \log p^S_i$$

where $p^T = \mathrm{softmax}(ClassDist1_T)$ is the probability distribution of the teacher model's primary case category prediction vector ClassDist1_T, and $p^S = \mathrm{softmax}(ClassDist1_S)$ is the probability distribution of the student model's primary case category prediction vector ClassDist1_S. The former term is the negative entropy of $p^T$ and the latter term is the cross entropy of $p^T$ and $p^S$; this decomposition is characteristic of the KL divergence. In the invention, the KL divergence is computed with the KL divergence loss function F.kl_div provided by the PyTorch framework.
Similarly, KL(DepaDist1_T, DepaDist1_S) is calculated as:

$$KL(DepaDist1_T, DepaDist1_S) = \sum_i q^T_i \log q^T_i - \sum_i q^T_i \log q^S_i$$

where $q^T = \mathrm{softmax}(DepaDist1_T)$ is the probability distribution of the first primary department category prediction vector output by the teacher model, and $q^S = \mathrm{softmax}(DepaDist1_S)$ is the probability distribution of the second primary department category prediction vector output by the student model.
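A minimal sketch of these two divergences with the F.kl_div function named above; F.kl_div takes log-probabilities as its first argument and probabilities as the target, which matches KL(teacher, student) (the logit variable names are assumptions):

```python
import torch.nn.functional as F

def kd_kl(teacher_logits, student_logits):
    # KL(teacher || student): target = softmax(teacher), input = log_softmax(student)
    return F.kl_div(F.log_softmax(student_logits, dim=-1),
                    F.softmax(teacher_logits, dim=-1),
                    reduction="batchmean")

kl_class = kd_kl(class_dist1_t, class_dist1_s)  # KL(ClassDist1_T, ClassDist1_S)
kl_depa = kd_kl(depa_dist1_t, depa_dist1_s)     # KL(DepaDist1_T, DepaDist1_S)
```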
That is, the KL divergence between the primary case category prediction result of the student model and that of the teacher model, and the KL divergence between the primary department category prediction result of the student model and that of the teacher model, are both calculated.
Then, the cross entropy loss CrossEntropy1_class between the sample's actual case category and the student model's primary case category prediction result is calculated. Cross entropy loss is a common loss function in neural networks, used mainly in multi-class classification tasks to measure the difference between the model's prediction and the actual label. In this application, the cross entropy loss is computed with the cross entropy loss function F.cross_entropy provided by the PyTorch framework. Similarly, the cross entropy loss CrossEntropy1_depa between the sample's actual department category and the student model's primary department category prediction result is calculated.
Then, from the student model's secondary case category prediction result Res2_class and the sample's actual case category, the two-class (binary) cross entropy loss BinaryEntropy2_class for the case category is calculated. Similarly, the two-class cross entropy loss BinaryEntropy2_depa for the department category is obtained. The invention computes the two-class cross entropy loss with the function F.binary_cross_entropy_with_logits provided by the PyTorch framework.
The total loss Loss of the model is calculated, and the parameters of the student model are updated by gradient descent. The total loss is calculated as:

$$Loss = KL(ClassDist1_T, ClassDist1_S) + KL(DepaDist1_T, DepaDist1_S) + CrossEntropy1_{class} + CrossEntropy1_{depa} + BinaryEntropy2_{class} + BinaryEntropy2_{depa}$$

where KL(ClassDist1_T, ClassDist1_S) is the case category KL divergence between the first primary case category prediction vector and the second primary case category prediction vector, KL(DepaDist1_T, DepaDist1_S) is the department category KL divergence between the first primary department category prediction vector and the second primary department category prediction vector, and CrossEntropy1_class, CrossEntropy1_depa, BinaryEntropy2_class, and BinaryEntropy2_depa are the first, second, third, and fourth cross entropy losses, respectively.
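A sketch assembling the total Loss and one gradient descent update under the same assumptions; the one-hot targets for the two binary cross entropy terms are our reading of the two-class losses, which the patent does not spell out:

```python
import torch.nn.functional as F

# class_dist1_s / depa_dist1_s: student primary logits, shape [batch, Nclass] / [batch, Ndepa];
# true_class / true_depa: ground-truth label indices, shape [batch].
ce_class = F.cross_entropy(class_dist1_s, true_class)  # first cross entropy loss
ce_depa = F.cross_entropy(depa_dist1_s, true_depa)     # second cross entropy loss

# res2_class / res2_depa: secondary-head logits over the related categories;
# class_targets / depa_targets: 1.0 at the true category's row, 0.0 elsewhere (assumed).
bce_class = F.binary_cross_entropy_with_logits(res2_class, class_targets)  # third loss
bce_depa = F.binary_cross_entropy_with_logits(res2_depa, depa_targets)     # fourth loss

loss = kl_class + kl_depa + ce_class + ce_depa + bce_class + bce_depa  # total Loss

# optimizer = torch.optim.Adam(student.parameters(), lr=...)  # assumed optimizer
optimizer.zero_grad()
loss.backward()
optimizer.step()  # updates the student parameters only; the teacher stays frozen
```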
Step 4, based on the loss, adjusting and updating the parameters of the student model to obtain the trained student model.
It can be understood that each sample in the training sample set is processed according to steps 1, 2, and 3, the loss of the student model is calculated, and the parameters of the student model are adjusted based on the loss. The parameters are adjusted continuously by traversing each sample in the training sample set until the loss reaches its minimum, yielding the trained student model.
After the student model is trained, the method further comprises: acquiring a verification sample set; inputting each verification sample in the verification sample set into the trained student model, and outputting the predicted case category and predicted department category corresponding to each verification sample; calculating the case category accuracy and the department category accuracy respectively from the actual case category and actual department category of each verification sample and the corresponding predicted case category and predicted department category; and adjusting and updating the parameters of the student model according to the case category accuracy and the department category accuracy to obtain the trained and verified student model.
It can be understood that the student model is trained repeatedly on the training set, its accuracy is verified using the verification sample set, and the version of the student model with the highest verification accuracy is taken as the trained student model. Each sample in the verification sample set is input into the student model, and the corresponding predicted case category and predicted department category are obtained for each sample. For each verification sample, if the predicted case category matches the actual case category in the sample, the prediction is correct; otherwise it is incorrect. Department category prediction is judged the same way. Accuracy is the number of correct predictions divided by the total number of verification samples, and the verification accuracy of the student model is (case accuracy + department accuracy) / 2. If this accuracy exceeds the current best, the model parameters of this version are saved and the current best accuracy is updated to this version's accuracy.
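A sketch of this best-checkpoint selection loop; train_one_epoch and evaluate are hypothetical helpers standing in for steps 1 through 4 and the accuracy computation:

```python
best_acc = 0.0
num_epochs = 10                                      # assumed epoch budget
for epoch in range(num_epochs):
    train_one_epoch(student, train_set)              # steps 1-4 above (hypothetical helper)
    case_acc, depa_acc = evaluate(student, val_set)  # fraction of correct predictions per task
    acc = (case_acc + depa_acc) / 2                  # verification accuracy of the student model
    if acc > best_acc:                               # keep the highest-accuracy version
        best_acc = acc
        torch.save(student.state_dict(), "student_best.pt")
```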
Step 5, inputting the case information to be classified into the trained student model to obtain the case category and department category output by the student model.
It can be understood that after the student model is trained and verified, it can be used to predict the case category and department category: the case information to be classified is input into the trained student model, the case category and department category output by the student model are obtained, and the case is dispatched to the relevant department for processing according to the case category and department category.
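A sketch of the inference stage reusing the tokenizer and StudentModel sketches above (argmax decoding over the primary prediction vectors is an assumption):

```python
student.load_state_dict(torch.load("student_best.pt"))
student.eval()

merged_case_text = "..."  # case information to be classified, with region info merged
enc = tokenizer(merged_case_text, return_tensors="pt")
with torch.no_grad():
    e_cls, class_logits, depa_logits = student(enc["input_ids"], enc["attention_mask"])

case_category = int(class_logits.argmax(dim=-1))  # predicted case category code
depa_category = int(depa_logits.argmax(dim=-1))   # predicted department category code
```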
The invention provides an automatic case classification and dispatch method in which a simple student model learns the text features of a complex teacher model through knowledge distillation, and the simple student model is finally used to perform inference on case text. Compared with inference directly on the complex teacher model, inference with the simple student model takes less time and the service responds faster. Compared with training the simple student model directly on the data, distillation lets the student model learn some of the more complex text features of the teacher model, improving generalization and accuracy.
In the foregoing embodiments, the descriptions of the embodiments are focused on, and for those portions of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
It will be appreciated by those skilled in the art that embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (7)

1. An automatic case classifying and distributing method, characterized by comprising the following steps:
acquiring a training sample set, wherein the training sample set comprises a plurality of samples, and each sample comprises case information, case categories and department categories;
inputting each sample into the first-stage linear layers of a teacher model, and outputting a first primary case category prediction vector and a first primary department category prediction vector; inputting each sample into the first-stage linear layers and second-stage linear layers of a student model, and respectively outputting a second primary case category prediction vector, a second primary department category prediction vector, a secondary case category prediction vector and a secondary department category prediction vector, wherein the student model has fewer hierarchical modules than the teacher model, each layer has smaller dimensions, and the teacher model is trained in advance;
calculating a loss of the student model based on the first primary case category prediction vector, the first primary department category prediction vector, the second primary case category prediction vector, the second primary department category prediction vector, the secondary case category prediction vector, and the secondary department category prediction vector;
based on the loss, adjusting and updating parameters of the student model to obtain a trained student model;
inputting case information to be classified into the trained student model, and obtaining the case category and department category output by the student model;
the teacher model comprises a text embedding model, two first-stage linear layers, four second-stage linear layers and two category embedding matrixes, wherein the text embedding model in the teacher model is a Bert model, and the Bert model comprises 12 Transformer modules;
the student model comprises a text embedding model, two first-stage linear layers, four second-stage linear layers and two category embedding matrixes, wherein the text embedding model in the student model is a TinyBert model, and the TinyBert model comprises 4 Transformer modules;
the two first-stage linear layers, the four second-stage linear layers and the two category embedding matrices in the student model are smaller than the dimensions of the two first-stage linear layers, the four second-stage linear layers and the two category embedding matrices in the teacher model;
the step of inputting each sample into the first-stage linear layers and the second-stage linear layers of the student model and respectively outputting the second primary case category prediction vector, the second primary department category prediction vector, the secondary case category prediction vector and the secondary department category prediction vector comprises the following steps:
inputting each sample into a TinyBert model of the student model to obtain a second text embedding vector, and respectively inputting the second text embedding vector into the two first-stage linear layers of the student model to obtain the second primary case category prediction vector output by one first-stage linear layer and the second primary department category prediction vector output by the other first-stage linear layer;
based on the second primary case category prediction vector output by the student model, acquiring a related case category embedding matrix from a case category related linked list, inputting the second primary case category prediction vector and the related case category embedding matrix into the second-stage linear layers of the student model, and outputting the secondary case category prediction vector; based on the second primary department category prediction vector output by the student model, acquiring a related department category embedding matrix from a department category related linked list, inputting the second primary department category prediction vector and the related department category embedding matrix into the second-stage linear layers of the student model, and outputting the secondary department category prediction vector;
wherein the case category related linked list and the department category related linked list are generated after training by the teacher model;
the calculating the loss of the student model based on the first primary case category prediction vector, the first primary department category prediction vector, the second primary case category prediction vector, the second primary department category prediction vector, the secondary case category prediction vector and the secondary department category prediction vector includes:
calculating case category KL divergence between the first primary case category prediction vector and the second primary case category prediction vector, and calculating department category KL divergence between the first primary department category prediction vector and the second primary department category prediction vector;
calculating a first cross entropy loss between the actual case category of the sample and the second primary case category prediction vector, and calculating a second cross entropy loss between the actual department category of the sample and the second primary department category prediction vector;
calculating a third cross entropy loss between the actual case category of the sample and the secondary case category prediction vector, and calculating a fourth cross entropy loss between the actual department category of the sample and the secondary department category prediction vector;
the loss of the student model is calculated based on the case category KL divergence, the department category KL divergence, the first cross entropy loss, the second cross entropy loss, the third cross entropy loss, and the fourth cross entropy loss.
2. The automatic case classification and allocation method according to claim 1, wherein inputting each sample into the first-stage linear layers of the teacher model and outputting the first primary case category prediction vector and the first primary department category prediction vector comprises:
and inputting each sample into a Bert model of the teacher model to obtain a first text embedded vector, respectively inputting the first text embedded vector into two first-stage linear layers of the teacher model to obtain a first primary case type prediction vector output by one first-stage linear layer and a first primary department type prediction vector output by the other first-stage linear layer.
3. The automatic case classification and allocation method according to claim 2, wherein the inputting each sample into the Bert model of the teacher model to obtain the first text embedding vector includes:
inputting each sample into the Bert model tokenizer of the teacher model, converting the case information in the sample into corresponding token codes through the tokenizer, inputting the token codes into the Bert model of the teacher model, and outputting the first text embedding vector;
the inputting each sample into the TinyBert model of the student model to obtain the second text embedding vector includes:
inputting the token codes into the TinyBert model of the student model, and outputting the second text embedding vector.
4. The automatic case classification and allocation method according to claim 2, wherein the obtaining the relevant case category embedding matrix from the case category relevant linked list based on the second primary case category prediction vector output by the student model includes:
finding case-related categories from the case category related linked list according to the second primary case category prediction result, and obtaining the related case category embedding matrix RelateEncoder_class from the case category embedding matrix according to the found case-related categories;
the obtaining a related department category embedding matrix from a department category related linked list based on the second primary department category prediction vector output by the student model includes:
finding department-related categories from the department category related linked list according to the second primary department category prediction result, and obtaining the related department category embedding matrix RelateEncoder_depa from the department category embedding matrix according to the found department-related categories.
5. The automatic case classification and allocation method according to claim 1, wherein the calculating the case category KL divergence between the first primary case category prediction vector and the second primary case category prediction vector includes:
$$KL(ClassDist1_T, ClassDist1_S) = \sum_i \mathrm{softmax}(ClassDist1_T)_i \left( \log \mathrm{softmax}(ClassDist1_T)_i - \log \mathrm{softmax}(ClassDist1_S)_i \right)$$

wherein softmax(ClassDist1_T) is the probability distribution of the first primary case category prediction vector ClassDist1_T output by the teacher model, and softmax(ClassDist1_S) is the probability distribution of the second primary case category prediction vector ClassDist1_S output by the student model;
the calculating the department category KL divergence between the first primary department category prediction vector and the second primary department category prediction vector comprises:
$$KL(DepaDist1_T, DepaDist1_S) = \sum_i \mathrm{softmax}(DepaDist1_T)_i \left( \log \mathrm{softmax}(DepaDist1_T)_i - \log \mathrm{softmax}(DepaDist1_S)_i \right)$$

wherein softmax(DepaDist1_T) is the probability distribution of the first primary department category prediction vector DepaDist1_T output by the teacher model, and softmax(DepaDist1_S) is the probability distribution of the second primary department category prediction vector DepaDist1_S output by the student model.
6. The automatic case classification and allocation method according to claim 1, wherein the calculating the loss of the student model based on the case category KL divergence, the department category KL divergence, the first cross entropy loss, the second cross entropy loss, the third cross entropy loss, and the fourth cross entropy loss includes:
Loss = KL(ClassDist1_T, ClassDist1_S) + KL(DepaDist1_T, DepaDist1_S) + CrossEntropy1_class + CrossEntropy1_depa + BinaryEntropy2_class + BinaryEntropy2_depa
wherein KL(ClassDist1_T, ClassDist1_S) is the case category KL divergence between the first primary case category prediction vector and the second primary case category prediction vector, KL(DepaDist1_T, DepaDist1_S) is the department category KL divergence between the first primary department category prediction vector and the second primary department category prediction vector, and CrossEntropy1_class, CrossEntropy1_depa, BinaryEntropy2_class, and BinaryEntropy2_depa are the first cross entropy loss, the second cross entropy loss, the third cross entropy loss, and the fourth cross entropy loss, respectively.
7. The automatic case classification and allocation method according to claim 1, further comprising:
acquiring a verification sample set;
inputting each verification sample in the verification sample set into the trained student model, and outputting the predicted case category and predicted department category corresponding to each verification sample;
according to the actual case category and the actual department category of each verification sample and the corresponding predicted case category and predicted department category, respectively calculating the case category accuracy and the department category accuracy;
and adjusting and updating parameters of the student model according to the case type accuracy and the department type accuracy, and obtaining the trained and verified student model.
CN202311133641.XA 2023-09-05 2023-09-05 Automatic case classifying and distributing method Active CN116861302B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311133641.XA CN116861302B (en) 2023-09-05 2023-09-05 Automatic case classifying and distributing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311133641.XA CN116861302B (en) 2023-09-05 2023-09-05 Automatic case classifying and distributing method

Publications (2)

Publication Number Publication Date
CN116861302A (en) 2023-10-10
CN116861302B (en) 2024-01-23

Family

ID=88229026

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311133641.XA Active CN116861302B (en) 2023-09-05 2023-09-05 Automatic case classifying and distributing method

Country Status (1)

Country Link
CN (1) CN116861302B (en)

Citations (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111611377A (en) * 2020-04-22 2020-09-01 淮阴工学院 Knowledge distillation-based multi-layer neural network language model training method and device
CN111767405A (en) * 2020-07-30 2020-10-13 腾讯科技(深圳)有限公司 Training method, device and equipment of text classification model and storage medium
CN112131366A (en) * 2020-09-23 2020-12-25 腾讯科技(深圳)有限公司 Method, device and storage medium for training text classification model and text classification
CN113435208A (en) * 2021-06-15 2021-09-24 北京百度网讯科技有限公司 Student model training method and device and electronic equipment
CN113673698A (en) * 2021-08-24 2021-11-19 平安科技(深圳)有限公司 Distillation method, device, equipment and storage medium suitable for BERT model
CN114676256A (en) * 2022-03-30 2022-06-28 淮阴工学院 Text classification method based on multi-teaching-assistant model knowledge distillation training
CN114722805A (en) * 2022-06-10 2022-07-08 苏州大学 Little sample emotion classification method based on size instructor knowledge distillation
CN114782742A (en) * 2022-04-06 2022-07-22 浙江工业大学 Output regularization method based on teacher model classification layer weight
CN114818902A (en) * 2022-04-21 2022-07-29 浪潮云信息技术股份公司 Text classification method and system based on knowledge distillation
US11487944B1 (en) * 2019-12-09 2022-11-01 Asapp, Inc. System, method, and computer program for obtaining a unified named entity recognition model with the collective predictive capabilities of teacher models with different tag sets using marginal distillation
CN115456166A (en) * 2022-08-29 2022-12-09 浙江工业大学 Knowledge distillation method for neural network classification model of passive domain data
CN115481249A (en) * 2022-09-22 2022-12-16 淮阴工学院 Knowledge distillation chemical text classification method and device based on Gate-mix up data enhancement
CN115526332A (en) * 2022-08-17 2022-12-27 阿里巴巴(中国)有限公司 Student model training method and text classification system based on pre-training language model
CN115935245A (en) * 2023-03-10 2023-04-07 吉奥时空信息技术股份有限公司 Automatic classification and distribution method for government affair hotline cases
CN116306869A (en) * 2023-03-07 2023-06-23 支付宝(杭州)信息技术有限公司 Method for training text classification model, text classification method and corresponding device

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11620515B2 (en) * 2019-11-07 2023-04-04 Salesforce.Com, Inc. Multi-task knowledge distillation for language model
US11922303B2 (en) * 2019-11-18 2024-03-05 Salesforce, Inc. Systems and methods for distilled BERT-based training model for text classification
CN113722474A (en) * 2021-03-12 2021-11-30 腾讯科技(深圳)有限公司 Text classification method, device, equipment and storage medium
US20220343139A1 (en) * 2021-04-15 2022-10-27 Peyman PASSBAN Methods and systems for training a neural network model for mixed domain and multi-domain tasks

Patent Citations (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11487944B1 (en) * 2019-12-09 2022-11-01 Asapp, Inc. System, method, and computer program for obtaining a unified named entity recognition model with the collective predictive capabilities of teacher models with different tag sets using marginal distillation
CN111611377A (en) * 2020-04-22 2020-09-01 淮阴工学院 Knowledge distillation-based multi-layer neural network language model training method and device
CN111767405A (en) * 2020-07-30 2020-10-13 腾讯科技(深圳)有限公司 Training method, device and equipment of text classification model and storage medium
CN112131366A (en) * 2020-09-23 2020-12-25 腾讯科技(深圳)有限公司 Method, device and storage medium for training text classification model and text classification
CN113435208A (en) * 2021-06-15 2021-09-24 北京百度网讯科技有限公司 Student model training method and device and electronic equipment
CN113673698A (en) * 2021-08-24 2021-11-19 平安科技(深圳)有限公司 Distillation method, device, equipment and storage medium suitable for BERT model
WO2023024427A1 (en) * 2021-08-24 2023-03-02 平安科技(深圳)有限公司 Distillation method and apparatus suitable for bert model, device, and storage medium
CN114676256A (en) * 2022-03-30 2022-06-28 淮阴工学院 Text classification method based on multi-teaching-assistant model knowledge distillation training
CN114782742A (en) * 2022-04-06 2022-07-22 浙江工业大学 Output regularization method based on teacher model classification layer weight
CN114818902A (en) * 2022-04-21 2022-07-29 浪潮云信息技术股份公司 Text classification method and system based on knowledge distillation
CN114722805A (en) * 2022-06-10 2022-07-08 苏州大学 Little sample emotion classification method based on size instructor knowledge distillation
CN115526332A (en) * 2022-08-17 2022-12-27 阿里巴巴(中国)有限公司 Student model training method and text classification system based on pre-training language model
CN115456166A (en) * 2022-08-29 2022-12-09 浙江工业大学 Knowledge distillation method for neural network classification model of passive domain data
CN115481249A (en) * 2022-09-22 2022-12-16 淮阴工学院 Knowledge distillation chemical text classification method and device based on Gate-mix up data enhancement
CN116306869A (en) * 2023-03-07 2023-06-23 支付宝(杭州)信息技术有限公司 Method for training text classification model, text classification method and corresponding device
CN115935245A (en) * 2023-03-10 2023-04-07 吉奥时空信息技术股份有限公司 Automatic classification and distribution method for government affair hotline cases

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Research on civic hotline complaint text classification model based on word2vec; Luo J et al.; 2018 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery (CyberC); full text *
A BERT-based text classification model for civil livelihood issues, taking Zhejiang government hotline data as an example; Kong Xiangfu et al.; Journal of Peking University (Natural Science Edition); full text *
Research on automatic classification of customer service work orders based on a pre-trained BERT model; Ren Ying; Yunnan Electric Power Technology (No. 01); full text *

Also Published As

Publication number Publication date
CN116861302A (en) 2023-10-10

Similar Documents

Publication Publication Date Title
Lakshmanan et al. Machine learning design patterns
US20210150130A1 (en) Methods for generating natural language processing systems
US11720615B2 (en) Self-executing protocol generation from natural language text
US20240185080A1 (en) Self-supervised data obfuscation in foundation models
CN112163099A (en) Text recognition method and device based on knowledge graph, storage medium and server
CN117473431B (en) Airport data classification and classification method and system based on knowledge graph
CN115935245B (en) Automatic classification and allocation method for government affair hot line cases
CN116662522B (en) Question answer recommendation method, storage medium and electronic equipment
CN111709225B (en) Event causal relationship discriminating method, device and computer readable storage medium
CN114067308A (en) Intelligent matching method and device, electronic equipment and storage medium
CN114880449B (en) Method and device for generating answers of intelligent questions and answers, electronic equipment and storage medium
CN113627194A (en) Information extraction method and device, and communication message classification method and device
CN110705310B (en) Article generation method and device
CN116861302B (en) Automatic case classifying and distributing method
US11544460B1 (en) Adversarial anonymization and preservation of content
CN117236384A (en) Training and predicting method and device for terminal machine change prediction model and storage medium
CN111353728A (en) Risk analysis method and system
WO2024091291A1 (en) Self-supervised data obfuscation in foundation models
CN113742495B (en) Rating feature weight determining method and device based on prediction model and electronic equipment
Greedharry et al. A smart mobile application for complaints in Mauritius
CN114398980A (en) Cross-modal Hash model training method, encoding method, device and electronic equipment
CN114003708A (en) Automatic question answering method and device based on artificial intelligence, storage medium and server
Lu et al. A novel method for Chinese named entity recognition based on character vector
US20240345551A1 (en) Building management system with natural language model-based data structure generation
Sefara et al. Topic classification of tweets in the broadcasting domain using machine learning methods

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant