CN116861302A - Automatic case classifying and distributing method - Google Patents

Automatic case classifying and distributing method

Info

Publication number
CN116861302A
Authority
CN
China
Prior art keywords
category
case
model
department
primary
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311133641.XA
Other languages
Chinese (zh)
Other versions
CN116861302B (en)
Inventor
杨伊态
许继伟
段春先
黄亚林
张兆文
李成涛
陈胜鹏
刘惠娟
王敬佩
李颖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Geospace Information Technology Co ltd
Original Assignee
Geospace Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Geospace Information Technology Co ltd filed Critical Geospace Information Technology Co ltd
Priority to CN202311133641.XA priority Critical patent/CN116861302B/en
Publication of CN116861302A publication Critical patent/CN116861302A/en
Application granted granted Critical
Publication of CN116861302B publication Critical patent/CN116861302B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/10 Services
    • G06Q50/26 Government or public services
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Business, Economics & Management (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Human Resources & Organizations (AREA)
  • Strategic Management (AREA)
  • Tourism & Hospitality (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Economics (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computational Linguistics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Quality & Reliability (AREA)
  • Operations Research (AREA)
  • Development Economics (AREA)
  • Educational Administration (AREA)
  • Primary Health Care (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application provides an automatic case classification and allocation method. A neural network model with fewer parameters and a simpler structure is designed as a student model, a trained and more complex case classification and allocation model is used as the teacher model, the knowledge of the teacher model is distilled into the student model through knowledge distillation, and the student model is finally used as the inference model to classify and allocate cases. Compared with existing neural network methods, this greatly improves inference speed; compared with training the student model directly on the data, the model obtained through knowledge distillation generalizes better and is more accurate.

Description

Automatic case classifying and distributing method
Technical Field
The application relates to the field of information processing, in particular to an automatic case classification and allocation method.
Background
In city government services, citizens report their appeals through many channels such as telephone, website and WeChat applet. Government hotline staff must classify each complaint case and, according to the case category and other information, allocate it to the competent department for handling. As government hotlines grow busier, manual classification and allocation of complaint cases can no longer keep up with business demand, and automatic classification and allocation of complaint cases based on machine learning, artificial intelligence and related technologies has gradually come into wide use in government hotlines.
However, although the existing government hotline classification and allocation methods achieve high accuracy, their large parameter counts and complex structures make model processing slow, so that service response efficiency is low and business requirements are hard to meet on cost-constrained hardware.
The existing methods for classifying and allocating cases fall mainly into three categories:
The first category is classification methods based on rule matching. These methods automatically classify and allocate input case text by constructing classification and allocation rules, such as keyword matching. When there are few rules, inference is fast; in large-scale applications, however, the rules gradually grow in number and complexity, and inference slows down.
The second category is machine learning based methods. These methods use historical sample data to train machine learning algorithms or models such as TF-IDF and support vector machines, and then use the trained algorithms or models to automatically classify and allocate case text. Such methods infer faster than neural network based methods, but when the case classification is ambiguous or the responsible department is unclear, their classification and allocation accuracy is low.
The third category is neural network based methods. These methods construct a neural network, train the neural network model with historical sample data, and then use the trained model to automatically classify and allocate cases. Compared with rule matching and machine learning based methods, they are more accurate. However, existing neural network based case classification and allocation models infer slowly and struggle to meet the timeliness requirements of government hotline business.
Disclosure of Invention
In view of the above technical problems in the prior art, the application provides an automatic case classification and allocation method, which comprises the following steps:
acquiring a training sample set, wherein the training sample set comprises a plurality of samples, and each sample comprises case information, case categories and department categories;
inputting each sample into the first-stage linear layers of a teacher model, and outputting a first primary case category prediction vector and a first primary department category prediction vector; inputting each sample into the first-stage and second-stage linear layers of a student model, and respectively outputting a second primary case category prediction vector, a second primary department category prediction vector, a secondary case category prediction vector and a secondary department category prediction vector, wherein the student model has fewer hierarchical modules than the teacher model and each hierarchy has a smaller dimension, and the teacher model is trained in advance;
calculating a loss of the student model based on the first primary case category prediction vector, the first primary department category prediction vector, the second primary case category prediction vector, the second primary department category prediction vector, the secondary case category prediction vector and the secondary department category prediction vector;
based on the loss, adjusting and updating parameters of the student model to obtain a trained student model;
and inputting the classified case information into the trained student model to obtain the case type and department type output by the student model.
According to the automatic case classification and allocation method provided by the application, a neural network model with fewer parameters and a simpler structure is designed as the student model, a trained and more complex case classification and allocation model is used as the teacher model, and the knowledge of the teacher model is distilled into the student model through knowledge distillation. Finally, the student model is used as the inference model to classify and allocate cases. Compared with existing neural network methods, this greatly improves inference speed; compared with training the student model directly on the data, the model obtained through knowledge distillation generalizes better and is more accurate.
Drawings
FIG. 1 is a flow chart of a case automatic classification and allocation method provided by the application;
FIG. 2 is a schematic diagram of teacher model training and reasoning;
FIG. 3 is a schematic diagram of a student model training;
FIG. 4 is a schematic diagram of teacher model and student model training.
Detailed Description
For the purpose of making the objects, technical solutions and advantages of the embodiments of the present application more apparent, the technical solutions of the embodiments of the present application will be clearly and completely described below with reference to the accompanying drawings in the embodiments of the present application, and it is apparent that the described embodiments are some embodiments of the present application, but not all embodiments of the present application. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to be within the scope of the application. In addition, the technical features of each embodiment or the single embodiment provided by the application can be combined with each other at will to form a feasible technical scheme, and the combination is not limited by the sequence of steps and/or the structural composition mode, but is necessarily based on the fact that a person of ordinary skill in the art can realize the combination, and when the technical scheme is contradictory or can not realize, the combination of the technical scheme is not considered to exist and is not within the protection scope of the application claimed.
The application provides an automatic case classification and allocation method, which mainly comprises two parts. Training stage: a student model is trained using the historical data and the teacher model. Inference stage: government hotline cases are classified and allocated using the student model. Referring to fig. 1, the automatic case classification and allocation method provided by the application comprises the following steps:
step 1, a training sample set is obtained, wherein the training sample set comprises a plurality of samples, and each sample comprises case information, case categories and department categories.
It is understood that each sample in the sample set has the format [case information, area information, case category, department category], where the area information is optional. The sample set is divided proportionally into a training sample set and a verification sample set.
For example, sample a: ["Daytime noise makes it impossible to sleep; a complaint was filed an hour ago and no one has handled it; too slow", "pyridazine district", "440", "105"], where 440 and 105 are the case category code and the department category code, respectively.
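As an illustrative sketch, such samples and the proportional split into training and verification sets may be represented as follows in Python; the split ratio and variable names are assumptions rather than part of the method:

```python
import random

# Each sample: [case information, area information (optional), case category, department category].
samples = [
    ["Daytime noise makes it impossible to sleep; complained an hour ago, no response; too slow",
     "pyridazine district", "440", "105"],
    # ... further historical samples
]

random.shuffle(samples)
split = int(len(samples) * 0.8)          # proportional split; the 8:2 ratio is an assumption
train_set, val_set = samples[:split], samples[split:]
```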
Step 2, inputting each sample into the first-stage linear layers of the teacher model, and outputting the first primary case category prediction vector and the first primary department category prediction vector; inputting each sample into the first-stage and second-stage linear layers of the student model, and respectively outputting the second primary case category prediction vector, the second primary department category prediction vector, the secondary case category prediction vector and the secondary department category prediction vector, wherein the student model has fewer hierarchical modules than the teacher model and each hierarchy has a smaller dimension, and the teacher model is trained in advance.
It will be appreciated that each sample in the training sample set is input into a teacher model and a student model, respectively, wherein the teacher model is in a state in which the parameters of the teacher model are fixed after training.
Fig. 2 is a schematic diagram of the teacher model. The teacher model comprises a text embedding model, two first-stage linear layers (the linear layer 11 and the linear layer 12 in fig. 2), four second-stage linear layers (the linear layer 13, the linear layer 14, the linear layer 15 and the linear layer 16 in fig. 2) and two category embedding matrices. The text embedding model in the teacher model is a Bert model, which comprises 12 Transformer modules.
Fig. 3 is a schematic diagram of the structure of the student model. The student model comprises a text embedding model, two first-stage linear layers (the linear layer 21 and the linear layer 22 in fig. 3), four second-stage linear layers (the linear layer 23, the linear layer 24, the linear layer 25 and the linear layer 26 in fig. 3) and two category embedding matrices. The text embedding model in the student model is a TinyBert model, which comprises 4 Transformer modules. The two first-stage linear layers, four second-stage linear layers and two category embedding matrices of the student model all have smaller dimensions than the corresponding layers and matrices of the teacher model.
Specifically, the text embedding model of the teacher model is a Bert model comprising 12 Transformer modules, and the dimension of the text embedding vector is 768; the teacher model also contains 6 linear layers and 2 category embedding matrices. The linear layer 11 is the primary case prediction module, with an input dimension of 768 and an output dimension of Nclass, where Nclass is the number of case categories. The linear layer 13 and the linear layer 14 form the secondary case prediction module; the input and output dimensions of the linear layer 13 are both 768, and the linear layer 14 has an input dimension of 768 and an output dimension of 1. The linear layer 12 is the primary department prediction module, with an input dimension of 768 and an output dimension of Ndepa, where Ndepa is the number of department categories. The linear layer 15 and the linear layer 16 form the secondary department prediction module; the input and output dimensions of the linear layer 15 are both 768, and the linear layer 16 has an input dimension of 768 and an output dimension of 1. The 2 category matrices are the case category embedding matrix 1 and the department category embedding matrix 1, where the dimension of the case category embedding matrix is [Nclass, 768] and the dimension of the department category embedding matrix is [Ndepa, 768].
The text embedding model of the student model is a TinyBert model comprising 4 Transformer modules, and the dimension of the text embedding vector is 312; the student model also contains 6 linear layers and 2 category embedding matrices. The linear layer 21 is the primary case prediction module, with an input dimension of 312 and an output dimension of Nclass, where Nclass is the number of case categories. The linear layer 23 and the linear layer 24 form the secondary case prediction module; the input and output dimensions of the linear layer 23 are both 312, and the linear layer 24 has an input dimension of 312 and an output dimension of 1. The linear layer 22 is the primary department prediction module, with an input dimension of 312 and an output dimension of Ndepa, where Ndepa is the number of department categories. The linear layer 25 and the linear layer 26 form the secondary department prediction module; the input and output dimensions of the linear layer 25 are both 312, and the linear layer 26 has an input dimension of 312 and an output dimension of 1. The 2 category matrices are the case category embedding matrix 2 and the department category embedding matrix 2, where the dimension of the case category embedding matrix is [Nclass, 312] and the dimension of the department category embedding matrix is [Ndepa, 312].
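As an illustrative sketch, the student model structure described above may be assembled in PyTorch as follows; the class name, the TinyBert checkpoint path and the use of nn.Parameter for the category embedding matrices are assumptions about the concrete implementation:

```python
import torch
import torch.nn as nn
from transformers import AutoModel

class StudentModel(nn.Module):
    def __init__(self, n_class, n_depa, dim=312):
        super().__init__()
        # TinyBert text embedding model (4 Transformer modules, 312-dim); path is a placeholder
        self.encoder = AutoModel.from_pretrained("path/to/tinybert-4l-312d")
        # first-stage (primary) prediction layers: linear layer 21 and linear layer 22
        self.linear21 = nn.Linear(dim, n_class)     # primary case category prediction
        self.linear22 = nn.Linear(dim, n_depa)      # primary department category prediction
        # second-stage (secondary) prediction layers: linear layers 23-26
        self.linear23 = nn.Linear(dim, dim)
        self.linear24 = nn.Linear(dim, 1)
        self.linear25 = nn.Linear(dim, dim)
        self.linear26 = nn.Linear(dim, 1)
        # category embedding matrices: [Nclass, 312] and [Ndepa, 312]
        self.class_embed = nn.Parameter(torch.randn(n_class, dim))
        self.depa_embed = nn.Parameter(torch.randn(n_depa, dim))

    def forward(self, input_ids, attention_mask):
        out = self.encoder(input_ids=input_ids, attention_mask=attention_mask)
        e_cls = out.last_hidden_state[:, 0]         # embedding of the [CLS] token, [B, 312]
        return self.linear21(e_cls), self.linear22(e_cls), e_cls
```

The teacher model can be sketched analogously with a 768-dimensional Bert encoder and linear layers 11 to 16.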
Wherein, the teacher model is a neural network model which has been trained, namely, the Bert model, the linear layer 11, the linear layer 12, the linear layer 13, the linear layer 14, the linear layer 15, the linear layer 16, the case type embedding matrix 1 and the department type embedding matrix 1 are always fixed and do not change in the training process.
Wherein, as an embodiment, the inputting each sample into the first level linear layer of the teacher model outputs the first primary case category prediction vector and the first primary department category prediction vector, and includes: and inputting each sample into a Bert model of the teacher model to obtain a first text embedded vector, respectively inputting the first text embedded vector into two first-stage linear layers of the teacher model to obtain a first primary case type prediction vector output by one first-stage linear layer and a first primary department type prediction vector output by the other first-stage linear layer.
Inputting each sample into a first-stage linear layer and a second-stage linear layer of the student model, respectively outputting a second primary case type predictive vector, a second primary department type predictive vector, a secondary case type predictive vector and a secondary department type predictive vector, wherein the method comprises the following steps: and inputting each sample into a TinyBert model of the student model to obtain a second text embedded vector, respectively inputting the second text embedded vector into two first-stage linear layers of the student model to obtain a second primary case type prediction vector output by one first-stage linear layer and a second primary department type prediction vector output by the other first-stage linear layer. Based on a second primary case type prediction vector output by the student model, acquiring a related case type embedding matrix from a case type related linked list, inputting the second primary case type prediction vector and the related case type embedding matrix into a second linear layer of the student model, and outputting a secondary case type prediction vector; based on the second primary department category prediction vector output by the student model, acquiring a related department category embedding matrix from a department category related linked list, inputting the second primary department category prediction vector and the related department category embedding matrix into a second linear layer of the student model, and outputting a secondary department category prediction vector; the case type related linked list and the department type related linked list are generated after training of the teacher model.
Based on the training sample set and the trained teacher model, a schematic diagram of training the student model can be seen in fig. 4, and the training process is as follows:
for each sample in the training sample set, respectively inputting a Bert model in a teacher model and a TinyBert model of a student model, and converting the Bert model and the TinyBert model into text embedded vectors, wherein the steps are as follows:
if the region information exists in the sample, the region information and the case information are combined to obtain the combined case information.
For example, the merged case information of sample a is: "Daytime noise in the pyridazine district makes it impossible to sleep; a complaint was filed, an hour has passed and no one has handled it; too slow, too slow".
The merged case information is converted into the corresponding token codes by the Bert model tokenizer of the teacher model, and the special character codes [CLS] and [SEP] are added at the beginning and end to form the token codes of the case information. The Bert model of the teacher model in the application is the chinese-bert-wwm-ext Bert (Bidirectional Encoder Representation from Transformers) pre-trained model.
For example, the merged case information of sample a, converted into token codes by the Bert tokenizer of the teacher model, is: [101, 1515, 1515, 1277, 1921, 1692, 7509, 6375, 782, 3187, 3791, 4717, 6230, 8024, 2347, 2832, 6401, 8024, 6814, 749, 671, 702, 2207, 3198, 738, 3766, 782, 5052, 8024, 1922, 2714, 8024, 1922, 2714, 102]. Here 101 is the code of the special character [CLS] and 102 is the code of the special character [SEP]; each token code sequence starts with code 101 and ends with code 102.
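As an illustrative sketch, the tokenization step may be performed with the HuggingFace transformers tokenizer for the chinese-bert-wwm-ext checkpoint; the checkpoint identifier and variable names are assumptions:

```python
from transformers import BertTokenizer

tokenizer = BertTokenizer.from_pretrained("hfl/chinese-bert-wwm-ext")  # assumed checkpoint id

area_info = "pyridazine district"
case_info = "Daytime noise makes it impossible to sleep; complained an hour ago, no response; too slow"
merged_text = area_info + case_info

token_ids = tokenizer.encode(merged_text)   # [CLS] ... [SEP] codes are added automatically
assert token_ids[0] == 101 and token_ids[-1] == 102
```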
The token codes of the case information are input into the Bert model of the teacher model and the TinyBert model of the student model, respectively. The teacher model and the student model respectively obtain the embedding vector of the special character CLS, denoted E_CLS-T and E_CLS-S, where E_CLS-T represents the semantic features of the whole input text after conversion by the Bert model, and E_CLS-S represents the semantic features of the whole input text after conversion by the TinyBert model.
The embedding vector E_CLS-T is input into the linear layer 11 of the teacher model to obtain the primary case category prediction vector ClassDist1_T, a one-dimensional vector of dimension Nclass; E_CLS-T is input into the linear layer 12 of the teacher model to obtain the primary department category prediction vector DepaDist1_T, a one-dimensional vector of dimension Ndepa.
The embedding vector E_CLS-S is input into the linear layer 21 of the student model to obtain the primary case category prediction vector ClassDist1_S, a one-dimensional vector of dimension Nclass; E_CLS-S is input into the linear layer 22 of the student model to obtain the primary department category prediction vector DepaDist1_S, a one-dimensional vector of dimension Ndepa.
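As an illustrative sketch, the primary prediction vectors may be computed as follows, assuming teacher_bert, linear11 and linear12 denote the teacher's frozen Bert model and first-stage linear layers, and student is the StudentModel sketched earlier:

```python
import torch

inputs = tokenizer(merged_text, return_tensors="pt")     # token codes from the step above

with torch.no_grad():                                     # the teacher's parameters stay fixed
    e_cls_t = teacher_bert(**inputs).last_hidden_state[:, 0]   # E_CLS-T, [1, 768]
e_cls_s = student.encoder(**inputs).last_hidden_state[:, 0]    # E_CLS-S, [1, 312]

class_dist1_t = linear11(e_cls_t)            # teacher primary case category vector, [1, Nclass]
depa_dist1_t = linear12(e_cls_t)             # teacher primary department vector,    [1, Ndepa]
class_dist1_s = student.linear21(e_cls_s)    # student primary case category vector, [1, Nclass]
depa_dist1_s = student.linear22(e_cls_s)     # student primary department vector,    [1, Ndepa]
```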
The student model's primary case category prediction vector ClassDist1_S is passed through a Softmax layer to obtain the primary case category prediction result. The case-related categories are then looked up in the case category related linked list according to this prediction result, and the related case category embedding matrix RelateEncoder_class is obtained from the case category embedding matrix according to the found related categories. There may be several related case categories, and they include the category itself. RelateEncoder_class is a matrix of dimensions [Mclass, 312], where Mclass is the number of related categories. Similarly, the related department category embedding matrix RelateEncoder_depa can be obtained. The case category related linked list and the department category related linked list of the student model are generated after the teacher model is trained, and record the related case categories corresponding to each case category.
The case category related linked list and the department category related linked list of the student model are copied directly from the corresponding linked lists obtained when the teacher model was trained (but the case category embedding matrix and the department category embedding matrix are not, because the teacher model's matrices are 768-dimensional while the student model's are only 312-dimensional).
Taking the case category related linked list as an example: the linked list consists of a number of key-value pairs; each key is a case category, and the value corresponding to each key is the set of case categories related to that category, computed per category.
If the primary case category prediction is 449, the key 449 is looked up in the case category related linked list and the corresponding values 362, 1 and 449 are obtained, i.e. category 362, category 1 and category 449 are all categories related to 449. The vectors of categories 362, 1 and 449 are then extracted from the case category embedding matrix to obtain a matrix of dimensions [3, 312].
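As an illustrative sketch, the related linked list may be held as a plain dictionary and the related rows gathered from the category embedding matrix as follows; the dictionary contents reproduce the example above and the lookup code is an assumption:

```python
import torch

# case category related linked list, generated after teacher training (values include the key itself)
class_related = {449: [362, 1, 449]}

predicted_class = 449                                    # primary case category prediction result
related_ids = class_related[predicted_class]
relate_encoder_class = student.class_embed[torch.tensor(related_ids)]   # shape [3, 312]
```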
The embedding vector E_CLS-S is dot-multiplied, dimension by dimension, with each row of the related case category embedding matrix; the result is used as the input of the linear layer 23, and the secondary case category prediction result Res2_class of the student model is obtained through the linear layer 23 and the linear layer 24.
The embedding vector E_CLS-S is a vector of dimensions [1, 312]. Assuming there are 4 related case categories, the related case category embedding matrix has dimensions [4, 312]. E_CLS-S is dot-multiplied by dimension with each row of the related case category embedding matrix, and the result has dimensions [4, 312]. This is input into the linear layer 23, and the intermediate result has dimensions [4, 312]. The intermediate result is input into the linear layer 24, and the final result has dimensions [4, 1].
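As an illustrative sketch, this dimension-wise dot product and the secondary prediction may be written as follows, reusing the names from the earlier sketches:

```python
scores = e_cls_s * relate_encoder_class             # [1, 312] broadcast against [M, 312] -> [M, 312]
hidden = student.linear23(scores)                   # intermediate result, [M, 312]
res2_class = student.linear24(hidden).squeeze(-1)   # one score per related case category, [M]
```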
Step 3, calculating the loss of the student model based on the first primary case category prediction vector, the first primary department category prediction vector, the second primary case category prediction vector, the second primary department category prediction vector, the secondary case category prediction vector and the secondary department category prediction vector.
The loss of the student model is as follows: calculating case category KL divergence between the first primary case category prediction vector and the second primary case category prediction vector, and calculating department category KL divergence between the first primary department category prediction vector and the second primary department category prediction vector; calculating a first cross entropy loss between the actual case category of the sample and the second primary case category prediction vector, and calculating a second cross entropy loss between the actual department category of the sample and the second primary department category prediction vector; calculating a third cross entropy loss between the actual case type of the sample and the secondary case type prediction vector, and calculating a fourth cross entropy loss between the actual department type of the sample and the secondary department type prediction vector; the loss of the student model is calculated based on the case category KL divergence, the department category KL divergence, the first cross entropy loss, the second cross entropy loss, the third cross entropy loss, and the fourth cross entropy loss.
It will be appreciated that the embedding vector E_CLS-T is input into the linear layer 11 of the teacher model to obtain the primary case category prediction vector ClassDist1_T, and E_CLS-T is input into the linear layer 12 of the teacher model to obtain the primary department category prediction vector DepaDist1_T.
The embedding vector E_CLS-S is input into the linear layer 21 of the student model to obtain the primary case category prediction vector ClassDist1_S, and E_CLS-S is input into the linear layer 22 of the student model to obtain the primary department category prediction vector DepaDist1_S.
The KL divergence KL(ClassDist1_T, ClassDist1_S) between ClassDist1_T and ClassDist1_S and the KL divergence KL(DepaDist1_T, DepaDist1_S) between DepaDist1_T and DepaDist1_S are then calculated. The KL divergence measures the similarity between two probability distributions; KL(ClassDist1_T, ClassDist1_S) is calculated as:

KL(ClassDist1_T, ClassDist1_S) = Σ_i P_T(i) · log( P_T(i) / P_S(i) )

wherein P_T is the probability distribution of the teacher model's primary case category prediction vector ClassDist1_T, and P_S is the probability distribution of the student model's primary case category prediction vector ClassDist1_S. Note that KL(ClassDist1_T, ClassDist1_S) is not equal to KL(ClassDist1_S, ClassDist1_T); this asymmetry is a characteristic of the KL divergence. In the present application the KL divergence is calculated with the KL divergence loss function F.kl_div provided by the pytorch framework.
Similarly, KL(DepaDist1_T, DepaDist1_S) is calculated as:

KL(DepaDist1_T, DepaDist1_S) = Σ_i Q_T(i) · log( Q_T(i) / Q_S(i) )

wherein Q_T is the probability distribution of the first primary department category prediction vector DepaDist1_T output by the teacher model, and Q_S is the probability distribution of the second primary department category prediction vector DepaDist1_S output by the student model.
And calculating the KL divergence between the primary case type prediction result of the student model and the primary case type prediction result of the teacher model, and the KL divergence between the primary department type prediction result of the student model and the primary department type prediction result of the teacher model.
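As an illustrative sketch, the two KL divergence terms may be computed with F.kl_div as follows; passing the student's log-probabilities and the teacher's probabilities yields the divergence of the student distribution from the teacher distribution:

```python
import torch.nn.functional as F

kl_class = F.kl_div(F.log_softmax(class_dist1_s, dim=-1),
                    F.softmax(class_dist1_t, dim=-1),
                    reduction="batchmean")
kl_depa = F.kl_div(F.log_softmax(depa_dist1_s, dim=-1),
                   F.softmax(depa_dist1_t, dim=-1),
                   reduction="batchmean")
```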
Then, the cross entropy loss CrossEntropy1_class between the actual case category of the sample and the primary case category prediction result of the student model is calculated. Cross entropy loss is a common loss function in neural networks, mainly used in multi-classification tasks to measure the difference between the model's prediction and the actual label. In the application, the cross entropy loss is computed with the cross entropy loss function F.cross_entropy provided by the pytorch framework. Similarly, the cross entropy loss CrossEntropy1_depa between the actual department category of the sample and the primary department category prediction result of the student model is calculated.
Then, according to the secondary case category prediction result Res2_class of the student model and the actual case category of the sample, the binary cross entropy loss BinaryEntropy2_class for the case category is calculated. Similarly, the binary cross entropy loss BinaryEntropy2_depa for the department category is obtained. The application computes the binary cross entropy loss with the binary cross entropy loss function F.binary_cross_entropy_with_logits provided by the pytorch framework.
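As an illustrative sketch, the cross entropy and binary cross entropy terms may be computed as follows; the index tensors for the true categories and the construction of the binary targets over the related categories are assumptions about how the supervision is encoded:

```python
import torch
import torch.nn.functional as F

true_class_idx = torch.tensor([449])     # index of the sample's actual case category (assumed mapping)
true_depa_idx = torch.tensor([105])      # index of the sample's actual department category (assumed)

ce1_class = F.cross_entropy(class_dist1_s, true_class_idx)   # first cross entropy loss
ce1_depa = F.cross_entropy(depa_dist1_s, true_depa_idx)      # second cross entropy loss

# binary targets over the related categories: 1 at the actual category, 0 elsewhere (assumed)
bin_targets = (torch.tensor(related_ids) == 449).float()
bce2_class = F.binary_cross_entropy_with_logits(res2_class, bin_targets)
# bce2_depa is obtained in the same way from the department-side secondary prediction
```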
The total loss Loss of the model is then calculated and the parameters of the student model are updated using gradient descent. The total loss is calculated as:

Loss = KL_class + KL_depa + CrossEntropy1_class + CrossEntropy1_depa + BinaryEntropy2_class + BinaryEntropy2_depa

wherein KL_class is the case category KL divergence between the first primary case category prediction vector and the second primary case category prediction vector, KL_depa is the department category KL divergence between the first primary department category prediction vector and the second primary department category prediction vector, and CrossEntropy1_class, CrossEntropy1_depa, BinaryEntropy2_class and BinaryEntropy2_depa are the first cross entropy loss, the second cross entropy loss, the third cross entropy loss and the fourth cross entropy loss, respectively.
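As an illustrative sketch, the total loss and the gradient-descent update may then be written as follows; the optimizer choice and learning rate are assumptions, and bce2_depa is computed analogously to bce2_class:

```python
import torch

optimizer = torch.optim.AdamW(student.parameters(), lr=2e-5)   # optimizer and learning rate assumed

loss = (kl_class + kl_depa             # KL divergences towards the teacher
        + ce1_class + ce1_depa         # primary-level cross entropy losses
        + bce2_class + bce2_depa)      # secondary-level binary cross entropy losses
optimizer.zero_grad()
loss.backward()
optimizer.step()
```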
Step 4, based on the loss, adjusting and updating the parameters of the student model to obtain the trained student model.
It can be understood that each sample in the training sample set is processed according to steps 1, 2 and 3, the loss of the student model is calculated, and the parameters of the student model are adjusted based on the loss; every sample in the training sample set is traversed and the parameters are adjusted continuously until the loss is minimized, yielding the trained student model.
After the student model is trained, the method further comprises the following steps: acquiring a verification sample set; inputting the verification samples in the verification sample set into the trained student model, and outputting the predicted case category and predicted department category corresponding to each verification sample; calculating the case category accuracy and the department category accuracy according to the actual case category and actual department category of each verification sample and the corresponding predicted case category and predicted department category; and adjusting and updating the parameters of the student model according to the case category accuracy and the department category accuracy to obtain the trained and verified student model.
It can be understood that the student model is trained repeatedly on the training set, its accuracy is verified with the verification sample set, and the version of the student model with the highest verification accuracy is used as the trained student model. Each sample in the verification sample set is input into the student model, and the corresponding predicted case category and predicted department category are obtained for each sample. For each verification sample, if the predicted case category matches the actual case category in the sample, the prediction is correct; otherwise it is incorrect. Department category prediction is judged in the same way. Accuracy is the number of correct predictions divided by the total number of verification samples, and the verification accuracy of the student model is (case accuracy + department accuracy)/2. If this accuracy is higher than the current best, the model parameters of this version are saved and the current best accuracy is updated to this version's accuracy.
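As an illustrative sketch, the verification accuracy may be computed as follows; student_predict is an assumed helper that returns the predicted case and department categories for a piece of merged case information:

```python
case_correct = depa_correct = 0
for case_info, area_info, true_class, true_depa in val_set:
    pred_class, pred_depa = student_predict(area_info + case_info)   # assumed helper
    case_correct += int(pred_class == true_class)
    depa_correct += int(pred_depa == true_depa)

case_acc = case_correct / len(val_set)
depa_acc = depa_correct / len(val_set)
score = (case_acc + depa_acc) / 2     # save this version's parameters if score beats the best so far
```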
Step 5, inputting the case information to be classified into the trained student model to obtain the case category and department category output by the student model.
It can be understood that after the student model is trained and verified, it can be used to predict the case category and department category: the case information to be classified is input into the trained student model, the case category and department category output by the student model are obtained, and the case is allocated to the relevant department for handling according to the case category and department category.
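As an illustrative sketch, inference with the trained student model may look as follows; taking the argmax of the primary prediction vectors is an assumption (the secondary stage could also be used to re-rank the related categories):

```python
import torch

student.eval()
with torch.no_grad():
    inputs = tokenizer(case_text, return_tensors="pt")   # case_text: the case information to classify
    class_logits, depa_logits, _ = student(inputs["input_ids"], inputs["attention_mask"])
    case_category = class_logits.argmax(dim=-1).item()
    depa_category = depa_logits.argmax(dim=-1).item()
# the case is then allocated according to (case_category, depa_category)
```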
The application provides an automatic case classification and allocation method that, through knowledge distillation, enables a simple student model to learn the text features of a complex teacher model, and finally uses the simple student model to perform inference on case text. Compared with inference using the complex teacher model directly, inference with the simple student model takes less time and the service responds faster. Compared with training the simple student model directly on the data, distillation allows the simple student model to learn some of the more complex text features of the teacher model, which improves the model's generalization and accuracy.
In the foregoing embodiments, the descriptions of the embodiments are focused on, and for those portions of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present application have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. It is therefore intended that the following claims be interpreted as including the preferred embodiments and all such alterations and modifications as fall within the scope of the application.
It will be apparent to those skilled in the art that various modifications and variations can be made to the present application without departing from the spirit or scope of the application. Thus, it is intended that the present application also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (8)

1. The automatic case classifying and distributing method is characterized by comprising the following steps:
acquiring a training sample set, wherein the training sample set comprises a plurality of samples, and each sample comprises case information, case categories and department categories;
inputting each sample into a first-stage linear layer of a teacher model, and outputting a first primary case category prediction vector and a first primary department category prediction vector; inputting each sample into a first-stage linear layer and a second-stage linear layer of a student model, and respectively outputting a second primary case category prediction vector, a second primary department category prediction vector, a second case category prediction vector and a second department category prediction vector, wherein the student model has fewer hierarchical modules than the teacher model and each hierarchy has a smaller dimension, and the teacher model is trained in advance;
calculating a loss of the student model based on the first primary case category prediction vector, the first primary department category prediction vector, the second primary case category prediction vector, the second primary department category prediction vector, the second case category prediction vector, and the second department category prediction vector;
based on the loss, adjusting and updating parameters of the student model to obtain a trained student model;
inputting case information to be classified into the trained student model, and obtaining the case type and department type output by the student model;
the teacher model comprises a text embedding model, two first-stage linear layers, four second-stage linear layers and two category embedding matrixes, wherein the text embedding model in the teacher model is a Bert model, and the Bert model comprises 12 Transformer modules;
the student model comprises a text embedding model, two first-stage linear layers, four second-stage linear layers and two category embedding matrixes, wherein the text embedding model in the student model is a TinyBert model, and the TinyBert model comprises 4 Transformer modules;
the two first-stage linear layers, the four second-stage linear layers and the two category embedding matrices in the student model are smaller in dimension than the two first-stage linear layers, the four second-stage linear layers and the two category embedding matrices in the teacher model.
2. The case automatic classification and allocation method according to claim 1, wherein inputting each sample into a first level linear layer of a teacher model, outputting a first primary case category prediction vector and a first primary department category prediction vector, comprises:
inputting each sample into a Bert model of a teacher model to obtain a first text embedded vector, respectively inputting the first text embedded vector into two first-stage linear layers of the teacher model, and obtaining a first primary case type prediction vector output by one of the first-stage linear layers and a first primary department type prediction vector output by the other first-stage linear layer;
inputting each sample into a first-stage linear layer and a second-stage linear layer of the student model, respectively outputting a second primary case type predictive vector, a second primary department type predictive vector, a secondary case type predictive vector and a secondary department type predictive vector, wherein the method comprises the following steps:
inputting each sample into a TinyBert model of the student model to obtain a second text embedded vector, respectively inputting the second text embedded vector into two first-stage linear layers of the student model to obtain a second primary case type prediction vector output by one first-stage linear layer and a second primary department type prediction vector output by the other first-stage linear layer;
based on a second primary case type prediction vector output by the student model, acquiring a related case type embedding matrix from a case type related linked list, inputting the second primary case type prediction vector and the related case type embedding matrix into a second linear layer of the student model, and outputting a secondary case type prediction vector; based on the second primary department category prediction vector output by the student model, acquiring a related department category embedding matrix from a department category related linked list, inputting the second primary department category prediction vector and the related department category embedding matrix into a second linear layer of the student model, and outputting a secondary department category prediction vector;
the case type related linked list and the department type related linked list are generated after training of the teacher model.
3. The automatic case classification and allocation method according to claim 2, wherein the inputting each sample into the Bert model of the teacher model to obtain the first text embedded vector includes:
inputting each sample into a Bert model word segmentation device of a teacher model, converting case information in the sample into corresponding word element codes through the Bert model word segmentation device, inputting the word element codes into the Bert model in the teacher model, and outputting a first text embedded vector;
inputting each sample into a TinyBert model of the student model to obtain a second text embedded vector, wherein the method comprises the following steps of:
and inputting the word element code into a TinyBert model of the student model, and outputting a second text embedded vector.
4. The automatic case classification and allocation method according to claim 2, wherein the obtaining the relevant case category embedding matrix from the case category relevant linked list based on the second primary case category prediction vector output by the student model includes:
according to the second primary case category prediction result, finding out case related categories from the case category related linked list, and according to the found case related categories, obtaining a related case category embedding matrix RelateEncoder_class from the case category embedding matrix;
the second primary department category prediction vector output based on the student model acquires a related department category embedding matrix from a department category related linked list, and the method comprises the following steps:
finding out department related categories from the department category related linked list according to the second primary department category prediction result, and obtaining a related department category embedding matrix RelateEncoder_depa from the department category embedding matrix according to the found department related categories.
5. The case automatic classification and allocation method according to claim 1, wherein the calculating the loss of the student model based on the first primary case category prediction vector, the first primary department category prediction vector, the second primary case category prediction vector, the second primary department category prediction vector, the second case category prediction vector, and the second department category prediction vector comprises:
calculating case category KL divergence between the first primary case category prediction vector and the second primary case category prediction vector, and calculating department category KL divergence between the first primary department category prediction vector and the second primary department category prediction vector;
calculating a first cross entropy loss between the actual case category of the sample and the second primary case category prediction vector, and calculating a second cross entropy loss between the actual department category of the sample and the second primary department category prediction vector;
calculating a third cross entropy loss between the actual case type of the sample and the secondary case type prediction vector, and calculating a fourth cross entropy loss between the actual department type of the sample and the secondary department type prediction vector;
the loss of the student model is calculated based on the case category KL divergence, the department category KL divergence, the first cross entropy loss, the second cross entropy loss, the third cross entropy loss, and the fourth cross entropy loss.
6. The case automatic classification and allocation method according to claim 5, wherein calculating a case category KL divergence between the first primary case category prediction vector and the second primary case category prediction vector includes:
KL(ClassDist1_T, ClassDist1_S) = Σ_i P_T(i) · log( P_T(i) / P_S(i) )

wherein P_T is the probability distribution of the first primary case category prediction vector ClassDist1_T output by the teacher model, and P_S is the probability distribution of the second primary case category prediction vector ClassDist1_S output by the student model;
the calculating the department category KL divergence between the first primary department category prediction vector and the second primary department category prediction vector comprises:
KL(DepaDist1_T, DepaDist1_S) = Σ_i Q_T(i) · log( Q_T(i) / Q_S(i) )

wherein Q_T is the probability distribution of the first primary department category prediction vector DepaDist1_T output by the teacher model, and Q_S is the probability distribution of the second primary department category prediction vector DepaDist1_S output by the student model.
7. The case automatic classification and allocation method according to claim 5, wherein the calculating the loss of the student model based on the case category KL divergence, the department category KL divergence, the first cross entropy loss, the second cross entropy loss, the third cross entropy loss, and the fourth cross entropy loss includes:
Loss = KL_class + KL_depa + CrossEntropy1_class + CrossEntropy1_depa + BinaryEntropy2_class + BinaryEntropy2_depa

wherein KL_class is the case category KL divergence between the first primary case category prediction vector and the second primary case category prediction vector, KL_depa is the department category KL divergence between the first primary department category prediction vector and the second primary department category prediction vector, and CrossEntropy1_class, CrossEntropy1_depa, BinaryEntropy2_class and BinaryEntropy2_depa are the first cross entropy loss, the second cross entropy loss, the third cross entropy loss and the fourth cross entropy loss, respectively.
8. The case automatic classification and allocation method according to claim 1, further comprising:
acquiring a verification sample set;
inputting the verification samples in the verification sample set into the trained student model, and outputting a predicted case category and a predicted department category corresponding to each verification sample;
according to the actual case category and the actual department category of each verification sample and the corresponding predicted case category and predicted department category, respectively calculating the case category accuracy and the department category accuracy;
and adjusting and updating parameters of the student model according to the case type accuracy and the department type accuracy, and obtaining the trained and verified student model.
CN202311133641.XA 2023-09-05 2023-09-05 Automatic case classifying and distributing method Active CN116861302B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311133641.XA CN116861302B (en) 2023-09-05 2023-09-05 Automatic case classifying and distributing method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311133641.XA CN116861302B (en) 2023-09-05 2023-09-05 Automatic case classifying and distributing method

Publications (2)

Publication Number Publication Date
CN116861302A true CN116861302A (en) 2023-10-10
CN116861302B CN116861302B (en) 2024-01-23

Family

ID=88229026

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311133641.XA Active CN116861302B (en) 2023-09-05 2023-09-05 Automatic case classifying and distributing method

Country Status (1)

Country Link
CN (1) CN116861302B (en)

Citations (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111611377A (en) * 2020-04-22 2020-09-01 淮阴工学院 Knowledge distillation-based multi-layer neural network language model training method and device
CN111767405A (en) * 2020-07-30 2020-10-13 腾讯科技(深圳)有限公司 Training method, device and equipment of text classification model and storage medium
CN112131366A (en) * 2020-09-23 2020-12-25 腾讯科技(深圳)有限公司 Method, device and storage medium for training text classification model and text classification
US20210142164A1 (en) * 2019-11-07 2021-05-13 Salesforce.Com, Inc. Multi-Task Knowledge Distillation for Language Model
US20210150340A1 (en) * 2019-11-18 2021-05-20 Salesforce.Com, Inc. Systems and Methods for Distilled BERT-Based Training Model for Text Classification
CN113435208A (en) * 2021-06-15 2021-09-24 北京百度网讯科技有限公司 Student model training method and device and electronic equipment
CN113673698A (en) * 2021-08-24 2021-11-19 平安科技(深圳)有限公司 Distillation method, device, equipment and storage medium suitable for BERT model
CN114676256A (en) * 2022-03-30 2022-06-28 淮阴工学院 Text classification method based on multi-teaching-assistant model knowledge distillation training
CN114722805A (en) * 2022-06-10 2022-07-08 苏州大学 Little sample emotion classification method based on size instructor knowledge distillation
CN114782742A (en) * 2022-04-06 2022-07-22 浙江工业大学 Output regularization method based on teacher model classification layer weight
CN114818902A (en) * 2022-04-21 2022-07-29 浪潮云信息技术股份公司 Text classification method and system based on knowledge distillation
US20220343139A1 (en) * 2021-04-15 2022-10-27 Peyman PASSBAN Methods and systems for training a neural network model for mixed domain and multi-domain tasks
US11487944B1 (en) * 2019-12-09 2022-11-01 Asapp, Inc. System, method, and computer program for obtaining a unified named entity recognition model with the collective predictive capabilities of teacher models with different tag sets using marginal distillation
CN115456166A (en) * 2022-08-29 2022-12-09 浙江工业大学 Knowledge distillation method for neural network classification model of passive domain data
CN115481249A (en) * 2022-09-22 2022-12-16 淮阴工学院 Knowledge distillation chemical text classification method and device based on Gate-mix up data enhancement
CN115526332A (en) * 2022-08-17 2022-12-27 阿里巴巴(中国)有限公司 Student model training method and text classification system based on pre-training language model
US20230058194A1 (en) * 2021-03-12 2023-02-23 Tencent Technology (Shenzhen) Company Limited Text classification method and apparatus, device, and computer-readable storage medium
CN115935245A (en) * 2023-03-10 2023-04-07 吉奥时空信息技术股份有限公司 Automatic classification and distribution method for government affair hotline cases
CN116306869A (en) * 2023-03-07 2023-06-23 支付宝(杭州)信息技术有限公司 Method for training text classification model, text classification method and corresponding device

Patent Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20210142164A1 (en) * 2019-11-07 2021-05-13 Salesforce.Com, Inc. Multi-Task Knowledge Distillation for Language Model
US20210150340A1 (en) * 2019-11-18 2021-05-20 Salesforce.Com, Inc. Systems and Methods for Distilled BERT-Based Training Model for Text Classification
US11487944B1 (en) * 2019-12-09 2022-11-01 Asapp, Inc. System, method, and computer program for obtaining a unified named entity recognition model with the collective predictive capabilities of teacher models with different tag sets using marginal distillation
CN111611377A (en) * 2020-04-22 2020-09-01 淮阴工学院 Knowledge distillation-based multi-layer neural network language model training method and device
CN111767405A (en) * 2020-07-30 2020-10-13 腾讯科技(深圳)有限公司 Training method, device and equipment of text classification model and storage medium
CN112131366A (en) * 2020-09-23 2020-12-25 腾讯科技(深圳)有限公司 Method, device and storage medium for training text classification model and text classification
US20230058194A1 (en) * 2021-03-12 2023-02-23 Tencent Technology (Shenzhen) Company Limited Text classification method and apparatus, device, and computer-readable storage medium
US20220343139A1 (en) * 2021-04-15 2022-10-27 Peyman PASSBAN Methods and systems for training a neural network model for mixed domain and multi-domain tasks
CN113435208A (en) * 2021-06-15 2021-09-24 北京百度网讯科技有限公司 Student model training method and device and electronic equipment
CN113673698A (en) * 2021-08-24 2021-11-19 平安科技(深圳)有限公司 Distillation method, device, equipment and storage medium suitable for BERT model
WO2023024427A1 (en) * 2021-08-24 2023-03-02 平安科技(深圳)有限公司 Distillation method and apparatus suitable for bert model, device, and storage medium
CN114676256A (en) * 2022-03-30 2022-06-28 淮阴工学院 Text classification method based on multi-teaching-assistant model knowledge distillation training
CN114782742A (en) * 2022-04-06 2022-07-22 浙江工业大学 Output regularization method based on teacher model classification layer weight
CN114818902A (en) * 2022-04-21 2022-07-29 浪潮云信息技术股份公司 Text classification method and system based on knowledge distillation
CN114722805A (en) * 2022-06-10 2022-07-08 苏州大学 Little sample emotion classification method based on size instructor knowledge distillation
CN115526332A (en) * 2022-08-17 2022-12-27 阿里巴巴(中国)有限公司 Student model training method and text classification system based on pre-training language model
CN115456166A (en) * 2022-08-29 2022-12-09 浙江工业大学 Knowledge distillation method for neural network classification model of passive domain data
CN115481249A (en) * 2022-09-22 2022-12-16 淮阴工学院 Knowledge distillation chemical text classification method and device based on Gate-mix up data enhancement
CN116306869A (en) * 2023-03-07 2023-06-23 支付宝(杭州)信息技术有限公司 Method for training text classification model, text classification method and corresponding device
CN115935245A (en) * 2023-03-10 2023-04-07 吉奥时空信息技术股份有限公司 Automatic classification and distribution method for government affair hotline cases

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
LUO J ET AL: "Research on civic hotline complaint text classification model based on word2vec", 2018 INTERNATIONAL CONFERENCE ON CYBER-ENABLED DISTRIBUTED COMPUTING AND KNOWLEDGE DISCOVERY (CYBERC) *
任莹: "Research on automatic classification of customer service work orders based on the pre-trained BERT model", 云南电力技术 (Yunnan Electric Power Technology), no. 01 *
孔祥夫 et al.: "A BERT-based text classification model for livelihood issues: a case study of Zhejiang government hotline data", 北京大学学报(自然科学版) (Journal of Peking University, Natural Science Edition) *

Also Published As

Publication number Publication date
CN116861302B (en) 2024-01-23

Similar Documents

Publication Publication Date Title
Lakshmanan et al. Machine learning design patterns
CN112100383B (en) Meta-knowledge fine tuning method and platform for multitask language model
US11720615B2 (en) Self-executing protocol generation from natural language text
US20200034737A1 (en) Architectures for natural language processing
CN109992668A (en) A kind of enterprise's the analysis of public opinion method and apparatus based on from attention
US20240185080A1 (en) Self-supervised data obfuscation in foundation models
CN112163099A (en) Text recognition method and device based on knowledge graph, storage medium and server
CN117473431B (en) Airport data classification and classification method and system based on knowledge graph
CN115935245B (en) Automatic classification and allocation method for government affair hot line cases
CN111709225B (en) Event causal relationship discriminating method, device and computer readable storage medium
CN114067308A (en) Intelligent matching method and device, electronic equipment and storage medium
CN114880449B (en) Method and device for generating answers of intelligent questions and answers, electronic equipment and storage medium
CN115600605A (en) Method, system, equipment and storage medium for jointly extracting Chinese entity relationship
CN112950414B (en) Legal text representation method based on decoupling legal elements
CN113627194A (en) Information extraction method and device, and communication message classification method and device
CN110705310B (en) Article generation method and device
CN116861302B (en) Automatic case classifying and distributing method
CN117236384A (en) Training and predicting method and device for terminal machine change prediction model and storage medium
CN111353728A (en) Risk analysis method and system
US11544460B1 (en) Adversarial anonymization and preservation of content
WO2024091291A1 (en) Self-supervised data obfuscation in foundation models
Greedharry et al. A smart mobile application for complaints in Mauritius
G. El Barbary et al. Neutrosophic Logic‐Based Document Summarization
CN114003708A (en) Automatic question answering method and device based on artificial intelligence, storage medium and server
CN114398980A (en) Cross-modal Hash model training method, encoding method, device and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant