CN115935245B - Automatic classification and allocation method for government affair hot line cases - Google Patents


Info

Publication number
CN115935245B
CN115935245B (application CN202310228000.6A)
Authority
CN
China
Prior art keywords
case
model
category
department
training
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310228000.6A
Other languages
Chinese (zh)
Other versions
CN115935245A (en)
Inventor
杨伊态
李颖
李军霞
王敬佩
柯宝宝
黄亚林
张兆文
李成涛
陈胜鹏
付卓
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Geospace Information Technology Co ltd
Original Assignee
Geospace Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Geospace Information Technology Co ltd filed Critical Geospace Information Technology Co ltd
Priority to CN202310228000.6A priority Critical patent/CN115935245B/en
Publication of CN115935245A publication Critical patent/CN115935245A/en
Application granted granted Critical
Publication of CN115935245B publication Critical patent/CN115935245B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention belongs to the field of artificial intelligence and provides an automatic classification and allocation method for government affair hot line cases, comprising the following steps: step S1, establish a sample set and train a pre-training model; step S2, traverse the sample set with the trained pre-training model to generate a case category related linked list and a department related linked list; step S3, use the pre-training model to predict the case category and responsible department with the highest probability for each sample, combine these with the case category related linked list and the department related linked list to obtain a case category enhancement vector and a department enhancement vector, and obtain a case classification model and a department allocation model through training and parameter updating; step S4, acquire the input case content and output the predicted case classification result and department allocation result through the case classification model and department allocation model. The method achieves higher case classification and allocation accuracy, and complaint cases in the government convenience hotline can be automatically classified and allocated to the responsible departments for handling.

Description

Automatic classification and allocation method for government affair hot line cases
Technical Field
The invention belongs to the technical field of artificial intelligence, and particularly relates to an automatic classification and allocation method for government affair hot line cases.
Background
In a government service hotline, citizens report complaint information by phone call, WeChat mini-program, APP, portal messages and other channels; operators classify each case according to the complaint content and then dispatch it to the relevant responsible units and handling departments. With the wide application of government service hotlines, on one hand, the volume of hotline cases keeps growing and the cost of manual handling rises; when a hot social issue arises, the agents are overloaded and it becomes difficult to meet citizens' demands. On the other hand, because convenience-hotline cases fall into many categories, the distinctions between categories are not obvious, the departments associated with cases are broad, and the administrative hierarchy is complex, achieving fast and accurate case classification and allocation has become a difficult problem that government service hotline offices urgently need to solve.
At present, automatic classification and allocation methods of government affair hotlines are roughly divided into three types:
The first is the rule/decision-tree based approach. This approach first designs filtering and matching rules and then classifies cases according to those rules, e.g. by keyword matching or knowledge-base lookup. It works well for cases whose categories are clearly distinct, but its classification and allocation accuracy is poor for cases with complex or similar categories.
The second category is machine learning based methods, which classify and dispatch case text with designed machine learning algorithms or models; common examples are methods based on the XGBoost algorithm, cosine similarity, or SVM support vector machines. These methods can learn more category features and some semantic features, and their classification and allocation accuracy on cases with complex or similar categories improves over the first category, but is still not ideal.
The third class is neural network based methods. These extract deep semantic features of the case text with multi-layer neural networks and, compared with machine learning methods, classify similar case categories and allocate to similar departments more accurately. However, existing neural network based methods still classify highly similar cases poorly, and their accuracy in allocating cases of different categories to departments at different administrative levels remains unsatisfactory.
To classify and allocate cases efficiently, current practice replaces manual work with computer methods based on rule decision trees, machine learning algorithms, and deep neural networks, and then dispatches each case to the relevant responsible department for handling. However, existing methods have low classification accuracy for categories that are not clearly distinct, such as the hard-to-separate case categories "site noise (daytime)", "site noise (night)" and "business noise problem", and low allocation accuracy for departments of the same type at different administrative levels, for example the case-handling scopes of the "city management committee" and the "district management committee" are hard to distinguish.
Disclosure of Invention
In view of the above problems, the invention aims to provide an automatic classification and allocation method for government affair hot line cases, which aims to solve the technical problem of low accuracy of the existing method.
The invention adopts the following technical scheme:
the invention provides an automatic classification and allocation method for government affair hot line cases, which comprises the following steps:
step S1, a sample set is established and a pre-training model is trained;
step S2, traversing the sample set according to the trained pre-training model to generate a case type related linked list and a department related linked list;
s3, predicting the case type and the main department with highest probability of the sample by using the pre-training model, combining the case type related linked list and the department related linked list to obtain a case type enhancement vector and a department enhancement vector, and obtaining a case classification model and a department allocation model through training and parameter updating;
and S4, acquiring input case contents, and outputting predicted case classification results and division results through case classification model and division model processing.
The beneficial effects of the invention are as follows. The invention provides an automatic classification and allocation method for government affair hot line cases based on a neural network model: the case classification model improves classification accuracy on similar case categories by focusing training on the differences between similar categories, and the department allocation model improves allocation accuracy by fusing administrative division information and focusing training on the distinctions between departments that handle similar cases. Compared with existing methods, the proposed method classifies and allocates cases more accurately, can automatically classify complaints in the convenience government hotline and allocate them to the responsible departments for handling, improves hotline service efficiency, reduces labor cost, raises the hotline's degree of intelligence and automation, and improves the level of public service.
Drawings
Fig. 1 is a flowchart of a method for automatically classifying and allocating government affair hot line cases according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a process of pre-training a model provided by an embodiment of the present invention;
FIG. 3 is a schematic diagram of the generation of a case category related linked list;
fig. 4 is a process schematic of a case classification model and a department allocation model.
Detailed Description
The present invention will be described in further detail with reference to the drawings and examples, in order to make the objects, technical solutions and advantages of the present invention more apparent. It should be understood that the specific embodiments described herein are for purposes of illustration only and are not intended to limit the scope of the invention.
In order to illustrate the technical scheme of the invention, the following description is made by specific examples.
As shown in fig. 1, the automatic classification and allocation method for government affair hot line cases provided in this embodiment includes the following steps:
and S1, establishing a sample set and training a pre-training model.
The step is used for building a pre-training model and fine-tuning parameters of the pre-training model, and the specific process of the step is as follows in combination with the illustration of fig. 2:
s11, a sample set is established, wherein the sample format is [ case information, area information, case category, and authorities ], the area information is optional, and the sample set is proportionally divided into a training sample set and a verification sample set.
For example, sample a = ["The daytime noise makes it impossible to sleep; I filed a complaint an hour ago and no one has come, too slow", "pyridazine", "440", "105"], where 440 and 105 are the case category code and responsible department code, respectively.
S12, for each sample in the training sample set, use the BERT model to convert the case information into an embedding vector.
First, if the sample contains area information, it is merged with the case information. For example, the merged case information of sample a is: "In the pyridazine district, the daytime noise makes it impossible to sleep; I filed a complaint an hour ago and no one has come, too slow, too slow".
Then the merged case information is converted into the corresponding token codes with the BERT tokenizer, and special character codes are added at the head and tail to form the token codes of the case information. In this example the BERT model is Chinese-BERT-wwm-ext; BERT stands for Bidirectional Encoder Representations from Transformers.
For example, the merged information of sample a is converted into the following token codes: [101,1515,1515,1277,1921,1692,7509,6375,782,3187,3791,4717,6230,8024,2347,2832,6401,8024,6814,749,671,702,2207,3198,738,3766,782,5052,8024,1922,2714,8024,1922,2714,102]. Here 101 is the code of the special character 'CLS' and 102 is the code of the special character 'SEP'.
Finally, the token codes of the case information are input into the BERT model to obtain the embedding vector E_CLS of the special character CLS and a token vector for each token, [E_1, E_2, E_3, …, E_n], where n is the number of tokens.
The BERT model typically has two main outputs: the embedding vector of the special character "CLS" and the token vector of each token. The CLS embedding vector is obtained by pooling all the token vectors. Simple classification usually uses the CLS embedding vector directly, but the individual token vectors can also be used if the model requires them. In general the CLS embedding vector is not used together with the token vectors; it is essentially derived from the token vectors and represents the semantic features of the whole text. Different samples yield different embedding vectors.
S13, input the embedding vector into two linear layers respectively, feed each linear layer's output to a SOFTMAX layer to obtain a case category number and a department number respectively, and update the model parameters of the pre-training model by gradient descent.
The two linear layers are L1 and L2. During the training phase of the pre-training model, the embedding vector E_CLS is input to linear layer L1, the loss value is computed with a cross-entropy function, and the model parameters are updated by gradient descent. The input dimension of linear layer L1 equals the dimension of E_CLS, and its output dimension is the number of case categories N_class. Similarly, E_CLS is input to linear layer L2, the loss value is computed with a cross-entropy function, and the parameters are updated by gradient descent. The input dimension of linear layer L2 equals the dimension of E_CLS, and its output dimension is the number of departments N_depa. This example uses the pytorch framework with the cross-entropy function CrossEntropyLoss().
During the prediction phase, when the model is subsequently used, the embedding vector E_CLS is input to linear layer L1 and then to a softmax layer to obtain the probability value of each case category; the case category number with the highest probability value is taken as the prediction result. Similarly, E_CLS is input to linear layer L2 and then to a softmax layer to obtain the probability value of each department, and the department number with the highest probability value is taken as the prediction result.
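The two prediction heads can be sketched in plain Python (a sketch only: toy dimensions and illustrative weights stand in for the real pytorch linear layers over a 768-dimensional E_CLS):

```python
import math

def linear(x, W, b):
    # y_k = sum_i x_i * W[k][i] + b_k
    return [sum(xi * wki for xi, wki in zip(x, row)) + bk
            for row, bk in zip(W, b)]

def softmax(z):
    m = max(z)                       # subtract max for numerical stability
    exps = [math.exp(v - m) for v in z]
    s = sum(exps)
    return [e / s for e in exps]

def predict(e_cls, W, b):
    """Linear layer -> softmax -> number with the highest probability value."""
    probs = softmax(linear(e_cls, W, b))
    return max(range(len(probs)), key=probs.__getitem__), probs

e_cls = [0.5, -1.0, 2.0]                         # toy E_CLS (E = 3)
W1, b1 = [[1, 0, 0], [0, 0, 1]], [0.0, 0.0]      # toy head with 2 outputs
case_id, case_probs = predict(e_cls, W1, b1)
# case_id == 1 because the second logit (2.0) beats the first (0.5)
```

The department head L2 works identically, only with N_depa output rows instead of N_class.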
S14, after iteratively training the pre-training model with the training sample set, verify the model's accuracy with the verification sample set and take the model with the highest verification accuracy as the trained pre-training model; in the prediction stage this model outputs the probability value of each case category and each department to which a sample may belong.
In this way the model is trained on the training set over multiple rounds, its accuracy is verified with the verification sample set, and the most accurate model is selected. The specific implementation is as follows:
First, after the pre-training model has traversed the whole training sample set, its parameters are frozen; this embodiment uses the pytorch framework and freezes parameters with the model.eval() function.
Then each sample in the verification sample set is input into the pre-training model to obtain a predicted case category and a predicted department. For each verification sample, if the predicted case category matches the case category in the sample, the prediction is correct; otherwise it is wrong. Departments are checked the same way. Accuracy is the number of correct predictions divided by the total number of verification samples, and the overall verification accuracy of the pre-training model is (case accuracy + department accuracy) / 2. If this accuracy exceeds the current best, the model parameters of this version are saved and the current best accuracy is updated to this version's accuracy.
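The verification-accuracy rule described above can be sketched as:

```python
# Per-task accuracy is (# correct) / (# samples); the overall accuracy of
# the pre-training model averages the case-category and department scores.
def accuracy(predicted, correct):
    return sum(p == c for p, c in zip(predicted, correct)) / len(correct)

def overall_accuracy(pred_cases, true_cases, pred_depts, true_depts):
    return (accuracy(pred_cases, true_cases)
            + accuracy(pred_depts, true_depts)) / 2

# Toy run: 3 of 4 case categories and 2 of 4 departments predicted correctly
acc = overall_accuracy([449, 326, 1, 2], [449, 326, 1, 440],
                       [105, 100, 13, 13], [105, 100, 1, 1])
# acc == (0.75 + 0.5) / 2 == 0.625
```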
And S2, traversing the sample set according to the trained pre-training model to generate a case type related linked list and a department related linked list.
The process of generating the case category related linked list is as follows:
s211, setting an empty case category related linked list, wherein in the related linked list, the position serial number x stores the related case category of the case category x, and the length of the related linked list is N class
S212, input each sample into the trained pre-training model to obtain the probability value of each case category to which the sample belongs, and output the serial numbers of the top k_1 case categories with the largest probability values, in descending order;
s213, comparing the output case categories with the correct case categories of the sample one by one in sequence, and ending when the comparison is consistent or all the comparison is inconsistent; when inconsistency occurs in the comparison process, recording information in a position corresponding to the correct case category serial number of the sample of the related linked list, and rearranging the information according to the count from large to small; if the current case category is y, the correct case category is z, and the information recording rule is as follows: if the information of the correct case category z exists in the related linked list position y, adding 1 to the information count corresponding to the correct case category z; if there is no information of the correct case category z, the information { case category z: count 1}.
In this step, the trained pre-training model is applied to each sample to obtain the probability value of each case category to which the sample belongs; the k_1 case categories with the largest probability values are selected, and their serial numbers are output in descending order of probability.
The output case categories are compared one by one with the sample's correct case category, starting from the case category with the largest probability value. If the current case category y is inconsistent with the correct case category z recorded for the sample, information is recorded at position y of the case category related linked list. The recording rule is: if an entry for case category z already exists at position y, its count is increased by 1; otherwise the entry {case category z: count 1} is added.
The comparison is repeated until a case category y matches the sample's correct case category z, or until all k_1 case categories have been compared. When the model's top prediction is correct, the first comparison matches and no information is recorded; when all k_1 predicted case categories are wrong, k_1 entries are recorded.
The case category related linked list generation illustrated in Fig. 3 uses 4 input samples and takes the top k_1 = 3 case categories each time.
Input sample 1; prediction result 1 of the pre-training model: the top-3 case categories by predicted probability value are 449, 326, 11; the correct case category is 326.
the predicted case category 449 is inconsistent with the correct case category 326, and {326:1} is recorded at the 449 th position of the case category related linked list;
the predicted case category 326 is consistent with the correct case category 326 and ends.
Input sample 2; prediction result 2 of the pre-training model: the top-3 case categories by predicted probability value are 449, 440, 326; the correct case category is 326.
The predicted case category 449 is inconsistent with the correct case category 326; since an entry for case category 326 is already recorded at position 449 of the related linked list, its count is simply increased by 1, giving {326:2};
the predicted case category 440 is inconsistent with the correct case category 326, and {326:1} is recorded at the position of the case category related linked list 440;
the predicted case category 326 is consistent with the correct case category 326 and ends.
Input sample 3; prediction result 3 of the pre-training model: the top-3 case categories by predicted probability value are 449, 440, 1; the correct case category is 1.
predicted case category 449 is inconsistent with correct case category 1, record {1:1} at the location of case category related linked list 449.
The entries at position 449 of the case category related linked list are sorted by count, giving: {326:2}, {1:1}.
The predicted case category 440 is inconsistent with the correct case category 1, and {1:1} is recorded at the location of the case category related linked list 440.
The entries at position 440 of the case category related linked list are sorted by count, giving: {326:1}, {1:1}.
The predicted case category 1 is consistent with the correct case category 1, and the process is finished.
Input sample 4; prediction result 4 of the pre-training model: the top-3 case categories by predicted probability value are 1, 2, 449; the correct case category is 1.
The predicted case category 1 is consistent with the correct case category 1, and the process is finished.
The generation of the case category related linked list is completed.
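The worked example above can be replayed with a short sketch of the S211-S213 procedure, using dicts of counts to stand in for the linked-list entries:

```python
# Build the case category related linked list from top-k1 predictions.
def build_related_list(predictions, correct_labels, n_class):
    linked = [dict() for _ in range(n_class)]  # position x -> {category: count}
    for top_k, z in zip(predictions, correct_labels):
        for y in top_k:                        # compare in descending probability
            if y == z:
                break                          # match: stop, record nothing more
            # mismatch: record the correct category z at predicted position y
            linked[y][z] = linked[y].get(z, 0) + 1
    # re-sort each position's entries by count, from large to small
    return [sorted(d.items(), key=lambda kv: -kv[1]) for d in linked]

# The four samples of Fig. 3, k1 = 3
preds = [[449, 326, 11], [449, 440, 326], [449, 440, 1], [1, 2, 449]]
truth = [326, 326, 1, 1]
linked = build_related_list(preds, truth, 500)
# linked[449] == [(326, 2), (1, 1)] and linked[440] == [(326, 1), (1, 1)]
```

Running this reproduces exactly the positions 449 and 440 recorded in the figure walkthrough; the department related linked list is built the same way from department predictions.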
The department related linked list is generated in the same way as the case category related linked list: S321, set an empty department related linked list; S322, input each sample into the trained pre-training model to obtain the probability value of each department to which the sample belongs, and output the serial numbers of the top k_1 departments with the largest probability values, in descending order; S323, compare the output departments one by one, in order, with the correct department of the sample, ending when a comparison matches or when all comparisons fail. Whenever a comparison fails, record information at the position of the department related linked list corresponding to the serial number of the current (predicted) department, and re-sort the entries there by count from large to small. If the current department is y' and the correct department is z', the recording rule is: if an entry for the correct department z' already exists at position y' of the related linked list, add 1 to its count; otherwise add the entry {department z': count 1}.
And S3, predicting the case type and the main department with highest probability of the sample by using the pre-training model, combining the case type related linked list and the department related linked list to obtain a case type enhancement vector and a department enhancement vector, and obtaining a case classification model and a department allocation model through training and parameter updating.
As shown in fig. 4, the training process of the case classification model and the department allocation model is the same. For obtaining a case classification model, the specific process is as follows:
s311, predicting a case type c with the highest probability value of the current sample by using the pre-training model.
Firstly, inputting a sample set, wherein the sample set is proportionally divided into a training sample set and a verification sample set, and predicting a case category c with highest probability of the sample through a pre-training model.
After sample a is converted into token codes and input to the BERT model, the embedding vector E_CLS and the token vectors [E_1, E_2, E_3, …, E_n] are obtained. The embedding vector E_CLS is input to linear layer L1, and a softmax layer then predicts the case category c with the highest probability value. Assume the predicted case category is 449 and the department is 105.
S312, take the top k_2 case categories related to case category c from position c of the case category related linked list, denoted c_1, c_2, …, c_k2.
S313, take from the case category vector group the case category vectors with serial numbers c_1, c_2, …, c_k2 and c, denoted Rc_1, Rc_2, …, Rc_k2, Rc.
The case category vector group is a matrix of dimension [N_class, E], where N_class is the number of case categories and E is the token embedding dimension, the same as that of the embedding vector E_CLS, generally 768.
For example: assume the predicted case category is 449 and k_2 = 2. The top-2 most related categories of case category 449 are 326 and 1, so the case category vectors in rows 326, 1 and 449 are taken from the case category vector group.
S314, compute the case category enhancement vector corresponding to each case category vector. For each retrieved vector Rc_i (i ranging over c, c_1, …, c_k2):

V_i = sum_{j=1}^{n} Rc_i ⊙ E_j

where E_j is the j-th token vector of the sample, n is the number of tokens, and ⊙ denotes element-wise (point) multiplication.
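Under this reading (each retrieved category vector is point-multiplied with every token vector and the products are summed, as the worked example below also describes), the computation can be sketched with toy dimensions:

```python
# Enhancement vector V = sum_j (category_vec ⊙ token_vec_j), toy E = 3.
def enhancement_vector(category_vec, token_vecs):
    out = [0.0] * len(category_vec)
    for e_j in token_vecs:                    # one [1, E] vector per token
        for idx, (r, e) in enumerate(zip(category_vec, e_j)):
            out[idx] += r * e                 # element-wise (point) product
    return out

Rc = [1.0, 2.0, 0.5]                          # toy case category vector
tokens = [[1.0, 0.0, 2.0], [0.0, 1.0, 2.0]]   # n = 2 toy token vectors
V = enhancement_vector(Rc, tokens)
# V == [1.0, 2.0, 2.0]  (= Rc ⊙ E_1 + Rc ⊙ E_2)
```

Since the sum distributes over the element-wise product, V_i equals Rc_i ⊙ (E_1 + … + E_n), so the same result can be obtained by sum-pooling the token vectors first.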
S315, the case type enhancement vectors are sequentially input to two linear layers and then input to a SOFTMAX layer.
Each case category enhancement vector is input to the two linear layers L3 and L4. The input and output dimensions of linear layer L3, and the input dimension of linear layer L4, equal the dimension of the embedding vector E_CLS; the output dimension of linear layer L4 is the number of case categories N_class.
For example, for the category vector corresponding to case category 326: its dimension is [1, E] and each token vector has dimension [1, E]; the category vector is point-multiplied with each of the n token vectors to obtain n vectors of dimension [1, E], which are then summed to obtain a case category enhancement vector of dimension [1, E].
The case category enhancement vector of dimension [1, E] is input to linear layer L3, giving a result of dimension [1, E]; that result is input to linear layer L4, giving a result of dimension [1, N_class]; finally the result is input to a SOFTMAX layer for normalization.
S316, compute the loss value with a cross-entropy function and update the model parameters by gradient descent; only the linear layers L3 and L4 and the case category vector group are trained, while the parameters of the BERT model and of linear layers L1 and L2 are frozen, i.e. the pre-training model's parameters are not updated during this iterative training.
S317, after the case classification model is subjected to iterative training by using the training sample set, the accuracy of the case classification model is verified by using the verification sample set, and the first model with the highest verification accuracy is used as the trained case classification model.
After the case classification model traverses the whole training sample set, its parameters are frozen; this embodiment uses the pytorch framework and freezes parameters with the model.eval() function, i.e. the case classification model does not update parameters during verification. Each sample in the verification sample set is input into the case classification model to obtain a predicted case category; for each verification sample, if the predicted case category matches the case category in the sample, the prediction is correct, otherwise it is wrong. Accuracy is the number of correct predictions divided by the total number of verification samples; the verification accuracy of the case classification model is its case category accuracy. If this accuracy exceeds the current best, the case classification model parameters of this version are saved and the current best accuracy is updated to this version's accuracy.
The training process of the department allocation model is the same and can be briefly described as follows. S321, the pre-training model predicts the main department with the highest probability value for the current sample: the embedding vector ECLS is input into the linear layer L2, and a softmax layer then yields the main department d with the highest probability value. S322, the first k2 departments related to department d are taken from position d of the department related linked list and are denoted d1, d2 ... dk2. S323, the department vectors with serial numbers d1, d2 ... dk2 and with serial number d are taken from the department vector group and are denoted Rd1, Rd2 ... Rdk2 and Rd; for example, if the 2 departments most related to department 105 are 100 and 13, the department vectors in rows 100, 13 and 449 are taken from the department category vector group. S324, the department enhancement vector corresponding to these department vectors is calculated according to the formulas of the specification (rendered as images in the original and not reproduced here), where Ej is the j-th token vector of the sample and · denotes point multiplication; the department vector group is a matrix of dimension [Ndepa, E] initialized from unit vectors, where Ndepa is the number of departments. S325, the department enhancement vector is input into two linear layers in sequence and then into a softmax layer: it is input into the linear layers L5 and L6, where the input and output dimensions of L5 and the input dimension of L6 equal the dimension of the embedding vector ECLS, and the output dimension of L6 is the number of department categories Ndepa. S326, the loss value is calculated with a cross-entropy function and the model parameters are updated by gradient descent. S327, the department allocation model is trained iteratively with the training sample set, its accuracy is verified with the verification sample set, and the first model with the highest verification accuracy is used as the trained department allocation model.
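Steps S322 and S323 (take the top k2 related departments from the linked list, then gather their rows plus row d itself from the department vector group) can be sketched in Python. The data layout and function name are illustrative assumptions, not the patent's implementation.

```python
# Sketch of S322-S323: for the predicted main department d, look up the top
# k2 related departments at position d of the department related linked list,
# then gather the corresponding rows (and row d itself) from the department
# vector group.

def gather_department_vectors(d, related_list, dept_vectors, k2):
    related = related_list[d][:k2]          # serial numbers d1, d2, ..., dk2
    rows = related + [d]                    # include department d itself
    return [dept_vectors[i] for i in rows]  # Rd1, Rd2, ..., Rdk2, Rd
```

The returned vectors would then feed the enhancement-vector calculation of step S324.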
S4, the input case content is acquired, and the predicted case classification result and department allocation result are output after processing by the case classification model and the department allocation model.
First, the case content is input; the area information is optional.
Then, the pre-training model predicts the case category number and the department number with the highest probability.
Next, the output results of the linear layers L4 and L6 are obtained from the case classification model and the department allocation model: the output of linear layer L4 is input into a softmax layer to obtain the probability value of each case category, and the case category serial number with the highest probability is taken as the predicted case classification result; the output of linear layer L6 is input into a softmax layer to obtain the probability value of each department, and the department serial number with the highest probability value is taken as the predicted department allocation result.
Finally, the predicted case classification result and department allocation result are output as the final output.
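The final prediction step, softmax over the linear-layer outputs followed by taking the serial number with the highest probability, reduces to the following minimal sketch; a pure-Python softmax stands in for the framework's implementation.

```python
# Sketch of step S4's decision rule: the logits produced by the final linear
# layers (L4 for case categories, L6 for departments) pass through softmax,
# and the index with the highest probability is the prediction.
import math

def softmax(logits):
    m = max(logits)                         # subtract max for stability
    exps = [math.exp(x - m) for x in logits]
    s = sum(exps)
    return [e / s for e in exps]

def predict_index(logits):
    probs = softmax(logits)
    return probs.index(max(probs))          # serial number with highest probability
```

The same rule is applied twice, once to the L4 output for the case category and once to the L6 output for the department.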
In summary, the invention provides an automatic classification and allocation method for government hotline cases. Based on a neural network model, it improves the model's classification accuracy on similar case categories with a two-step training method, and fuses administrative division information to improve the accuracy of case allocation. Compared with existing methods, the case classification and case allocation accuracy of the proposed method is higher. The method can automatically classify complaint cases on the citizen-service government hotline and allocate them to the responsible authorities for processing, reducing labor cost and improving the degree of intelligence and automation of the hotline.
The foregoing description of the preferred embodiments of the invention is not intended to be limiting, but rather is intended to cover all modifications, equivalents, and alternatives falling within the spirit and principles of the invention.

Claims (2)

1. The automatic classification and allocation method for the government affair hot line cases is characterized by comprising the following steps of:
step S1, a sample set is established and a pre-training model is trained;
step S2, traversing the sample set according to the trained pre-training model to generate a case type related linked list and a department related linked list;
s3, predicting the case type and the main department with highest probability of the sample by using the pre-training model, combining the case type related linked list and the department related linked list to obtain a case type enhancement vector and a department enhancement vector, and obtaining a case classification model and a department allocation model through training and parameter updating;
s4, acquiring input case contents, and outputting predicted case classification results and division results through case classification model and division model processing;
in the step S2, the process of generating the case category related linked list and the department related linked list is the same;
for generating a case category related linked list, the process is as follows:
s211, setting an empty case category related linked list, wherein in the related linked list, the position serial number x stores the related case category of the case category x;
s212, inputting each sample into a trained pre-training model to obtain probability values of each case category to which the sample belongs, and outputting the previous k with the largest probability value in sequence 1 A serial number of the individual case category;
s213, comparing the output case categories with the correct case categories of the sample one by one in sequence, and ending when the comparison is consistent or all the comparison is inconsistent; when inconsistency occurs in the comparison process, recording information in a position corresponding to the correct case category serial number of the sample of the related linked list, and rearranging the information according to the count from large to small; if the current case category is y, the correct case category is z, and the information recording rule is as follows: if the information of the correct case category z exists in the related linked list position y, adding 1 to the information count corresponding to the correct case category z; if there is no information of the correct case category z, the information { case category z: count 1};
in the step S3, the process of obtaining the case classification model and the department allocation model is the same;
for obtaining a case classification model, the specific process is as follows:
s311, predicting a case type c with the highest probability value of the current sample by using a pre-training model;
s312, taking out the previous k from the position of the serial number c of the case type related linked list 2 The case categories related to the case category c are respectively marked as c 1 ,c 2 …c k2
S313, respectively taking out serial numbers c from the case type vector group 1 ,c 2 …c k2 And case category vectors with serial numbers of c, respectively recorded as Rc 1 ,Rc 2 …Rc k2 ,Rc;
s314, calculating the case category enhancement vector corresponding to the case category vectors according to the formulas of the specification (rendered as images in the original and not reproduced here), wherein Ej is the j-th token vector of the sample and · denotes point multiplication;
s315, sequentially inputting the case type enhancement vectors into two linear layers and then inputting the case type enhancement vectors into a SOFTMAX layer;
s316, calculating a loss value by using a cross entropy function, and updating model parameters by using a gradient descent method;
s317, after the case classification model is subjected to iterative training by using the training sample set, the accuracy of the case classification model is verified by using the verification sample set, and the first model with the highest verification accuracy is used as the trained case classification model.
2. The automatic classification and allocation method for government affair hot-line cases according to claim 1, wherein the specific process of step S1 is as follows:
s11, a sample set is established, wherein the sample format is [ case information, area information, case category, and authorities ], the area information is optional, and the sample set is divided into a training sample set and a verification sample set according to a proportion;
s12, for each sample in the training sample set, converting case information in the sample set into an embedded vector by using a BERT model;
s13, respectively inputting the embedded vectors into two linear layers, outputting each linear layer to a SOFTMAX layer, respectively obtaining a case class number and a department number, and updating model parameters of the pre-training model by using a gradient descent method;
s14, after the training sample set is used for carrying out iterative training on the pre-training model, the accuracy of the model is verified by using the verification sample set, and a model with the highest verification accuracy is used as a trained pre-training model, wherein the output of the pre-training model in the prediction stage is the probability value of each case category and department to which the sample belongs.
CN202310228000.6A 2023-03-10 2023-03-10 Automatic classification and allocation method for government affair hot line cases Active CN115935245B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310228000.6A CN115935245B (en) 2023-03-10 2023-03-10 Automatic classification and allocation method for government affair hot line cases

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310228000.6A CN115935245B (en) 2023-03-10 2023-03-10 Automatic classification and allocation method for government affair hot line cases

Publications (2)

Publication Number Publication Date
CN115935245A CN115935245A (en) 2023-04-07
CN115935245B (en) 2023-05-26

Family

ID=85818574

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310228000.6A Active CN115935245B (en) 2023-03-10 2023-03-10 Automatic classification and allocation method for government affair hot line cases

Country Status (1)

Country Link
CN (1) CN115935245B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116611453B (en) * 2023-07-19 2023-10-03 天津奇立软件技术有限公司 Intelligent order-distributing and order-following method and system based on big data and storage medium
CN116861302B (en) * 2023-09-05 2024-01-23 吉奥时空信息技术股份有限公司 Automatic case classifying and distributing method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239529A (en) * 2017-05-27 2017-10-10 中国矿业大学 A kind of public sentiment hot category classification method based on deep learning
WO2019149200A1 (en) * 2018-02-01 2019-08-08 腾讯科技(深圳)有限公司 Text classification method, computer device, and storage medium
CN115659974A (en) * 2022-09-30 2023-01-31 中国科学院软件研究所 Software security public opinion event extraction method and device based on open source software supply chain

Family Cites Families (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180204135A1 (en) * 2017-01-18 2018-07-19 Wipro Limited Systems and methods for improving accuracy of classification-based text data processing
CN108710651B (en) * 2018-05-08 2022-03-25 华南理工大学 Automatic classification method for large-scale customer complaint data
CN111177367B (en) * 2019-11-11 2023-06-23 腾讯科技(深圳)有限公司 Case classification method, classification model training method and related products
WO2022093982A1 (en) * 2020-10-30 2022-05-05 Convey, Llc Machine learning event classification and automated case creation
CN112488551B (en) * 2020-12-11 2023-04-07 浪潮云信息技术股份公司 Hot line intelligent order dispatching method based on XGboost algorithm
CN112581106B (en) * 2021-02-23 2021-05-28 苏州工业园区测绘地理信息有限公司 Government affair event automatic order dispatching method fusing grid semantics of handling organization
CN112800232B (en) * 2021-04-01 2021-08-06 南京视察者智能科技有限公司 Case automatic classification method based on big data
CN114547315A (en) * 2022-04-25 2022-05-27 湖南工商大学 Case classification prediction method and device, computer equipment and storage medium
CN115242487B (en) * 2022-07-19 2024-04-05 浙江工业大学 APT attack sample enhancement and detection method based on meta-behavior
CN115344695A (en) * 2022-07-27 2022-11-15 中国人民解放军空军工程大学 Service text classification method based on field BERT model
CN115455315B (en) * 2022-11-10 2023-04-07 吉奥时空信息技术股份有限公司 Address matching model training method based on comparison learning

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107239529A (en) * 2017-05-27 2017-10-10 中国矿业大学 A kind of public sentiment hot category classification method based on deep learning
WO2019149200A1 (en) * 2018-02-01 2019-08-08 腾讯科技(深圳)有限公司 Text classification method, computer device, and storage medium
CN115659974A (en) * 2022-09-30 2023-01-31 中国科学院软件研究所 Software security public opinion event extraction method and device based on open source software supply chain

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Research on Civic Hotline Complaint Text Classification Model Based on word2vec; JingYu Luo et al.; 2018 International Conference on Cyber-Enabled Distributed Computing and Knowledge Discovery; full text *
Design of a hot-topic mining system for civic hotline texts; Xue Bin; Journal of China Jiliang University; Vol. 28, No. 3; full text *

Also Published As

Publication number Publication date
CN115935245A (en) 2023-04-07

Similar Documents

Publication Publication Date Title
CN115935245B (en) Automatic classification and allocation method for government affair hot line cases
US20240046043A1 (en) Multi-turn Dialogue Response Generation with Template Generation
CN113905391B (en) Integrated learning network traffic prediction method, system, equipment, terminal and medium
CN110457442A (en) The knowledge mapping construction method of smart grid-oriented customer service question and answer
CN105955951B (en) A kind of method and device of message screening
CN112395404B (en) Voice key information extraction method applied to power dispatching
Chen et al. A deep learning method for judicial decision support
CN117473431A (en) Airport data classification and classification method and system based on knowledge graph
CN115456421A (en) Work order dispatching method and device, processor and electronic equipment
CN116541755A (en) Financial behavior pattern analysis and prediction method based on time sequence diagram representation learning
CN115905538A (en) Event multi-label classification method, device, equipment and medium based on knowledge graph
CN113627194B (en) Information extraction method and device, and communication message classification method and device
CN109543038B (en) Emotion analysis method applied to text data
CN117172508B (en) Automatic dispatch method and system based on city complaint worksheet recognition
CN113920379A (en) Zero sample image classification method based on knowledge assistance
CN115982646B (en) Management method and system for multisource test data based on cloud platform
CN112488736A (en) Method and system for analyzing government affair hotline work order data in field of residential construction
CN109658148B (en) Marketing activity complaint risk prediction method based on natural language processing technology
CN116541166A (en) Super-computing power scheduling server and resource management method
CN113538011B (en) Method for associating non-booked contact information with booked user in electric power system
CN113742498B (en) Knowledge graph construction and updating method
CN115203365A (en) Social event processing method applied to comprehensive treatment field
CN112989054B (en) Text processing method and device
CN115860964A (en) Reimbursement approval process generation method, system, equipment and storage medium
CN114491004A (en) Title generation method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant