CN117235638A

CN117235638A - Multi-level classification method for police incident content based on a pre-trained model

Info

Publication number: CN117235638A
Application number: CN202311194475.4A
Authority: CN
Inventors: 王明光; 孙孝坤; 那正平; 高进; 蒋维; 徐佳申; 钟浩; 刘红志; 高友光
Original assignee: Daoshu Shanghai Digital Technology Co ltd
Current assignee: Daoshu Shanghai Digital Technology Co ltd
Priority date: 2023-09-15
Filing date: 2023-09-15
Publication date: 2023-12-15

Abstract

The invention discloses a multi-level classification method for police incident content based on a pre-trained model, comprising the following steps: acquiring police incident content from an alarm receiving system together with the manually labeled category data of each level; preprocessing the original incident content and the corresponding category data of each level (cleaning, de-duplication, splicing and the like) to obtain unified, normalized data; extracting from the normalized data of each level a training data set, a validation data set and a test data set for that level; training a pre-trained deep learning network classification model with the training data set of each level, and validating the model with the validation data set of the same level after each training batch; after the model of each level has been trained, testing it with the test data set of the same level to obtain a model evaluation result; and, once the evaluation result meets the standard, performing prediction and classification. The method can quickly and effectively predict the category corresponding to police incident content, with fast response, good targeting and good prediction performance.

Description

Multi-level classification method for police incident content based on a pre-trained model

Technical Field

The invention belongs to the technical field of artificial intelligence, and in particular relates to a multi-level classification method for police incident content based on a pre-trained model.

Background

When an alarm event occurs, each piece of police incident content in the alarm receiving system corresponds to incident categories at several levels. Classifying large volumes of incident content accurately and promptly at every level helps with dispatch planning and helps officers handle cases, thereby improving dispatch efficiency. Building a stable and reliable multi-level classification model for incident content data is therefore particularly important for reducing the workload of alarm receiving staff, improving dispatch efficiency and quality, and maintaining public safety.

Some research on multi-level classification of police incident content already exists. For example, the automatic classification system for emergency-linkage police incidents disclosed in CN101201835A is based on keyword matching. The police incident classification method and system disclosed in CN110990562A is based on traditional machine learning algorithms and concatenates the categories of all levels into a single category used as the incident class, so hierarchical information is lost and accuracy suffers. Both keyword matching and traditional machine learning are older approaches.

With the rapid development of artificial intelligence in recent years, new algorithm models have emerged one after another. Adopting such a model, making full use of the hierarchical information and building a classification model for each level in turn can further improve accuracy, and thus dispatch efficiency and quality.

Disclosure of Invention

The invention aims to overcome the shortcomings of the prior art described in the background by providing a multi-level classification method for police incident content based on a pre-trained model. The method processes the incident content and the corresponding category data of each level to build a training data set, a validation data set and a test data set for each level, builds a prediction model for each level using the pre-trained deep learning network classification algorithm Bert-Softmax, and finally evaluates and deploys the model of each level. The method is well targeted, can further improve the prediction accuracy for the categories of each level, and optimizes the multi-level classification model for incident content in the alarm receiving system, thereby solving the problems described in the background.

To solve the above technical problems, the invention adopts the following technical solution:

The invention provides a multi-level classification method for police incident content based on a pre-trained model, comprising the following steps:

S1. Acquire the police incident content in the alarm receiving system and the corresponding manually labeled category data of each level.

S2. Preprocess the original incident content and the corresponding category data of each level (data cleaning, data de-duplication, data splicing and the like) to obtain unified, normalized data.

S3. From the normalized data of each level, extract a training data set, a validation data set and a test data set for that level.

S4. Train the deep learning network classification model Bert-Softmax with the training data set of each level, and validate the model with the validation data set of the same level after each training batch.

S5. After the model of each level has been trained, test it with the test data set of the same level to obtain an evaluation result.

S6. After the evaluation results of all levels meet the requirements, deploy the model of each level online to predict and classify online police incident content data in real time.

Further, step S2 comprises the following sub-steps:

S21. Data cleaning: delete invalid and incomplete data.

S22. Data conversion: convert escape characters in the data into normal characters.

S23. Data de-duplication: among records whose incident content and category are identical, keep only one.

S24. Data splicing: for each level, concatenate the incident categories of all higher levels in order and then concatenate the result with the incident content data to form that level's incident content data; no splicing is needed for the first level.

S25. Process the data into unified, normalized data.

Further, step S3 comprises the following sub-steps:

S31. Sort the normalized data of each level by that level's incident category and incident content, then extract a given number of records evenly from front to back at a fixed step size, so that the distribution of the extracted sample matches that of the full data.

S32. Extract the training, validation and test data sets of each level in this way; the proportions are chosen according to the situation, and the three data sets must not overlap.

Further, step S4 comprises the following sub-steps:

S41. Train the pre-trained deep learning network classification model Bert-Softmax in batches with the training data set of each level; after each batch, validate the model with the validation data set of the same level and save the model if the validation result has improved. One complete pass over the training data set counts as one round.

S42. Repeat training of each level's model for T rounds with that level's training data set, and end training once the validation result no longer improves after T rounds.

Further, in step S5, after the model of each level has been trained, it is tested with the test data set of the same level to obtain the model evaluation results: accuracy, precision, recall and F1 score.

Further, in step S6, after the evaluation results of all levels meet the requirements, the model of each level is deployed online to predict online police incident content data in real time and obtain the incident category of that level. For each level, the predicted incident categories of all higher levels are concatenated in order and then concatenated with the online incident content data to form that level's incident content data; no splicing is needed for the first level.

Compared with the prior art, the invention has the following beneficial effects:

(1) The invention processes the multi-level category data of police incident content, builds a training data set, a validation data set and a test data set for each level, builds a prediction model for each level using the more advanced pre-trained deep learning network classification algorithm Bert-Softmax, and finally evaluates and deploys the model of each level. The method is well targeted, further improves the prediction accuracy for the categories of each level, and optimizes the multi-level classification model for incident content in the alarm receiving system.

(2) The invention can quickly and effectively predict the category corresponding to police incident content, with fast response, good targeting and good prediction performance.

Of course, a product implementing the invention need not achieve all of the above advantages at the same time.

Drawings

To illustrate the technical solutions of the embodiments of the invention more clearly, the drawing used in describing the embodiments is briefly introduced below. The drawing described below shows only some embodiments of the invention; a person skilled in the art can obtain other drawings from it without inventive effort.

FIG. 1 is a flow chart of the multi-level classification method for police incident content based on a pre-trained model according to the invention.

Detailed Description

The technical solutions in the embodiments of the invention are described clearly and completely below with reference to the accompanying drawing. The described embodiments are only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art on the basis of these embodiments without inventive effort fall within the scope of protection of the invention.

The invention will now be described in detail with reference to the drawings and specific examples.

The experimental environment of the invention comprises two 8-core Intel(R) Xeon(R) E5630 CPUs @ 2.53 GHz, two GeForce RTX 3090 GPUs, and Python 3.11.

Each step of the technical solution is described in detail below with reference to the flow chart of FIG. 1. As shown in FIG. 1, the multi-level classification method for police incident content based on a pre-trained model comprises the following steps:

S1. Acquire the police incident content in the alarm receiving system and the corresponding manually labeled category data of each level.

For example, the incident content on the alarm receiving form and the corresponding manually labeled category data are obtained from the alarm receiving system database. The incident category data has four levels: category name, type name, subclass name, and so on, as shown in Table 1 below.

Table 1. Example of police incident category data

S2. Preprocess the original incident content and the corresponding category data of each level (cleaning, de-duplication, splicing and the like) to obtain unified, normalized data, specifically:

S21. Data cleaning: delete invalid and incomplete data.

S22. Data conversion: convert escape characters in the data into normal characters.

S23. Data de-duplication: among records whose incident content and incident category are identical, keep only one. For example, in Table 1 the 1st and 2nd records are identical, so the duplicate is deleted and only one record is kept.

S24. Data splicing: for each level, concatenate the incident category data of all higher levels in order and then concatenate the result with the incident content data to form that level's incident content data; no splicing is needed for the first level. For example, "category name", "subclass name" and "incident content" are joined with "--" to form the fourth-level incident content. In Table 1, the spliced fourth-level incident content of the 4th record is: "criminal police--infringement of property rights--theft--2023-06-14 20:07:59 extension XX (189XXXXXXXX) alarm, said to occur at (Sichuan xxxxx): police proof battery car theft".

S25. Process the data into unified, normalized data, specifically two corresponding columns: the spliced incident content of the level and the category of that level.
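Sub-steps S21 to S25 can be illustrated with a minimal Python sketch. The record layout (a `content` string plus a `categories` list ordered from the top level down), the function name `preprocess`, the use of `html.unescape` for the escape-character conversion, and the exact separator string are all assumptions made for illustration; the patent does not specify them.

```python
import html

SEP = "--"  # assumed separator string used when splicing categories and content


def preprocess(records, level):
    """Return de-duplicated (spliced_content, label) rows for a 1-based level.

    records: iterable of dicts with 'content' (str) and 'categories'
             (list of category names, top level first). Hypothetical layout.
    """
    seen = set()
    rows = []
    for rec in records:
        content = rec.get("content")
        cats = rec.get("categories") or []
        # S21: drop invalid or incomplete records
        if not content or len(cats) < level or not all(cats[:level]):
            continue
        # S22: convert escape sequences (assumed here to be HTML entities)
        content = html.unescape(content).strip()
        cats = [html.unescape(c).strip() for c in cats]
        # S23: keep only one copy of records with identical content and categories
        key = (content, tuple(cats))
        if key in seen:
            continue
        seen.add(key)
        # S24: prepend all higher-level categories; level 1 needs no splicing
        text = SEP.join(cats[:level - 1] + [content])
        # S25: two-column row (spliced content, category of this level)
        rows.append((text, cats[level - 1]))
    return rows
```

Calling `preprocess(records, 1)` returns the raw content paired with the first-level category, while `preprocess(records, 3)` prefixes each content string with its first- and second-level categories.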

S3. From the normalized data of each level, extract a training data set, a validation data set and a test data set for that level, specifically:

S31. Sort the normalized data of each level by that level's incident category and incident content, then extract a given number of records evenly from front to back at a fixed step size, so that the distribution of the extracted sample matches that of the full data.

S32. Extract the training, validation and test data sets of each level in this way; the proportions are chosen according to the situation, and the three data sets must not overlap.

For example, suppose the data set contains 300,000 records and 30,000 records are extracted for each of the training, validation and test data sets. The extraction step size is s = 30/3 = 10 (both counts in units of ten thousand). Records with indexes 1, 11, 21, ..., (N×s+1) form the training data set, records with indexes 4, 14, 24, ..., (N×s+4) form the validation data set, and records with indexes 7, 17, 27, ..., (N×s+7) form the test data set, where N is a natural number. Each of the three data sets is shuffled before use.
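The strided extraction in this example can be sketched in Python as follows. The function name `stride_split`, the row layout `(content, label)` and the fixed random seed are assumptions; the 0-based offsets 0, 3 and 6 correspond to the 1-based starting indexes 1, 4 and 7 given above.

```python
import random


def stride_split(rows, step=10, offsets=(0, 3, 6), seed=0):
    """Split (content, label) rows into train/validation/test by strided sampling.

    Rows are sorted by (label, content) so that every step-th row gives a
    sample whose class distribution follows the full data set (S31).
    Distinct offsets smaller than `step` guarantee the sets do not overlap (S32).
    """
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))
    splits = [ordered[off::step] for off in offsets]
    rng = random.Random(seed)  # fixed seed only to make the sketch reproducible
    for part in splits:
        rng.shuffle(part)      # each set is shuffled before use
    return splits              # [train, validation, test]
```

With 300,000 rows and `step=10`, each returned list holds 30,000 rows, matching the figures in the example.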

S4. Train the pre-trained deep learning network classification model (Bert-Softmax) with the training data set of each level, and validate the model with the validation data set of the same level after each training batch, specifically:

S41. Train the pre-trained deep learning network classification model (Bert-Softmax) in batches with the training data set of each level; after each batch, validate the model with the validation data set of the same level and save the model if the validation result has improved. One complete pass over the training data set counts as one round, and the batch size is chosen according to the hardware.

S42. Repeat training of each level's model for T rounds with that level's training data set, and end training once the validation result no longer improves after T rounds; the number of training rounds is chosen according to the specific situation.

For example, with 30,000 records and 100 records per batch, one round of training takes 300 batches. After each batch, if the validation loss is lower than that of the last saved model and the F1 score is higher than that of the last saved model, the model is saved together with its loss and F1 score.
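The save-on-improvement rule and the stopping rule of S41 and S42 can be sketched as plain Python control flow, independent of any deep learning library. The callables `train_step`, `validate` and `save` are hypothetical placeholders for the actual Bert-Softmax training, validation and checkpointing code, and reading T as "stop after T consecutive rounds without a saved improvement" is one possible interpretation, not a statement of the patented procedure.

```python
def train_with_checkpoints(batches, train_step, validate, save,
                           patience_rounds=3, max_rounds=50):
    """Train in batches, checkpoint on improvement, stop after stalled rounds.

    train_step(batch): runs one training batch (placeholder).
    validate():        returns (loss, f1) on the validation set (placeholder).
    save(loss, f1):    stores the model with its scores (placeholder).
    """
    best_loss, best_f1 = float("inf"), -1.0
    stalled_rounds = 0
    rounds_run = 0
    for _ in range(max_rounds):
        rounds_run += 1
        improved = False
        for batch in batches:
            train_step(batch)
            loss, f1 = validate()
            # Save only if loss went down AND F1 went up versus the last saved model
            if loss < best_loss and f1 > best_f1:
                best_loss, best_f1 = loss, f1
                save(loss, f1)
                improved = True
        stalled_rounds = 0 if improved else stalled_rounds + 1
        if stalled_rounds >= patience_rounds:
            break  # validation result has not improved for T rounds
    return best_loss, best_f1, rounds_run
```

Validating after every batch is expensive on a 30,000-record validation set; the sketch keeps it because the text describes it, but the interval is easy to change.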

S5. After the model of each level has been trained, test it with the test data set of the same level to obtain a model evaluation result, specifically:

After the model of each level has been trained, it is tested with the test data set of the same level to obtain the model evaluation results: accuracy, precision, recall and F1 score.
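The four evaluation results can be computed as in the following sketch. The text does not say how precision, recall and F1 are averaged over the categories of a level; macro averaging (an unweighted mean over categories) is assumed here, and the function name `evaluate` is hypothetical.

```python
def evaluate(y_true, y_pred):
    """Return accuracy and macro-averaged precision, recall and F1."""
    pairs = list(zip(y_true, y_pred))
    labels = sorted(set(y_true) | set(y_pred))
    accuracy = sum(t == p for t, p in pairs) / len(pairs)
    precisions, recalls, f1s = [], [], []
    for label in labels:
        tp = sum(t == label and p == label for t, p in pairs)
        fp = sum(t != label and p == label for t, p in pairs)
        fn = sum(t == label and p != label for t, p in pairs)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        precisions.append(precision)
        recalls.append(recall)
        f1s.append(f1)
    n = len(labels)
    return {
        "accuracy": accuracy,
        "precision": sum(precisions) / n,  # macro average (assumption)
        "recall": sum(recalls) / n,
        "f1": sum(f1s) / n,
    }
```

For heavily imbalanced incident categories, a weighted average may describe overall performance better than the macro average; which one the deployment threshold should use is a design choice left open by the text.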

S6. After the evaluation results of all levels meet the requirements, deploy the model of each level online to predict and classify online police incident content data in real time, specifically:

After the evaluation result of each level meets the requirement, for example an F1 score above 80%, the model of that level is deployed online. Online police incident content data is predicted in real time through an interface service, and the incident category of each level is obtained. For each level, the predicted incident categories of all higher levels are concatenated in order and then concatenated with the online incident content data to form that level's incident content data; no splicing is needed for the first level.
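The cascaded online prediction can be sketched as follows: each level's model receives the incident content prefixed with the categories already predicted for the higher levels. The names `ready_to_deploy` and `predict_all_levels`, the separator string, and the representation of each level model as a callable from text to category are assumptions for illustration.

```python
SEP = "--"  # assumed to match the separator used when building the training data


def ready_to_deploy(f1_by_level, threshold=0.8):
    """Deployment gate: every level must reach the required F1 score."""
    return all(f1 >= threshold for f1 in f1_by_level)


def predict_all_levels(content, level_models, sep=SEP):
    """Predict the category of every level for one piece of incident content.

    level_models: list of callables text -> category, top level first
                  (stand-ins for the deployed per-level models).
    """
    predicted = []
    for model in level_models:
        # Level 1 sees the raw content; deeper levels see the predicted
        # higher-level categories spliced in front of the content.
        text = sep.join(predicted + [content])
        predicted.append(model(text))
    return predicted
```

Because each level consumes the predictions of the levels above it, an error at a higher level propagates downward, which is one reason the gate checks every level before deployment.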

In summary, the multi-level classification method for police incident content based on a pre-trained model can further improve the prediction accuracy for the incident categories of each level, and plays an important role in optimizing the multi-level classification model for incident content in the alarm receiving system.

The preferred embodiments of the invention disclosed above are intended only to help explain the invention. They are not exhaustive and do not limit the invention to the precise forms disclosed. Obviously, many modifications and variations are possible in light of the above teaching. The embodiments were chosen and described in order to best explain the principles of the invention and its practical application, so that others skilled in the art can best understand and use the invention. The invention is limited only by the claims and their full scope and equivalents.

Claims

1. A multi-level classification method for police incident content based on a pre-trained model, characterized by comprising the following steps:

S4. Train the deep learning network classification model Bert-Softmax with the training data set of each level, and validate the model with the validation data set of the same level after each batch of training;

2. The multi-level classification method for police incident content based on a pre-trained model according to claim 1, wherein step S2 comprises the following sub-steps:

S21. Data cleaning: delete invalid and incomplete data;

S25. Process the data into unified, normalized data.

3. The multi-level classification method for police incident content based on a pre-trained model according to claim 1, wherein step S3 comprises the following sub-steps:

4. The multi-level classification method for police incident content based on a pre-trained model according to claim 1, wherein step S4 comprises the following sub-steps:

5. The multi-level classification method for police incident content based on a pre-trained model according to claim 1, wherein in step S5, after the model of each level has been trained, it is tested with the test data set of the same level to obtain the model evaluation results: accuracy, precision, recall and F1 score.

6. The multi-level classification method for police incident content based on a pre-trained model according to claim 1, wherein in step S6, after the evaluation results of all levels meet the requirements, the model of each level is deployed online to predict online police incident content data in real time and obtain the incident category of that level. For each level, the predicted incident categories of all higher levels are concatenated in order and then concatenated with the online incident content data to form that level's incident content data; no splicing is needed for the first level.