CN114003683A

CN114003683A - Alarm condition analysis method based on natural language processing and association rule

Info

Publication number: CN114003683A
Application number: CN202111303071.5A
Authority: CN
Inventors: 黄淑兵; 张亚洲; 蔡岗; 缪新顿; 陆杨; 朱键; 陆伟佳; 张长辉
Original assignee: Traffic Management Research Institute of Ministry of Public Security
Current assignee: Traffic Management Research Institute of Ministry of Public Security
Priority date: 2021-11-04
Filing date: 2021-11-04
Publication date: 2022-02-01

Abstract

The invention relates to the technical field of alarm condition analysis, and in particular discloses an alarm condition analysis method based on natural language processing and association rules. The method comprises the following steps: acquiring raw data of an alarm receiving and handling service; processing the raw data with a natural language processing tool to obtain event triples; matching the event triples with accident factors and binding the structured data in the event triples to the accident factors, wherein an accident factor is a mapping table between the natural-language description of an alarm and its digital (coded) representation; mining association rules from the multiple pieces of structured data bound to each accident factor with an association rule mining algorithm to obtain a set with association rules; and processing the set with association rules to obtain an alarm condition analysis result. The method makes effective use of historical alarm condition data for alarm condition analysis.

Description

Alarm condition analysis method based on natural language processing and association rule

Technical Field

The invention relates to the technical field of alarm condition analysis, in particular to an alarm condition analysis method based on natural language processing and association rules.

Background

At present, alarm receiving and handling services generate large amounts of unstructured alarm text. When alarm content must be analyzed, conventional methods such as database queries have difficulty extracting, from this volume of alarm information, the association relationships that support alarm analysis and judgment, and historical alarm data cannot be used effectively to build an analysis and early-warning model.

Disclosure of Invention

The invention provides an alarm condition analysis method based on natural language processing and association rules, which solves the problem in the related art that historical alarm condition data cannot be used effectively for alarm condition analysis and early warning.

As one aspect of the present invention, an alarm condition analysis method based on natural language processing and association rules is provided, including:

acquiring original data of an alarm receiving and handling service;

processing the original data of the alarm receiving and processing service with a natural language processing tool to obtain event triples;

matching the event triples with accident factors, and binding the structured data in the event triples to the accident factors, wherein an accident factor is a mapping table between the natural-language description of an alarm and its digital (coded) representation;

mining the association rules of the multiple pieces of structured data bound by each accident factor according to an association rule mining algorithm to obtain a set with the association rules;

and processing the set with association rules to obtain an alarm condition analysis result.

Further, the raw data of the alarm receiving and processing service comprises: structured data and unstructured data, the structured data including an alert ticket number, a data source, an alarm receiver, a jurisdiction, an alert type, an alert time, a treatment result flag, a feedback person, a feedback department, a feedback time, an alert reverse check flag, and an alert check flag; the unstructured data includes alarm content and feedback content.

Further, processing the raw data of the alarm receiving and processing service with a natural language processing tool to obtain event triples includes:

performing data reduction on the structured data and data cleaning on the unstructured data;

and performing word segmentation, part-of-speech tagging, syntactic structure description and semantic dependency analysis on the cleaned unstructured data with a natural language processing tool, and constructing event triples.

Further, the matching the event triplet with the accident factor and binding the structured data in the event triplet with the accident factor includes:

classifying the accident factors according to the alarm condition types;

matching the classified accident factors with the event triples one by one;

binding the structured data in the matched event triples with the accident factors;

and repeating the steps until all the structured data in the matched event triples are bound with the accident factor.

Further, the mining of association rules for the multiple pieces of structured data bound to each accident factor according to an association rule mining algorithm to obtain a set with association rules includes:

establishing an item set to be mined from the multiple pieces of structured data bound to each accident factor, wherein the item set is the collection of the pieces of structured data bound to that accident factor;

traversing the item set according to a preset minimum support threshold to obtain a frequent item set;

and traversing the non-empty subset of the frequent item set according to a preset minimum confidence threshold value to obtain a set with association rules.

Further, the traversing the item set according to a preset minimum support threshold to obtain a frequent item set includes:

setting a minimum support threshold;

calculating the support rate of each item set according to the support rate calculation formula;

and traversing the item sets, and if the support rate of the current item set is not less than the minimum support threshold, marking it as a frequent item set.

Further, the support rate calculation formula is:

$$\mathrm{Support}(m_j) = \frac{\mathrm{Num}(m_j)}{\mathrm{Num}(D)} \times 100\%$$

wherein Support(m_j) represents the support rate of item set m_j, Num(m_j) represents the number of tasks in the structured data D that contain item set m_j, and Num(D) represents the number of tasks in the structured data D;

the minimum support threshold min_sup ranges from 25% to 35%.
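The support rate formula above can be illustrated with a short Python sketch. It assumes that each task (one structured alarm record) is held as a Python set of "field=value" items and that an item set m_j is a frozenset; the function names `support` and `is_frequent`, the "weather" field and the sample records are hypothetical and are not part of the patent.

```python
from typing import FrozenSet, Iterable, Set


def support(item_set: FrozenSet[str], tasks: Iterable[Set[str]]) -> float:
    """Support(m_j) = Num(m_j) / Num(D) * 100%, returned as a percentage."""
    task_list = list(tasks)
    if not task_list:
        return 0.0
    # Num(m_j): number of tasks that contain every item of m_j
    num_mj = sum(1 for t in task_list if item_set <= t)
    return num_mj / len(task_list) * 100.0


def is_frequent(item_set: FrozenSet[str], tasks: Iterable[Set[str]],
                min_sup: float = 25.0) -> bool:
    # min_sup is a percentage; the text gives a range of 25% to 35%
    return support(item_set, tasks) >= min_sup


# Hypothetical structured records (tasks) bound to one accident factor
D = [
    {"factor=door open", "injured=1", "weather=rain"},
    {"factor=door open", "injured=1"},
    {"factor=door open", "injured=0"},
    {"factor=door open", "injured=1", "weather=rain"},
]
print(support(frozenset({"injured=1"}), D))  # 75.0
print(is_frequent(frozenset({"injured=1", "weather=rain"}), D))  # True (50.0 >= 25.0)
```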

Further, the traversing the non-empty subset of the frequent item set according to a preset minimum confidence threshold to obtain a set with association rules includes:

setting a minimum confidence threshold;

calculating the confidence of the item sets according to the confidence calculation formula;

traversing the non-empty subsets of the frequent item set, and if the confidence of the current non-empty subset is not less than the minimum confidence threshold, marking it as a set with association rules.

Further, the confidence calculation formula is:

$$\mathrm{Confidence}(m_a \Rightarrow m_b) = \frac{\mathrm{Support}(m_a \Rightarrow m_b)}{\mathrm{Support}(m_a)} \times 100\%$$

wherein m_a denotes the cause in the structured data D, m_b denotes the conclusion in the structured data D, Confidence(m_a ⇒ m_b) denotes the confidence that cause m_a leads to conclusion m_b, Support(m_a ⇒ m_b) denotes the support rate of the rule that cause m_a leads to conclusion m_b, and Support(m_a) denotes the support rate of cause m_a;

the minimum confidence threshold min_conf ranges from 70% to 75%.
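Likewise, the confidence formula can be sketched in Python under the same assumptions (tasks as sets of "field=value" items; hypothetical function names and sample data). Here the support of the rule m_a ⇒ m_b is computed as the share of tasks containing both m_a and m_b.

```python
from typing import FrozenSet, List, Set


def _support(item_set: FrozenSet[str], tasks: List[Set[str]]) -> float:
    # Percentage of tasks that contain every item of item_set
    if not tasks:
        return 0.0
    return sum(1 for t in tasks if item_set <= t) / len(tasks) * 100.0


def confidence(cause: FrozenSet[str], conclusion: FrozenSet[str],
               tasks: List[Set[str]]) -> float:
    """Confidence(m_a => m_b) = Support(m_a => m_b) / Support(m_a) * 100%."""
    sup_cause = _support(cause, tasks)
    if sup_cause == 0.0:
        return 0.0
    # Support(m_a => m_b): tasks containing both the cause and the conclusion
    sup_rule = _support(cause | conclusion, tasks)
    return sup_rule / sup_cause * 100.0


# Hypothetical structured records (tasks)
D = [
    {"factor=door open", "injured=1", "weather=rain"},
    {"factor=door open", "injured=1"},
    {"factor=door open", "injured=0"},
    {"factor=door open", "injured=1", "weather=rain"},
]
print(confidence(frozenset({"weather=rain"}), frozenset({"injured=1"}), D))      # 100.0
print(confidence(frozenset({"factor=door open"}), frozenset({"injured=1"}), D))  # 75.0
```

With min_conf = 70%, both example rules would be kept.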

Further, the processing the set with the association rule to obtain an alarm analysis result includes:

performing attribute restoration processing on the set with the association rule;

and matching the content subjected to attribute restoration processing with an evaluation factor to obtain an alarm analysis conclusion and conclusion evaluation, wherein the evaluation factor represents a mapping relation table of the alarm analysis conclusion and the conclusion evaluation.

In the alarm condition analysis method based on natural language processing and association rules, a natural language processing tool and an association rule analysis method are used to associate the main events in alarm texts with historical alarm data. Event triples are extracted from unstructured text, and association rules are established for different accident cause types using a large amount of historical data. This improves the alarm analysis capability of the alarm receiving and handling system and makes accident cause investigation and corrective actions more targeted.

Drawings

The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:

Fig. 1 is a flowchart of the alarm condition analysis method based on natural language processing and association rules according to the present invention.

Fig. 2 is a schematic diagram of the pattern description of a traffic-accident item set provided by the present invention.

Detailed Description

It should be noted that, where no conflict arises, the embodiments and the features of the embodiments may be combined with each other. The present invention is described in detail below through embodiments with reference to the accompanying drawings.

In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged under appropriate circumstances in order to facilitate the description of the embodiments of the invention herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.

This embodiment provides an alarm condition analysis method based on natural language processing and association rules. Fig. 1 is a flowchart of the method according to an embodiment of the present invention. As shown in Fig. 1, the method includes:

S110, acquiring original data of an alarm receiving and processing service;

In the embodiment of the present invention, after the raw data of the alarm receiving and processing service is acquired, the unstructured data and the structured data must be separated in order to build the association rule model. The raw data includes structured data and unstructured data: the structured data includes an alert ticket number, a data source, an alarm receiver, a jurisdiction, an alert type, an alert time, a treatment result flag, a feedback person, a feedback department, a feedback time, an alert reverse check flag, and an alert check flag; the unstructured data includes alarm content and feedback content.

For example, alarm content: "Alarm receiver A received the call. The caller (telephone: 13XXXXXXXXX) stated: 'My vehicle (SuBXXXXX) was parked at the roadside near the north gate of XX residential district on Longhu Avenue. A child sitting in the back seat suddenly opened the door. Because the road was slippery from the rain, a pedal tricycle coming from behind could not brake in time and hit the door; the rider fell to the ground, was injured, and was taken to hospital for treatment.' The two parties are currently negotiating at the hospital but disagree, so they are calling the police."

Feedback content: "Accident police officer B, on-site feedback: the matter has been resolved through negotiation."

Alert ticket number: "001100011011"

Alarm receiver: "alarm receiver A", jurisdiction: "XXXXX"

Alert type: "vehicle and non-motor vehicle"

Alert time: "yyyy-MM-dd", treatment result flag: "1" …

S120, processing the original data of the alarm receiving and processing service with a natural language processing tool to obtain event triples;

in the embodiment of the present invention, the method specifically includes:

When the structured data is reduced, attributes such as the alert reverse check flag, the treatment result flag and the alert check flag are set as Boolean attributes, and attributes such as the alert type and the data source are set as numerical attributes. What each numerical value stands for is prior knowledge that is maintained continuously alongside the data.
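A possible sketch of this reduction step follows. The field keys, the code tables and the choice of -1 for unseen values are all hypothetical; the source only states that flags become Boolean attributes, categorical fields become numerical attributes, and the meaning of each code is maintained as prior knowledge.

```python
from typing import Any, Dict

# Hypothetical keys for the flag fields named in the text
BOOLEAN_FIELDS = ("reverse_check_flag", "treatment_result_flag", "check_flag")

# Hypothetical code tables, maintained manually as prior knowledge
CODE_TABLES: Dict[str, Dict[str, int]] = {
    "alert_type": {"vehicle and non-motor vehicle": 1, "vehicle and vehicle": 2},
    "data_source": {"telephone": 1, "online": 2},
}


def reduce_record(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Return the reduced Boolean and numeric attributes of one raw record."""
    reduced: Dict[str, Any] = {}
    for field in BOOLEAN_FIELDS:
        if field in raw:
            # Flags stored as "1"/"0" become True/False
            reduced[field] = str(raw[field]).strip() == "1"
    for field, table in CODE_TABLES.items():
        if field in raw:
            # Values missing from the table map to -1 so they can be reviewed later
            reduced[field] = table.get(raw[field], -1)
    return reduced


print(reduce_record({"treatment_result_flag": "1",
                     "alert_type": "vehicle and non-motor vehicle",
                     "data_source": "fax"}))
# {'treatment_result_flag': True, 'alert_type': 1, 'data_source': -1}
```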

During data cleaning of the unstructured data, invalid characters are deleted, and text that is irrelevant to modeling and was inserted automatically by the system is removed, such as the caller name, alarm receiver name, law enforcement requirements and law enforcement equipment information that the system places before the description of the alarm content.
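The cleaning step could look roughly like the following sketch. It assumes, hypothetically, that the system-inserted fields appear at the start of the text as `Label: value;` segments using the labels in `PREPENDED_LABELS`, and it treats control and zero-width characters as the invalid characters; the actual layout of the alarm text is not given in the source.

```python
import re

# Hypothetical labels of the fields the system inserts before the alarm description
PREPENDED_LABELS = ("Caller name", "Receiver name",
                    "Enforcement requirement", "Enforcement equipment")

# One or more leading "Label: value;" segments built from the labels above
_PREFIX_RE = re.compile(
    r"^(?:\s*(?:%s)\s*:[^;]*;)+" % "|".join(re.escape(label) for label in PREPENDED_LABELS)
)
# Control characters and zero-width characters are treated as invalid
_INVALID_RE = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f\u200b-\u200d\ufeff]")


def clean_alarm_text(text: str) -> str:
    text = _INVALID_RE.sub("", text)
    text = _PREFIX_RE.sub("", text)
    # Collapse whitespace left behind by the removals
    return re.sub(r"\s+", " ", text).strip()


print(clean_alarm_text("Caller name: X; Receiver name: A;\u200b My vehicle is parked at the roadside."))
# My vehicle is parked at the roadside.
```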

When the unstructured data is segmented and part-of-speech tagged, the alarm content is, for example, split into a word sequence and the part of speech of each word (verb, noun, pronoun, adverb, and so on) is identified.

In the syntactic structure description of the unstructured data, the dependency relations between the words of a sentence are identified, such as the subject-verb (SBV), verb-object (VOB), preposition-object (POB), coordinate (COO) and indirect-object (IOB) relations.

Semantic dependency analysis is then performed on the unstructured data to construct event triples, that is, the subject, predicate and object of each event. It is worth noting that the natural language processing of the text data may use, but is not limited to, an open-source natural language processing toolkit.

Triples obtained by screening for relations such as SBV (subject-verb) and VOB (verb-object) are as follows:

('child', 'open', 'door') dependencies: SBV, VOB;

('tricycle', 'collide', 'door') dependencies: SBV, VOB;

('tricycle', 'fall', 'on ground') dependencies: SBV, POB;

('I', 'send', 'hospital rescue') dependencies: SBV, POB, VOB;

('opinions of both parties', 'show', 'divergence') dependencies: SBV, VOB;
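To illustrate how such triples can be assembled from a dependency parse, the sketch below takes a parse as input rather than calling a specific toolkit, since the source does not name one. Each word carries a (head index, relation) pair using labels of the kind listed above (SBV, VOB, POB, plus HED, ADV and CMP), with head 0 marking the root. The rule "SBV child as subject, VOB child as object, otherwise a POB object reached through a preposition" is an assumed simplification, and the example parses are hand-written.

```python
from typing import Dict, List, Optional, Tuple

Arc = Tuple[int, str]  # (1-based head index, relation label); head 0 is the root


def extract_triples(words: List[str], arcs: List[Arc]) -> List[Tuple[str, str, str]]:
    """Build (subject, predicate, object) triples from a dependency parse."""
    children: Dict[int, List[Tuple[int, str]]] = {i: [] for i in range(len(words) + 1)}
    for idx, (head, rel) in enumerate(arcs, start=1):
        children[head].append((idx, rel))

    def find(head: int, rel: str) -> Optional[int]:
        # First child of `head` attached with relation `rel`, if any
        return next((c for c, r in children[head] if r == rel), None)

    triples = []
    for verb in range(1, len(words) + 1):
        subj = find(verb, "SBV")
        if subj is None:
            continue
        obj = find(verb, "VOB")
        if obj is None:
            # Fall back to a prepositional object: verb -> preposition -> POB
            for prep, _ in children[verb]:
                obj = find(prep, "POB")
                if obj is not None:
                    break
        if obj is not None:
            triples.append((words[subj - 1], words[verb - 1], words[obj - 1]))
    return triples


# Hand-written parses in the style of the examples above (hypothetical)
print(extract_triples(["child", "suddenly", "open", "door"],
                      [(3, "SBV"), (3, "ADV"), (0, "HED"), (3, "VOB")]))
print(extract_triples(["tricycle", "fall", "on", "ground"],
                      [(2, "SBV"), (0, "HED"), (2, "CMP"), (3, "POB")]))
```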

S130, matching the event triples with accident factors, and binding the structured data in the event triples to the accident factors, wherein an accident factor is a mapping table between the natural-language description of an alarm and its digital (coded) representation;

in the embodiment of the present invention, the method may specifically include:

classifying the accident factors according to the alarm condition types;

matching the classified accident factors with the event triples one by one;

Specifically, an accident factor δ is introduced and matched with the event triples; in the embodiment of the present invention, the accident factor is a mapping table between the natural-language description of an alarm and its digital representation. The accident factor data are maintained according to prior knowledge and classified by alert type, and the accident factors are matched with the event triples according to the recorded alert type: the accident factors of that category are matched with the event triples one by one. Matching through accident factors avoids two problems of matching specific accident types directly against the alarm text: excessive processing time, and ambiguity caused by the raw text carrying no part-of-speech labels. After matching, the structured information of the accident is bound to the accident factor.

Taking an accident factor δ of list type as an example:

the contents [open door, crossing solid line, reversing, break, escape, running over pedestrian, scraping pedestrian, rollover, crash, …, traffic jam] are maintained manually. After matching with the event triples obtained in step S120, the structured information of the accident is bound to the matched accident factor, and the bound data expand as follows:

case time: "yyyy-MM-dd", case location: "XXXXXX", violation: "opening or closing a vehicle door in a way that obstructs other vehicles and pedestrians", whether escaped: "0", whether at scene: "0", whether injured: "1", on-scene traffic condition: "clear", whether the vehicle can move: "1", type of vehicle involved: "02 car", accident type: "vehicle collides with non-motor vehicle", accident factor: "door open" …

And repeating the processing steps on the historical data until all the data are effectively bound with the accident factor.
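The matching and binding of step S130 might be sketched as follows. Representing each accident factor as a set of keywords that must all occur in one event triple is an assumption made for illustration, as are the table contents and field keys; the source states only that the factors are classified by alert type, matched with the triples one by one, and bound to the structured record.

```python
from typing import Dict, FrozenSet, List, Tuple

Triple = Tuple[str, str, str]

# Hypothetical accident-factor table, classified by alert type and maintained
# manually as prior knowledge: factor label -> keywords that must all appear
# in one event triple.
ACCIDENT_FACTORS: Dict[str, Dict[str, FrozenSet[str]]] = {
    "vehicle and non-motor vehicle": {
        "door open": frozenset({"open", "door"}),
        "reversing": frozenset({"reverse"}),
        "scraping pedestrian": frozenset({"scrape", "pedestrian"}),
    },
}


def match_factors(alert_type: str, triples: List[Triple]) -> List[str]:
    """Match the factors of this alert type against the triples one by one."""
    table = ACCIDENT_FACTORS.get(alert_type, {})
    matched: List[str] = []
    for triple in triples:
        words = set(triple)
        for factor, keywords in table.items():
            if keywords <= words and factor not in matched:
                matched.append(factor)
    return matched


def bind_record(record: Dict[str, str], triples: List[Triple]) -> List[Dict[str, str]]:
    # One bound copy of the structured record per matched accident factor
    factors = match_factors(record.get("alert_type", ""), triples)
    return [{**record, "accident_factor": f} for f in factors]


record = {"alert_type": "vehicle and non-motor vehicle", "escape": "0", "injured": "1"}
triples = [("child", "open", "door"), ("tricycle", "collide", "door")]
print(bind_record(record, triples))
```

Records that match no factor come back as an empty list, which makes it easy to see which historical records still need factor maintenance before the loop over the history is complete.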

S140, mining association rules of the multiple pieces of structured data bound by each accident factor according to an association rule mining algorithm to obtain a set with the association rules;

in the embodiment of the present invention, the method specifically includes:

It should be understood that association rules are established category by category: the multiple pieces of structured data bound to each accident factor form an item set D (at this point, the conversion of unstructured and semi-structured data into structured data is complete).

Association rules are mined from the multiple pieces of bound structured data corresponding to each accident factor with an association rule mining algorithm (specifically, the Apriori algorithm may be used). First, the items of item set D are designed differently for different accident factors and accident types; for example, the items of the alarm item sets for the traffic accident category and for the criminal and public security category are not designed identically. The traffic accident item set D_trans may include, but is not limited to, the following fields: case time, case location, violation, whether escaped, whether at scene, whether injured, on-scene traffic condition, whether the vehicle can move, type of vehicle involved, personnel type, accident type, accident factor, and so on. Then

$$D_{trans} = \{t_1, t_2, t_3, \ldots, t_k\}, \quad k = \mathrm{Num}(D_{trans})$$

where k is the number of tasks in the item set of this category. Each task t_i (i = 1, 2, ..., k) corresponds to one structured alarm record, and the m_j in t_i denote the item sets in D_trans:

$$t_i = \{m_1, m_2, m_3, \ldots, m_j, \ldots, m_l\}, \quad j = 1, 2, 3, \ldots, l$$
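One way to turn the bound records into the tasks t_i of D_trans is to encode every field value as a "field=value" item, as in this hypothetical sketch; the field keys only mirror the field list above and are not taken from the source.

```python
from typing import Dict, FrozenSet, List, Sequence

# Hypothetical keys mirroring the traffic-accident fields listed above
TRAFFIC_FIELDS = ("case_time", "case_location", "violation", "escape", "at_scene",
                  "injured", "traffic_condition", "vehicle_movable", "vehicle_type",
                  "person_type", "accident_type", "accident_factor")


def to_task(record: Dict[str, str],
            fields: Sequence[str] = TRAFFIC_FIELDS) -> FrozenSet[str]:
    """Encode one bound record as a task t_i: a set of "field=value" items."""
    return frozenset(f"{f}={record[f]}" for f in fields
                     if record.get(f) not in (None, ""))


def build_item_set(records: List[Dict[str, str]]) -> List[FrozenSet[str]]:
    """D_trans = {t_1, ..., t_k}; k = Num(D_trans) is the length of the result."""
    return [to_task(r) for r in records]


D_trans = build_item_set([
    {"escape": "0", "injured": "1", "accident_factor": "door open", "note": "ignored"},
    {"escape": "1", "at_scene": "0", "accident_factor": "scraping pedestrian"},
])
print(len(D_trans), sorted(D_trans[0]))
```

Fields outside the category's item design (here the hypothetical `note`) are simply dropped, which reflects the point that each category has its own item layout.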

Further specifically, traversing the item set according to a preset minimum support threshold to obtain a frequent item set, including:

setting a minimum support threshold;

In the embodiment of the present invention, the support rate calculation formula is:

$$\mathrm{Support}(m_j) = \frac{\mathrm{Num}(m_j)}{\mathrm{Num}(D_{trans})} \times 100\%$$

and the minimum support threshold min_sup ranges from 25% to 35%.

The frequent item sets found in each pass are used to search for those of the next level, until all frequent item sets have been obtained.
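The level-wise search described above is the classic Apriori procedure. The following compact Python sketch assumes tasks are sets of items and min_sup is a percentage; it joins frequent (k-1)-item sets into k-item candidates, prunes candidates that have an infrequent subset, and keeps those meeting the threshold. The names are illustrative, and the code is not claimed to be the patent's own implementation.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List, Set


def apriori(tasks: List[Set[str]], min_sup: float = 25.0) -> Dict[FrozenSet[str], float]:
    """Return every frequent item set mapped to its support rate (percent)."""
    n = len(tasks)
    if n == 0:
        return {}

    def keep_frequent(candidates: Iterable[FrozenSet[str]]) -> Dict[FrozenSet[str], float]:
        result = {}
        for c in candidates:
            sup = sum(1 for t in tasks if c <= t) / n * 100.0
            if sup >= min_sup:
                result[c] = sup
        return result

    # Frequent 1-item sets
    level = keep_frequent({frozenset([i]) for t in tasks for i in t})
    frequent = dict(level)
    k = 2
    while level:
        # Join: unions of two frequent (k-1)-item sets that contain exactly k items
        candidates = {a | b for a, b in combinations(level, 2) if len(a | b) == k}
        # Prune: every (k-1)-subset of a candidate must itself be frequent
        candidates = {c for c in candidates
                      if all(frozenset(s) in level for s in combinations(c, k - 1))}
        level = keep_frequent(candidates)
        frequent.update(level)
        k += 1
    return frequent


tasks = [{"a", "b", "c"}, {"a", "b"}, {"a", "c"}, {"b", "c"}]
result = apriori(tasks, min_sup=50.0)
for item_set in sorted(result, key=lambda s: (len(s), sorted(s))):
    print(sorted(item_set), result[item_set])
```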

Further specifically, the traversing the non-empty subset of the frequent item set according to a preset minimum confidence threshold to obtain a set with association rules includes:

setting a minimum confidence threshold;

In the embodiment of the present invention, the confidence calculation formula is:

$$\mathrm{Confidence}(m_a \Rightarrow m_b) = \frac{\mathrm{Support}(m_a \Rightarrow m_b)}{\mathrm{Support}(m_a)} \times 100\%$$

wherein Confidence(m_a ⇒ m_b) denotes the confidence that cause m_a leads to conclusion m_b, Support(m_a ⇒ m_b) denotes the support rate of the rule that cause m_a leads to conclusion m_b, and Support(m_a) denotes the support rate of cause m_a;

the minimum confidence threshold min_conf ranges from 70% to 75%.

Specifically, the number of tasks in D_trans that contain m_j is the support count of item set m_j, denoted Num(m_j). The support rate of m_j is the support count divided by the number of tasks, multiplied by 100%, namely:

$$\mathrm{Support}(m_j) = \frac{\mathrm{Num}(m_j)}{\mathrm{Num}(D_{trans})} \times 100\%$$

A minimum support threshold min_sup is then set. It should be noted that the minimum support threshold may be preset, for example between 25% and 35%. If the calculated Support(m_j) is not less than min_sup, m_j is recorded as a frequent item set.

The pattern description of item set D_trans (for ease of analysis, time is further discretized into periods) is shown in Fig. 2.

For item set D_trans, let m_a and m_b denote the cause and the conclusion respectively. The support rate of the rule m_a ⇒ m_b is the probability P(m_a ∩ m_b) that a task contains both m_a and m_b, i.e.:

$$\mathrm{Support}(m_a \Rightarrow m_b) = P(m_a \cap m_b)$$

In the Apriori algorithm, confidence describes how reliably cause m_a leads to conclusion m_b. The confidence of m_a ⇒ m_b can be taken as the conditional probability that a task in D_trans that contains m_a also contains m_b, i.e.:

$$\mathrm{Confidence}(m_a \Rightarrow m_b) = P(m_b \mid m_a) = \frac{\mathrm{Support}(m_a \Rightarrow m_b)}{\mathrm{Support}(m_a)} \times 100\%$$

A minimum confidence threshold min_conf is set. Likewise, the minimum confidence threshold may be preset, for example between 70% and 75%.

The data item set D_trans is traversed to find the frequent 1-item sets satisfying Support(m_j) ≥ min_sup. The frequent 1-item sets are then used to search for frequent 2-item sets, and so on, until all frequent k-item sets have been found. Finally, the non-empty subsets of the final frequent item sets are screened again with the minimum confidence condition Confidence(m_a ⇒ m_b) ≥ min_conf to obtain the final association rule set.

For example:

TABLE 1 Search results for frequent 1-item sets

1-item set    Support
M1            35
M2            26
M3            48
M4            21
M5            26
M6            27
M7            29
M8            41

TABLE 2 Search results for frequent 2-item sets

2-item set    Support
M1, M2        15
M1, M3        13
M1, M4        26
M1, M5        12
M1, M6        11
......        ......
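Rule screening over the non-empty subsets of the frequent item sets can be sketched as follows, assuming the input is a dictionary of every frequent item set and its support rate in percent. Because every subset of a frequent item set is itself frequent, Support(m_a) can be read from the same dictionary. The item names in the example are hypothetical and loosely modelled on the "no scene, vehicle escape" example in the text.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float]  # (cause m_a, conclusion m_b, confidence)


def generate_rules(frequent: Dict[FrozenSet[str], float],
                   min_conf: float = 70.0) -> List[Rule]:
    """Screen the non-empty proper subsets of each frequent item set by confidence."""
    rules: List[Rule] = []
    for item_set, sup_rule in frequent.items():
        for r in range(1, len(item_set)):  # empty range for 1-item sets
            for cause in map(frozenset, combinations(item_set, r)):
                conclusion = item_set - cause
                # Confidence(m_a => m_b) = Support(m_a => m_b) / Support(m_a) * 100%
                conf = sup_rule / frequent[cause] * 100.0
                if conf >= min_conf:
                    rules.append((cause, conclusion, conf))
    return rules


# Hypothetical frequent item sets with support rates in percent
frequent = {
    frozenset({"at_scene=0"}): 40.0,
    frozenset({"escape=1"}): 30.0,
    frozenset({"at_scene=0", "escape=1"}): 30.0,
}
for cause, conclusion, conf in generate_rules(frequent, min_conf=70.0):
    print(sorted(cause), "=>", sorted(conclusion), conf)
```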

S150, processing the set with the association rule to obtain an alarm condition analysis result.

The method specifically comprises the following steps:

It should be understood that attribute restoration is performed on the set with association rules. After accident factor matching, the rules are expressed in the digital (coded) language; attribute restoration converts this expression back into natural language.

After attribute restoration, the restored content is matched with an evaluation factor, which is a mapping table between alarm analysis conclusions and conclusion evaluations. Matching the restored content with the evaluation factor therefore yields an alarm analysis conclusion and the corresponding conclusion evaluation.

For example, the evaluation factor γ is of map type: each key records a conclusion, and the corresponding value records the evaluation of that conclusion. Suppose a rule reads: cause m_a: no scene, vehicle escaped ⇒ conclusion m_b: the case location is a road section near XXX primary school (the location attribute is "primary school"). A key of the map is then: case location attribute = road section near a primary school (23_XXXXX primary school, where 23 is the location attribute value for road sections near primary schools), and the value corresponding to this key is "strengthen school perimeter supervision".
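Attribute restoration followed by the evaluation-factor lookup can be sketched with two dictionaries. The coded item `location_attr=23`, the reverse code table and the single evaluation entry are hypothetical stand-ins modelled on the example above, not data from the source.

```python
from typing import Dict, Optional

# Hypothetical reverse code table: coded item -> natural-language wording
ATTRIBUTE_TEXT: Dict[str, str] = {
    "at_scene=0": "no scene",
    "escape=1": "vehicle escaped",
    "location_attr=23": "case location: road section near a primary school",
}

# Hypothetical evaluation factor gamma (map type): conclusion -> evaluation
EVALUATION_FACTOR: Dict[str, str] = {
    "case location: road section near a primary school":
        "strengthen school perimeter supervision",
}


def restore(item: str) -> str:
    """Attribute restoration: map a coded item back to natural language."""
    return ATTRIBUTE_TEXT.get(item, item)  # unknown codes are returned unchanged


def evaluate(conclusion_item: str) -> Optional[str]:
    """Look up the evaluation for a restored conclusion, or None if there is none."""
    return EVALUATION_FACTOR.get(restore(conclusion_item))


cause = ["at_scene=0", "escape=1"]
conclusion = "location_attr=23"
print([restore(i) for i in cause], "=>", restore(conclusion))
print(evaluate(conclusion))  # strengthen school perimeter supervision
```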

In summary, in the alarm analysis method based on natural language processing and association rules provided by the embodiments of the invention, a natural language processing tool and an association rule analysis method are used to associate the main events in alarm texts with historical alarm data; event triples are extracted from unstructured text, and association rules are established for different accident cause types using a large amount of historical data. This improves the alarm analysis capability of the alarm receiving and handling system and makes accident cause investigation and corrective actions more targeted.

It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims

1. An alarm condition analysis method based on natural language processing and association rules is characterized by comprising the following steps:

acquiring original data of an alarm receiving and handling service;

and processing the set with the association rule to obtain an alarm condition analysis result.

2. The alarm condition analyzing method based on natural language processing and association rules as claimed in claim 1, wherein the raw data of the alarm receiving and processing service comprises: structured data and unstructured data, the structured data including an alert ticket number, a data source, an alarm receiver, a jurisdiction, an alert type, an alert time, a treatment result flag, a feedback person, a feedback department, a feedback time, an alert reverse check flag, and an alert check flag; the unstructured data includes alarm content and feedback content.

3. A method for analyzing an alarm situation based on natural language processing and association rules according to claim 2, wherein the processing the raw data of the alarm receiving and processing service by a natural language based tool to obtain event triples comprises:

4. A method for alarm analysis based on natural language processing and association rules according to claim 1, wherein the matching the event triples with accident factors and the binding of the structured data in the event triples with the accident factors comprises:

classifying the accident factors according to the alarm condition types;

matching the classified accident factors with the event triples one by one;

5. A method for analyzing an alarm situation based on natural language processing and association rules according to claim 1, wherein the mining association rules for the plurality of pieces of structured data bound by each accident factor according to an association rule mining algorithm to obtain a set with association rules comprises:

6. The alarm condition analysis method based on natural language processing and association rules according to claim 5, wherein traversing the item set according to a preset minimum support threshold to obtain a frequent item set comprises:

setting a minimum support threshold;

7. The alarm analysis method based on natural language processing and association rules of claim 6, wherein the support rate calculation formula is:

$$\mathrm{Support}(m_j) = \frac{\mathrm{Num}(m_j)}{\mathrm{Num}(D)} \times 100\%$$

the minimum support threshold min_sup ranges from 25% to 35%.

8. The alarm analysis method based on natural language processing and association rules according to claim 5, wherein traversing the non-empty subset of the frequent item set according to a preset minimum confidence threshold to obtain a set with association rules comprises:

setting a minimum confidence threshold;

9. The alarm condition analysis method based on natural language processing and association rules according to claim 8, wherein the confidence calculation formula is:

$$\mathrm{Confidence}(m_a \Rightarrow m_b) = \frac{\mathrm{Support}(m_a \Rightarrow m_b)}{\mathrm{Support}(m_a)} \times 100\%$$

wherein Confidence(m_a ⇒ m_b) denotes the confidence that cause m_a leads to conclusion m_b, Support(m_a ⇒ m_b) denotes the support rate of the rule that cause m_a leads to conclusion m_b, and Support(m_a) denotes the support rate of cause m_a;

the minimum confidence threshold min_conf ranges from 70% to 75%.

10. The alarm analysis method based on natural language processing and association rules according to claim 1, wherein the processing the set with association rules to obtain the alarm analysis result comprises: