CN114003683A - Alarm condition analysis method based on natural language processing and association rule - Google Patents

Publication number
CN114003683A
Authority
CN
China
Prior art keywords
alarm
natural language
association rules
data
processing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111303071.5A
Other languages
Chinese (zh)
Inventor
黄淑兵
张亚洲
蔡岗
缪新顿
陆杨
朱键
陆伟佳
张长辉
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Traffic Management Research Institute of Ministry of Public Security
Original Assignee
Traffic Management Research Institute of Ministry of Public Security
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Traffic Management Research Institute of Ministry of Public Security filed Critical Traffic Management Research Institute of Ministry of Public Security
Priority to CN202111303071.5A priority Critical patent/CN114003683A/en
Publication of CN114003683A publication Critical patent/CN114003683A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/31Indexing; Data structures therefor; Storage structures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis

Abstract

The invention relates to the technical field of alarm condition analysis, and particularly discloses an alarm condition analysis method based on natural language processing and association rules, wherein the alarm condition analysis method comprises the following steps: acquiring original data of an alarm receiving and handling service; processing the original data of the alarm handling service by a tool based on natural language to obtain event triples; matching the event triple with an accident factor, and binding the structured data in the event triple with the accident factor, wherein the accident factor represents a mapping relation table of a natural language of the alarm and a digital language of the alarm; mining the association rules of the multiple pieces of structured data bound by each accident factor according to an association rule mining algorithm to obtain a set with the association rules; and processing the set with the association rule to obtain an alarm condition analysis result. The alarm condition analysis method based on the natural language processing and association rules can effectively utilize historical alarm condition data to analyze the alarm condition.

Description

Alarm condition analysis method based on natural language processing and association rule
Technical Field
The invention relates to the technical field of alarm condition analysis, in particular to an alarm condition analysis method based on natural language processing and association rules.
Background
At present, alarm receiving and handling services generate large volumes of unstructured alarm condition text. Conventional processing and analysis methods, such as database queries, struggle to extract from this mass of alarm information the association relationships that would support alarm condition analysis and judgment, so historical alarm condition data cannot be used effectively to build an analysis and early-warning model.
Disclosure of Invention
The invention provides an alarm condition analysis method based on natural language processing and association rules, which solves the problem in the related art that historical alarm condition data cannot be effectively used for alarm condition analysis and early warning.
As an aspect of the present invention, there is provided an alarm condition analysis method based on natural language processing and association rules, including:
acquiring original data of an alarm receiving and handling service;
processing the original data of the alarm receiving and processing service by a tool based on natural language to obtain event triples;
matching the event triple with an accident factor, and binding the structured data in the event triple with the accident factor, wherein the accident factor represents a mapping relation table of a natural language of an alarm and a digital language of the alarm;
mining the association rules of the multiple pieces of structured data bound by each accident factor according to an association rule mining algorithm to obtain a set with the association rules;
and processing the set with the association rules to obtain an alarm condition analysis result.
Further, the raw data of the alarm receiving and processing service comprises: structured data and unstructured data, the structured data including an alert ticket number, a data source, an alarm receiver, a jurisdiction, an alert type, an alert time, a treatment result flag, a feedback person, a feedback department, a feedback time, an alert reverse check flag, and an alert check flag; the unstructured data includes alarm content and feedback content.
Further, the processing the raw data of the alarm receiving and processing service by a tool based on natural language to obtain an event triple includes:
carrying out reduction processing on the structured data and carrying out data cleaning on the unstructured data;
and performing word segmentation, part-of-speech tagging, syntactic structure description and semantic dependency analysis on the cleaned unstructured data with the natural-language-based tool, and constructing event triples.
Further, the matching the event triplet with the accident factor and binding the structured data in the event triplet with the accident factor includes:
classifying the accident factors according to the alarm condition types;
matching the classified accident factors with the event triples one by one;
binding the structured data in the matched event triples with the accident factors;
and repeating the steps until all the structured data in the matched event triples are bound with the accident factor.
Further, the mining of association rules for the multiple pieces of structured data bound to each accident factor according to an association rule mining algorithm to obtain a set with association rules includes:
establishing a set of items to be mined for the plurality of pieces of structured data bound by each accident factor, wherein the set of items represents a set of the plurality of pieces of structured data bound by each accident factor;
traversing the item set according to a preset minimum support threshold to obtain a frequent item set;
and traversing the non-empty subset of the frequent item set according to a preset minimum confidence threshold value to obtain a set with association rules.
Further, the traversing the item set according to a preset minimum support threshold to obtain a frequent item set includes:
setting a minimum support threshold;
calculating the support rate of each item set within the set of items according to a support rate calculation formula;
and traversing the set of items, and if the support rate of the current item set is not less than the minimum support threshold, marking that item set as a frequent item set.
Further, the support rate calculation formula is as follows:
$$\mathrm{Support}(m_j) = \frac{\mathrm{Num}(m_j)}{\mathrm{Num}(D)} \times 100\%$$
wherein $\mathrm{Support}(m_j)$ represents the support rate of item set $m_j$, $\mathrm{Num}(m_j)$ represents the number of tasks of the structured data $D$ that contain item set $m_j$, and $\mathrm{Num}(D)$ represents the number of tasks of the structured data $D$;
the minimum support threshold $\mathrm{Support}_{\min}$ takes a value in the range of 25% to 35%.
Further, the traversing the non-empty subset of the frequent item set according to a preset minimum confidence threshold to obtain a set with association rules includes:
setting a minimum confidence threshold;
calculating the confidence of the item sets within the set of items according to a confidence calculation formula;
traversing the non-empty subsets of the frequent item set, and if the confidence of the current non-empty subset is not less than the minimum confidence threshold, marking that item set as a set with association rules.
Further, the confidence calculation formula is:
$$\mathrm{Confidence}(m_a \Rightarrow m_b) = \frac{\mathrm{Support}(m_a \Rightarrow m_b)}{\mathrm{Support}(m_a)}$$
wherein $m_a$ represents the cause in the structured data $D$, $m_b$ represents the conclusion in the structured data $D$, $\mathrm{Confidence}(m_a \Rightarrow m_b)$ represents the confidence that the cause $m_a$ leads to the conclusion $m_b$, $\mathrm{Support}(m_a \Rightarrow m_b)$ represents the support rate of the cause $m_a$ leading to the conclusion $m_b$, and $\mathrm{Support}(m_a)$ represents the support rate of the cause $m_a$;
the minimum confidence threshold takes a value in the range of 70% to 75%.
Further, the processing the set with the association rule to obtain an alarm analysis result includes:
performing attribute restoration processing on the set with the association rule;
and matching the content subjected to attribute restoration processing with an evaluation factor to obtain an alarm analysis conclusion and conclusion evaluation, wherein the evaluation factor represents a mapping relation table of the alarm analysis conclusion and the conclusion evaluation.
In the alarm condition analysis method based on natural language processing and association rules, a natural-language-based processing tool and an association rule analysis method are used to establish association analysis between the main events of the alarm condition text and historical alarm condition data. Event triples can be extracted from unstructured text, and association rules are established for different types of accident cause by drawing on a large amount of historical data. This improves the alarm condition analysis capability of the alarm receiving and processing system and allows accident cause investigation and the correction of related behavior to be carried out in a more targeted way.
Drawings
The accompanying drawings, which are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate embodiments of the invention and together with the description serve to explain the principles of the invention and not to limit the invention. In the drawings:
fig. 1 is a flowchart of an alarm analysis method based on natural language processing and association rules according to the present invention.
Fig. 2 is a schematic diagram illustrating a traffic accident category item set style description provided by the present invention.
Detailed Description
It should be noted that the embodiments and features of the embodiments may be combined with each other without conflict. The present invention will be described in detail below with reference to the embodiments with reference to the attached drawings.
In order to make those skilled in the art better understand the technical solution of the present invention, the technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that the terms "first," "second," and the like in the description and claims of the present invention and in the drawings described above are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged under appropriate circumstances in order to facilitate the description of the embodiments of the invention herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or apparatus that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
In this embodiment, an alarm analysis method based on natural language processing and association rules is provided, and fig. 1 is a flowchart of an alarm analysis method based on natural language processing and association rules according to an embodiment of the present invention, as shown in fig. 1, including:
S110, acquiring original data of an alarm receiving and processing service;
In the embodiment of the present invention, the raw data of the alarm receiving and processing service is acquired and, in order to establish an association rule model, the unstructured data and the structured data are separated. The raw data of the alarm receiving and processing service includes structured data and unstructured data: the structured data includes an alert ticket number, a data source, an alarm receiver, a jurisdiction, an alert type, an alert time, a treatment result flag, a feedback person, a feedback department, a feedback time, an alert reverse check flag, and an alert check flag; the unstructured data includes alarm content and feedback content.
For example, the alarm content: "Alarm receiver A received the alarm. The caller (telephone: 13XXXXXXXXX) stated: 'My vehicle (SuBXXXXX) was parked at the roadside near the north gate of XX district on Longhu Avenue. A child sitting in the back seat suddenly opened the door; because the road was slippery from rain, a human-powered tricycle approaching from behind could not brake in time and collided with the door. The person on the tricycle fell to the ground, was injured, and has been sent to hospital for treatment.' The two parties are currently negotiating at the hospital but disagree, so an alarm is required."
Feedback content: "Accident police officer B, on-site feedback: settled through negotiation."
Alert ticket number: "001100011011"
Alarm receiver: "alarm receiver A", jurisdiction: "XXXXX"
Alert type: "vehicle and non-motor vehicle"
Alert time: "yyyy-MM-dd", treatment result flag: "1" …
S120, processing the original data of the alarm receiving and processing service through a tool based on natural language to obtain event triples;
in the embodiment of the present invention, the method specifically includes:
carrying out reduction processing on the structured data and carrying out data cleaning on the unstructured data;
and performing word segmentation, part-of-speech tagging, syntactic structure description and semantic dependency analysis on the cleaned unstructured data with the natural-language-based tool, and constructing event triples.
When the structured data is reduced, flags such as the alert reverse check flag, the treatment result flag and the alert check flag are set as Boolean attributes, while fields such as the alert type and the data source are set as numerical attributes. The meaning of each numerical value is prior knowledge and is maintained continuously alongside the data.
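The reduction step described above can be sketched as follows. This is a minimal illustration only: the field names, the code tables and their numeric values are assumptions made for the example and are not specified in the patent.

```python
# Hypothetical field names; the patent only names the flags in prose.
BOOLEAN_FIELDS = {"reverse_check_flag", "treatment_result_flag", "check_flag"}

# Prior-knowledge tables mapping category labels to numeric codes.
# The labels and codes below are assumed values for illustration.
CODE_TABLES = {
    "alert_type": {"vehicle and non-motor vehicle": 1, "vehicle and pedestrian": 2},
    "data_source": {"phone": 1, "online": 2},
}


def reduce_record(record: dict) -> dict:
    """Encode flag fields as booleans and categorical fields as numeric codes."""
    reduced = {}
    for field, value in record.items():
        if field in BOOLEAN_FIELDS:
            reduced[field] = str(value) == "1"
        elif field in CODE_TABLES:
            reduced[field] = CODE_TABLES[field][value]
        else:
            reduced[field] = value  # other fields are left unchanged
    return reduced


example = {
    "treatment_result_flag": "1",
    "reverse_check_flag": "0",
    "alert_type": "vehicle and non-motor vehicle",
    "jurisdiction": "XXXXX",
}
reduced_example = reduce_record(example)
```

Keeping the code tables in one place mirrors the statement that the meaning of each numeric value is maintained as prior knowledge.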
When the unstructured data is cleaned, invalid characters are deleted, together with text that is irrelevant to modeling and is automatically added by the system, such as the caller name, the alarm receiver name, law enforcement requirements and law enforcement equipment information that the system places before the description of the alarm content.
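A minimal sketch of such a cleaning step is given below. The bracketed prefix format and the set of characters treated as invalid are assumptions for illustration; a real system would use whatever labels it actually prepends.

```python
import re

# Hypothetical format for text the system prepends before the alarm content.
SYSTEM_PREFIX = re.compile(
    r"^(?:\[(?:caller|receiver|law enforcement)[^\]]*\]\s*)+", re.IGNORECASE
)
# Control characters (except tab, newline, carriage return) and the Unicode
# replacement character are treated as invalid here.
INVALID_CHARS = re.compile(r"[\x00-\x08\x0b\x0c\x0e-\x1f\x7f\ufffd]")


def clean_text(text: str) -> str:
    """Drop invalid characters and system-added prefixes, then normalize spaces."""
    text = INVALID_CHARS.sub("", text)
    text = SYSTEM_PREFIX.sub("", text)
    return re.sub(r"\s+", " ", text).strip()


raw = "[caller: A][receiver: B] My vehicle is parked\x00 at the roadside.\n"
cleaned = clean_text(raw)
```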
When the unstructured data is segmented and part-of-speech tagged, the alarm content is, for example, split into a word sequence and the part of speech of each word (verb, noun, pronoun, adverb and so on) is recognized.
In the syntactic structure description of the unstructured data, the dependency relations between the words of a sentence are identified, namely the subject-verb relation, the verb-object relation, the preposition-object relation, the coordinate relation, the indirect-object relation and so on.
Semantic dependency analysis is then performed on the unstructured data, and the triple of an event, namely its subject, predicate and object, is constructed. It is worth noting that the natural language processing of the text data may use, but is not limited to, an open source natural language processing toolkit.
The triples obtained by filtering on relations such as SBV (subject-verb) and VOB (verb-object) are as follows:
('child', 'open', 'door'), dependencies: SBV, VOB;
('tricycle', 'collide', 'door'), dependencies: SBV, VOB;
('tricycle', 'fall', 'on ground'), dependencies: SBV, POB;
('I', 'send', 'hospital rescue'), dependencies: SBV, POB, VOB;
('opinions of both parties', 'present', 'divergence'), dependencies: SBV, VOB;
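The way a subject-predicate-object triple can be read off a dependency parse is sketched below. The sketch assumes the parse is already available as a list of (dependent, head, relation) arcs; this input format is an assumption for illustration, since each toolkit exposes its parse in its own form, and only the simple SBV plus VOB case is handled.

```python
from typing import List, Tuple

# Each arc is (dependent_index, head_index, relation), with 0-based indices
# into `words`. This representation is assumed for the example.
Arc = Tuple[int, int, str]


def extract_triples(words: List[str], arcs: List[Arc]) -> List[Tuple[str, str, str]]:
    """Return (subject, predicate, object) for every head word that has both
    an SBV (subject-verb) and a VOB (verb-object) dependent."""
    subjects, objects = {}, {}
    for dep, head, rel in arcs:
        if rel == "SBV":
            subjects[head] = words[dep]
        elif rel == "VOB":
            objects[head] = words[dep]
    return [
        (subjects[head], words[head], objects[head])
        for head in sorted(subjects)
        if head in objects
    ]


words = ["child", "open", "door"]
arcs = [(0, 1, "SBV"), (2, 1, "VOB")]
triples = extract_triples(words, arcs)
```

Triples that involve a preposition-object (POB) relation, such as the third example above, would need an extra lookup through the preposition and are left out of this sketch.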
S130, matching the event triple with an accident factor, and binding the structured data in the event triple with the accident factor, wherein the accident factor represents a mapping relation table of a natural language of an alarm and a digital language of the alarm;
in the embodiment of the present invention, the method may specifically include:
classifying the accident factors according to the alarm condition types;
matching the classified accident factors with the event triples one by one;
binding the structured data in the matched event triples with the accident factors;
and repeating the steps until all the structured data in the matched event triples are bound with the accident factor.
Specifically, an accident factor δ is introduced and matched with the event triples. In the embodiment of the present invention, the accident factor is a mapping relation table between the natural language of the alarm and the digital language of the alarm. The accident factor data is maintained according to prior knowledge and classified by alert type, and the accident factors are matched with the event triples according to the recorded alert type. In the matching process, the accident factors of the relevant category are matched with the event triples one by one. Matching against accident factors avoids two problems of matching specific accident types directly against the alarm condition text: the matching takes too long, and the raw text carries no part-of-speech labels, which leads to ambiguity. After matching, the structured information of the accident is bound with the accident factor.
Taking an accident factor δ of list type as an example:
the contents [open door, cross solid line, drive against traffic, break, escape, run over pedestrian, scrape pedestrian, roll over, crash, …, traffic jam] are maintained manually. After matching with the triples obtained in step S120, the structured information of the accident is bound with the matched accident factor, and the data after binding is expanded as follows:
case time: "yyyy-MM-dd", case location: "XXXXXX", illegal act: "opening or closing a vehicle door in a way that obstructs other vehicles and pedestrians", whether escaped: "0", whether there is a scene: "0", whether injured: "1", on-site traffic situation: "clear", whether the vehicle can move: "1", type of vehicle involved: "02 car", accident type: "vehicle collides with non-motor vehicle", accident factor: "open door" …
The above processing steps are repeated on the historical data until all of the data is effectively bound with an accident factor.
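The matching and binding of step S130 can be sketched as below. The factor table, the choice of matching on the (predicate, object) pair of a triple, and the field name `accident_factor` are all assumptions made for the example; the patent only states that the factors of the relevant alert type are matched with the triples one by one.

```python
from typing import Dict, List, Optional, Tuple

Triple = Tuple[str, str, str]

# Accident factors grouped by alert type. Each factor is keyed by the
# (predicate, object) pair expected in a triple. Entries are hypothetical.
ACCIDENT_FACTORS: Dict[str, Dict[Tuple[str, str], str]] = {
    "vehicle and non-motor vehicle": {
        ("open", "door"): "open door",
        ("cross", "solid line"): "cross solid line",
    },
}


def match_factor(alert_type: str, triples: List[Triple]) -> Optional[str]:
    """Return the first accident factor of this alert type that a triple matches."""
    factors = ACCIDENT_FACTORS.get(alert_type, {})
    for _subject, predicate, obj in triples:
        factor = factors.get((predicate, obj))
        if factor is not None:
            return factor
    return None


def bind(record: dict, triples: List[Triple]) -> dict:
    """Attach the matched accident factor to a copy of the structured record."""
    bound = dict(record)
    bound["accident_factor"] = match_factor(record["alert_type"], triples)
    return bound


record = {"alert_type": "vehicle and non-motor vehicle", "injured": "1"}
triples = [("child", "open", "door"), ("tricycle", "collide", "door")]
bound = bind(record, triples)
```

Restricting the lookup to the factors of one alert type is what keeps the matching cheap compared with scanning the full text for every known accident type.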
S140, mining association rules of the multiple pieces of structured data bound by each accident factor according to an association rule mining algorithm to obtain a set with the association rules;
in the embodiment of the present invention, the method specifically includes:
establishing a set of items to be mined for the plurality of pieces of structured data bound by each accident factor, wherein the set of items represents a set of the plurality of pieces of structured data bound by each accident factor;
traversing the item set according to a preset minimum support threshold to obtain a frequent item set;
and traversing the non-empty subset of the frequent item set according to a preset minimum confidence threshold value to obtain a set with association rules.
It should be understood that the association rules are established category by category: the multiple pieces of structured data bound to each accident factor form an item set D (at this point the conversion of the unstructured and semi-structured data into structured data is complete).
The association rules of the multiple pieces of bound structured data corresponding to each accident factor are mined by constructing an association rule mining algorithm (specifically, the Apriori algorithm may be adopted). First, the items of item set D are designed differently for different accident factors and accident types; for example, the items of the alarm condition item sets of the traffic accident class and of the criminal security class differ. The item set $D_{trans}$ of the traffic accident class may include, but is not limited to, the following fields: case time, case location, illegal act, whether escaped, whether there is a scene, whether injured, on-site traffic situation, whether the vehicle can move, type of vehicle involved, type of person, accident type, accident factor and so on. $D_{trans} = \{t_1, t_2, t_3, \ldots, t_k\}$, where $k$ is the number of tasks in the item set of this type, i.e. $k = \mathrm{Num}(D_{trans})$. A task $t_k$ corresponds to one structured alarm condition record, and the $m_j$ in $t_k$ are the item sets of $D_{trans}$: $t_k = \{m_1, m_2, m_3, \ldots, m_j\}$ $(j = 1, 2, 3, \ldots, l)$.
Further specifically, traversing the item set according to a preset minimum support threshold to obtain a frequent item set, including:
setting a minimum support threshold;
calculating the support rate of each item set within the set of items according to a support rate calculation formula;
and traversing the set of items, and if the support rate of the current item set is not less than the minimum support threshold, marking that item set as a frequent item set.
In the embodiment of the present invention, the support rate calculation formula is as follows:
$$\mathrm{Support}(m_j) = \frac{\mathrm{Num}(m_j)}{\mathrm{Num}(D)} \times 100\%$$
wherein $\mathrm{Support}(m_j)$ represents the support rate of item set $m_j$, $\mathrm{Num}(m_j)$ represents the number of tasks of the structured data $D$ that contain item set $m_j$, and $\mathrm{Num}(D)$ represents the number of tasks of the structured data $D$;
the minimum support threshold $\mathrm{Support}_{\min}$ takes a value in the range of 25% to 35%.
The search then proceeds level by level, each time using the frequent item sets obtained in the previous pass, until all frequent item sets are obtained.
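The level-by-level search described above can be sketched as a small Apriori-style routine. This is an illustrative sketch, not the patent's implementation: tasks are modeled as sets of "field=value" strings, and the sample data and the 30% threshold are made up for the example.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Set


def frequent_itemsets(
    tasks: List[Set[str]], min_support: float
) -> Dict[FrozenSet[str], float]:
    """Level-wise search; returns {item set: support rate as a fraction}."""
    n = len(tasks)

    def support(itemset: FrozenSet[str]) -> float:
        return sum(1 for t in tasks if itemset <= t) / n

    # Frequent 1-item sets.
    items = {item for t in tasks for item in t}
    current = {frozenset([i]) for i in items}
    current = {s for s in current if support(s) >= min_support}
    result = {s: support(s) for s in current}

    k = 2
    while current:
        # Join step: unions of frequent (k-1)-item sets that have exactly k items.
        candidates = {a | b for a in current for b in current if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must itself be frequent.
        candidates = {
            c for c in candidates
            if all(frozenset(sub) in current for sub in combinations(c, k - 1))
        }
        current = {c for c in candidates if support(c) >= min_support}
        result.update({c: support(c) for c in current})
        k += 1
    return result


tasks = [
    {"factor=open door", "injured=1", "period=evening"},
    {"factor=open door", "injured=1", "period=morning"},
    {"factor=open door", "injured=0", "period=evening"},
    {"factor=open door", "injured=1", "period=evening"},
]
freq = frequent_itemsets(tasks, min_support=0.30)
```

With these four sample tasks, the items that occur only once (support 25%) fall below the threshold and are dropped before any larger item set is formed.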
Further specifically, the traversing the non-empty subset of the frequent item set according to a preset minimum confidence threshold to obtain a set with association rules includes:
setting a minimum confidence threshold;
calculating the confidence of the item sets within the set of items according to a confidence calculation formula;
traversing the non-empty subsets of the frequent item set, and if the confidence of the current non-empty subset is not less than the minimum confidence threshold, marking that item set as a set with association rules.
In the embodiment of the present invention, the confidence calculation formula is:
$$\mathrm{Confidence}(m_a \Rightarrow m_b) = \frac{\mathrm{Support}(m_a \Rightarrow m_b)}{\mathrm{Support}(m_a)}$$
wherein $m_a$ represents the cause in the structured data $D$, $m_b$ represents the conclusion in the structured data $D$, $\mathrm{Confidence}(m_a \Rightarrow m_b)$ represents the confidence that the cause $m_a$ leads to the conclusion $m_b$, $\mathrm{Support}(m_a \Rightarrow m_b)$ represents the support rate of the cause $m_a$ leading to the conclusion $m_b$, and $\mathrm{Support}(m_a)$ represents the support rate of the cause $m_a$;
the minimum confidence threshold takes a value in the range of 70% to 75%.
Specifically, the number of tasks in item set $D_{trans}$ that contain item set $m_j$ is the support count of $m_j$, denoted $\mathrm{Num}(m_j)$. The support rate of $m_j$ is then (support count / number of tasks) × 100%, namely:
$$\mathrm{Support}(m_j) = \frac{\mathrm{Num}(m_j)}{\mathrm{Num}(D_{trans})} \times 100\%$$
A minimum support threshold $\mathrm{Support}_{\min}$ is set. It should be noted that the minimum support threshold may be preset, for example to a value between 25% and 35%. If the calculated $\mathrm{Support}(m_j)$ is not less than $\mathrm{Support}_{\min}$, then $m_j$ is recorded as a frequent item set.
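As a worked illustration of the support test, suppose (hypothetically, these numbers are not from the patent) that the item set of a category holds 200 tasks, that 64 of them contain $m_j$, and that the minimum support threshold is set to 30%. The support rate is the support count divided by the number of tasks:

```latex
\mathrm{Support}(m_j)
  = \frac{\mathrm{Num}(m_j)}{\mathrm{Num}(D_{trans})} \times 100\%
  = \frac{64}{200} \times 100\%
  = 32\% \;\ge\; 30\%
```

Under these assumed numbers $m_j$ would be recorded as a frequent item set; with only 50 containing tasks the rate would be 25% and $m_j$ would be discarded.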
The style description of item set $D_{trans}$ is shown in detail in Fig. 2 (for ease of analysis, the time is further discretized into time periods).
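The discretization of time into periods, and the conversion of one bound record into a task, might look like the sketch below. The period boundaries, the timestamp format and the field names are assumptions for illustration; the patent does not list them.

```python
from datetime import datetime

# Hypothetical period boundaries as (start hour, end hour, label).
PERIODS = [(0, 6, "night"), (6, 12, "morning"), (12, 18, "afternoon"), (18, 24, "evening")]


def to_period(timestamp: str) -> str:
    """Map a 'YYYY-MM-DD HH:MM' timestamp (assumed format) to a period label."""
    hour = datetime.strptime(timestamp, "%Y-%m-%d %H:%M").hour
    for start, end, label in PERIODS:
        if start <= hour < end:
            return label
    raise ValueError(f"hour out of range: {hour}")


def to_task(record: dict) -> frozenset:
    """Turn one bound record into a task: a set of 'field=value' items."""
    items = dict(record)
    items["period"] = to_period(items.pop("case_time"))
    return frozenset(f"{key}={value}" for key, value in items.items())


task = to_task(
    {"case_time": "2021-03-05 17:40", "injured": "1", "accident_factor": "open door"}
)
```

Discretizing the time matters for mining: raw timestamps are almost all distinct, so they could never reach the support threshold, whereas a handful of period labels can.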
For item set $D_{trans}$, let $m_a$ and $m_b$ denote the cause and the conclusion respectively, subject to the three conditions given by formula images BDA0003337770610000071 to BDA0003337770610000073. Then the support rate of $m_a \Rightarrow m_b$ is the probability $P(m_a \cap m_b)$, i.e.
$$\mathrm{Support}(m_a \Rightarrow m_b) = P(m_a \cap m_b)$$
The concept of confidence in the Apriori algorithm describes the degree of confidence with which the cause $m_a$ leads to the conclusion $m_b$. The confidence of $m_a \Rightarrow m_b$ is the conditional probability that a task in $D_{trans}$ that contains $m_a$ also contains $m_b$, i.e.:
$$\mathrm{Confidence}(m_a \Rightarrow m_b) = P(m_b \mid m_a) = \frac{\mathrm{Support}(m_a \Rightarrow m_b)}{\mathrm{Support}(m_a)}$$
A minimum confidence threshold $\mathrm{Confidence}_{\min}$ is set.
Likewise, the minimum confidence threshold may also be preset, for example to a value between 70% and 75%.
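As a worked illustration of the confidence test, suppose (hypothetical numbers, not from the patent) that the cause $m_a$ has a support rate of 40%, that cause and conclusion occur together in 30% of the tasks, and that the minimum confidence threshold is set to 70%. The confidence is the joint support divided by the support of the cause:

```latex
\mathrm{Confidence}(m_a \Rightarrow m_b)
  = \frac{\mathrm{Support}(m_a \Rightarrow m_b)}{\mathrm{Support}(m_a)}
  = \frac{30\%}{40\%}
  = 75\% \;\ge\; 70\%
```

Under these assumed numbers the rule $m_a \Rightarrow m_b$ would be kept. Note that the reverse rule is evaluated separately: if $m_b$ had a support rate of 50%, the confidence of $m_b \Rightarrow m_a$ would be 30% / 50% = 60%, which falls below the threshold.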
The data item set $D_{trans}$ is traversed to find the frequent 1-item sets, i.e. those satisfying $\mathrm{Support}(m_j) \ge \mathrm{Support}_{\min}$. The frequent 1-item sets are then used to search for the frequent 2-item sets, and so on until all frequent k-item sets are found. Finally, the non-empty subsets of the final frequent item sets are screened again by the minimum confidence threshold $\mathrm{Confidence}_{\min}$ to obtain the final association rule set.
For example,
TABLE 1 Search results for frequent 1-item sets
1-item      Support
M1          35
M2          26
M3          48
M4          21
M5          26
M6          27
M7          29
M8          41
TABLE 2 Search results for frequent 2-item sets
2-item      Support
M1,M2       15
M1,M3       13
M1,M4       26
M1,M5       12
M1,M6       11
......      ......
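Once the frequent item sets and their support rates are known, the final screening by minimum confidence can be sketched as follows. The function takes a precomputed support table as input; the item labels and support values in the sample are hypothetical and are unrelated to the figures in Tables 1 and 2.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Rule = Tuple[FrozenSet[str], FrozenSet[str], float]  # (cause, conclusion, confidence)


def association_rules(
    supports: Dict[FrozenSet[str], float], min_confidence: float
) -> List[Rule]:
    """Split each frequent item set into cause => conclusion and keep the
    splits whose confidence reaches the threshold."""
    rules = []
    for itemset, sup in supports.items():
        if len(itemset) < 2:
            continue  # a rule needs a non-empty cause and conclusion
        for size in range(1, len(itemset)):
            for cause in map(frozenset, combinations(sorted(itemset), size)):
                # Every subset of a frequent item set is itself frequent,
                # so its support is expected to be present in the table.
                confidence = sup / supports[cause]
                if confidence >= min_confidence:
                    rules.append((cause, itemset - cause, confidence))
    return rules


supports = {
    frozenset({"no scene"}): 0.40,
    frozenset({"near primary school"}): 0.50,
    frozenset({"no scene", "near primary school"}): 0.30,
}
rules = association_rules(supports, min_confidence=0.70)
```

In this sample only the rule with cause "no scene" survives, because the same joint support divided by the larger support of "near primary school" gives a confidence of 60%.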
S150, processing the set with the association rule to obtain an alarm condition analysis result.
The method specifically comprises the following steps:
performing attribute restoration processing on the set with the association rule;
and matching the content subjected to attribute restoration processing with an evaluation factor to obtain an alarm analysis conclusion and conclusion evaluation, wherein the evaluation factor represents a mapping relation table of the alarm analysis conclusion and the conclusion evaluation.
It should be understood that, after accident factor matching, the set with the association rules is expressed in the digital language; attribute restoration converts this expression back into natural language.
After attribute restoration, the restored content is matched with an evaluation factor. The evaluation factor is a mapping relation table between alarm analysis conclusions and conclusion evaluations, so matching the restored content against it yields an alarm condition analysis conclusion and the corresponding conclusion evaluation.
For example, the data type of the evaluation factor γ is a map in which each key records a conclusion and the corresponding value is the evaluation of that conclusion. Consider the rule: cause $m_a$: no scene, vehicle escaped ⇒ conclusion $m_b$: the case location is a road section near XXX primary school (the location attribute is "primary school"). One key of the map is: case location attribute, road section near a primary school (23_XXXXX primary school, where 23 denotes that the location attribute value is "road section near a primary school"), and the value corresponding to this key is "strengthen supervision around schools".
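Attribute restoration followed by the evaluation factor lookup can be sketched with two small maps. The key layout ("code_name") follows the "23_XXXXX primary school" example in the text; the table contents and the function name are otherwise assumptions for illustration.

```python
from typing import Optional, Tuple

# Restoration table: numeric location attribute code -> natural-language label.
LOCATION_ATTRIBUTES = {"23": "road section near a primary school"}

# Evaluation factor: conclusion (keyed here by attribute code) -> evaluation.
EVALUATION_FACTOR = {"23": "strengthen supervision around schools"}


def evaluate(conclusion_key: str) -> Tuple[Optional[str], Optional[str]]:
    """Restore the attribute code to natural language and look up its evaluation.

    Returns (restored conclusion, evaluation); either is None when unknown."""
    code = conclusion_key.split("_", 1)[0]
    return LOCATION_ATTRIBUTES.get(code), EVALUATION_FACTOR.get(code)


conclusion, evaluation = evaluate("23_XXXXX primary school")
```

Keeping restoration and evaluation in separate tables means new evaluations can be added without touching the code-to-label mapping.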
In summary, in the alarm condition analysis method based on natural language processing and association rules provided by the embodiment of the invention, a natural-language-based processing tool and an association rule analysis method are used to establish association analysis between the main events of the alarm condition text and historical alarm condition data. Event triples can be extracted from unstructured text, and association rules are established for different types of accident cause by drawing on a large amount of historical data. This improves the alarm condition analysis capability of the alarm receiving and processing system and allows accident cause investigation and the correction of related behavior to be carried out in a more targeted way.
It will be understood that the above embodiments are merely exemplary embodiments taken to illustrate the principles of the present invention, which is not limited thereto. It will be apparent to those skilled in the art that various modifications and improvements can be made without departing from the spirit and substance of the invention, and these modifications and improvements are also considered to be within the scope of the invention.

Claims (10)

1. An alarm condition analysis method based on natural language processing and association rules is characterized by comprising the following steps:
acquiring original data of an alarm receiving and handling service;
processing the original data of the alarm receiving and processing service by a tool based on natural language to obtain event triples;
matching the event triple with an accident factor, and binding the structured data in the event triple with the accident factor, wherein the accident factor represents a mapping relation table of a natural language of an alarm and a digital language of the alarm;
mining the association rules of the multiple pieces of structured data bound by each accident factor according to an association rule mining algorithm to obtain a set with the association rules;
and processing the set with the association rule to obtain an alarm condition analysis result.
2. The alarm condition analyzing method based on natural language processing and association rules as claimed in claim 1, wherein the raw data of the alarm receiving and processing service comprises: structured data and unstructured data, the structured data including an alert ticket number, a data source, an alarm receiver, a jurisdiction, an alert type, an alert time, a treatment result flag, a feedback person, a feedback department, a feedback time, an alert reverse check flag, and an alert check flag; the unstructured data includes alarm content and feedback content.
3. A method for analyzing an alarm situation based on natural language processing and association rules according to claim 2, wherein the processing the raw data of the alarm receiving and processing service by a natural language based tool to obtain event triples comprises:
carrying out reduction processing on the structured data and carrying out data cleaning on the unstructured data;
and performing word segmentation, part-of-speech tagging, syntactic structure description and semantic dependency analysis on the unstructured data after data cleaning according to a tool based on natural language, and constructing an event triple.
4. A method for alarm analysis based on natural language processing and association rules according to claim 1, wherein the matching the event triples with accident factors and the binding of the structured data in the event triples with the accident factors comprises:
classifying the accident factors according to the alarm condition types;
matching the classified accident factors with the event triples one by one;
binding the structured data in the matched event triples with the accident factors;
and repeating the steps until all the structured data in the matched event triples are bound with the accident factor.
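The classify, match and bind steps of claim 4 can be pictured with the following minimal Python sketch. It is an illustration under assumptions, not the claimed implementation: the `EventTriple` and `AccidentFactor` classes, the `bind_factors` helper, the code "07", and the substring test used for matching are all hypothetical, since the text does not specify how a triple is matched to a factor.

```python
from dataclasses import dataclass, field


@dataclass
class EventTriple:
    subject: str
    predicate: str
    obj: str
    structured: dict  # structured fields of the alarm record the triple came from


@dataclass
class AccidentFactor:
    alarm_type: str  # alarm condition type used for classification
    phrase: str      # natural-language wording of the factor
    code: str        # numeric ("digital language") code for the same factor
    bound: list = field(default_factory=list)


def bind_factors(triples, factors):
    """Bind the structured data of every matching triple to each accident factor."""
    # Step 1: classify the accident factors by alarm condition type.
    by_type = {}
    for factor in factors:
        by_type.setdefault(factor.alarm_type, []).append(factor)
    # Remaining steps: match each classified factor against every triple and bind.
    for group in by_type.values():
        for factor in group:
            for triple in triples:
                text = f"{triple.subject} {triple.predicate} {triple.obj}"
                if factor.phrase in text:  # assumed matching rule: substring test
                    factor.bound.append(triple.structured)
    return by_type


# Hypothetical sample data.
triples = [
    EventTriple("vehicle", "escaped", "scene", {"alert_type": "traffic accident"}),
    EventTriple("driver", "reported", "congestion", {"alert_type": "traffic congestion"}),
]
factors = [AccidentFactor("traffic accident", "vehicle escaped", "07")]
grouped = bind_factors(triples, factors)
```

After the call, each factor carries the list of structured records bound to it, which is the input that the association rule mining step works on.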
5. A method for analyzing an alarm situation based on natural language processing and association rules according to claim 1, wherein the mining association rules for the plurality of pieces of structured data bound by each accident factor according to an association rule mining algorithm to obtain a set with association rules comprises:
establishing a set of items to be mined for the plurality of pieces of structured data bound by each accident factor, wherein the set of items represents a set of the plurality of pieces of structured data bound by each accident factor;
traversing the item set according to a preset minimum support threshold to obtain a frequent item set;
and traversing the non-empty subset of the frequent item set according to a preset minimum confidence threshold value to obtain a set with association rules.
6. A method for analyzing an alarm situation based on natural language processing and association rules according to claim 5, wherein traversing the set of items according to a preset minimum support threshold to obtain a frequent item set comprises:
setting a minimum support threshold;
calculating the support rate of each item set within the set of items according to a support rate calculation formula;
and traversing the set of items, and if the support rate of the current item set is not less than the minimum support threshold, marking the item set as a frequent item set.
7. The alarm analysis method based on natural language processing and association rules of claim 6, wherein the support rate calculation formula is:
$$\mathrm{Sup}(m_j) = \frac{\mathrm{Num}(m_j)}{\mathrm{Num}(D)}$$
wherein $\mathrm{Sup}(m_j)$ represents the support rate of the item set $m_j$, $\mathrm{Num}(m_j)$ represents the number of tasks of the structured data $D$ that contain the item set $m_j$, and $\mathrm{Num}(D)$ represents the number of tasks of the structured data $D$;
the minimum support threshold $\mathrm{Sup}_{\min}$ ranges from 25% to 35%.
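As an illustration of claims 5 to 7, the support rate and the frequent item set search can be sketched in Python as follows. This is a brute-force sketch under assumptions, not the patented procedure: the function names, the item labels and the sample records are hypothetical, each bound record is treated as one task, and the 30% threshold is simply one value inside the stated 25% to 35% range. `Fraction` is used so that the ratios stay exact.

```python
from fractions import Fraction
from itertools import combinations


def support(item_set, transactions):
    """Support rate: Num(item_set) / Num(D), as an exact fraction."""
    if not transactions:
        return Fraction(0)
    hits = sum(1 for t in transactions if item_set <= t)
    return Fraction(hits, len(transactions))


def frequent_item_sets(transactions, min_support=0.30):
    """Enumerate every item set whose support rate is not less than min_support."""
    items = sorted(set().union(*transactions))
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            candidate = frozenset(combo)
            rate = support(candidate, transactions)
            if rate >= min_support:
                result[candidate] = rate
                found = True
        if not found:
            break  # no frequent set of this size, so no larger one can be frequent
    return result


# Hypothetical records bound to one accident factor.
transactions = [
    frozenset({"no_scene", "vehicle_escaped", "near_primary_school"}),
    frozenset({"no_scene", "vehicle_escaped", "near_primary_school"}),
    frozenset({"no_scene", "vehicle_escaped"}),
    frozenset({"rear_end", "near_primary_school"}),
]
frequent = frequent_item_sets(transactions, min_support=0.30)
```

With this sample data, the item set {no_scene, vehicle_escaped} appears in 3 of 4 records, so its support rate is 3/4, while {rear_end} appears in only 1 of 4 records and falls below the 30% threshold.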
8. The alarm analysis method based on natural language processing and association rules according to claim 5, wherein traversing the non-empty subset of the frequent item set according to a preset minimum confidence threshold to obtain a set with association rules comprises:
setting a minimum confidence threshold;
calculating a confidence of a set of items within the set of items according to a confidence calculation formula;
traversing the non-empty subsets of the frequent item set, and if the confidence of the current non-empty subset is not less than the minimum confidence threshold, marking the item set as a set with association rules.
9. A method for alarm analysis based on natural language processing and association rules according to claim 8, wherein the confidence score is calculated by the formula:
$$\mathrm{Conf}(m_a \Rightarrow m_b) = \frac{\mathrm{Sup}(m_a \Rightarrow m_b)}{\mathrm{Sup}(m_a)}$$
wherein $m_a$ denotes the reason in the structured data $D$, $m_b$ denotes the conclusion in the structured data $D$, $\mathrm{Conf}(m_a \Rightarrow m_b)$ denotes the confidence that the reason $m_a$ leads to the conclusion $m_b$, $\mathrm{Sup}(m_a \Rightarrow m_b)$ denotes the support rate of the reason $m_a$ leading to the conclusion $m_b$, and $\mathrm{Sup}(m_a)$ denotes the support rate of the reason $m_a$;
the minimum confidence threshold value ranges from 70% to 75%.
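The confidence filtering of claims 8 and 9 can be sketched in the same way. The Python below is a minimal illustration under assumptions: the function names and sample records are hypothetical, the support rate of a rule is taken to be the support rate of the reason and the conclusion occurring together, and 70% is one value from the stated 70% to 75% range.

```python
from fractions import Fraction
from itertools import combinations


def support(item_set, transactions):
    """Support rate of an item set as an exact fraction of all records."""
    hits = sum(1 for t in transactions if item_set <= t)
    return Fraction(hits, len(transactions))


def association_rules(frequent_set, transactions, min_confidence=0.70):
    """Split one frequent item set into (reason, conclusion, confidence) rules."""
    rules = []
    whole = support(frequent_set, transactions)
    # Every non-empty proper subset is tried as the reason of a rule.
    for size in range(1, len(frequent_set)):
        for combo in combinations(sorted(frequent_set), size):
            reason = frozenset(combo)
            reason_support = support(reason, transactions)
            if reason_support == 0:
                continue  # the reason never occurs, so no rule can be formed
            confidence = whole / reason_support
            if confidence >= min_confidence:
                rules.append((reason, frequent_set - reason, confidence))
    return rules


# Hypothetical records bound to one accident factor.
transactions = [
    frozenset({"no_scene", "vehicle_escaped", "near_primary_school"}),
    frozenset({"no_scene", "vehicle_escaped", "near_primary_school"}),
    frozenset({"no_scene", "vehicle_escaped", "near_primary_school"}),
    frozenset({"no_scene", "vehicle_escaped"}),
    frozenset({"rear_end", "near_primary_school"}),
]
frequent_set = frozenset({"no_scene", "vehicle_escaped", "near_primary_school"})
rules = association_rules(frequent_set, transactions, min_confidence=0.70)
```

Here the rule {no_scene, vehicle_escaped} ⇒ {near_primary_school} has support rate 3/5 and its reason has support rate 4/5, giving a confidence of 3/4, which passes a 70% threshold but would fail an 80% one.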
10. The alarm analysis method based on natural language processing and association rules according to claim 1, wherein the processing the set with association rules to obtain the alarm analysis result comprises:
performing attribute restoration processing on the set with the association rule;
and matching the content subjected to attribute restoration processing with an evaluation factor to obtain an alarm analysis conclusion and conclusion evaluation, wherein the evaluation factor represents a mapping relation table of the alarm analysis conclusion and the conclusion evaluation.
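Claim 10 can be read as a decode-then-look-up step. The Python sketch below illustrates that reading under assumptions: the `CODE_TABLE` structure, the key format of `EVALUATION_FACTOR`, and both helper functions are hypothetical; only the code 23, its meaning, and the sample evaluation text come from the description.

```python
# Hypothetical code table: numeric attribute code -> (attribute name, attribute value).
# Only code 23 appears in the description; the table structure is an assumption.
CODE_TABLE = {
    "23": ("case location attribute", "road section near a primary school"),
}

# Hypothetical evaluation factor: conclusion -> conclusion evaluation.
EVALUATION_FACTOR = {
    "case location attribute-road section near a primary school":
        "strengthen school perimeter supervision",
}


def restore_attribute(coded_item: str) -> str:
    """Turn a coded item such as '23_XXXXX primary school' back into text.

    Raises KeyError if the numeric code is not in CODE_TABLE.
    """
    code, _, _detail = coded_item.partition("_")
    name, value = CODE_TABLE[code]
    return f"{name}-{value}"


def analyse(coded_conclusion: str):
    """Return (conclusion, evaluation) for one coded rule conclusion.

    The evaluation is None when no entry exists for the restored conclusion.
    """
    conclusion = restore_attribute(coded_conclusion)
    return conclusion, EVALUATION_FACTOR.get(conclusion)


result = analyse("23_XXXXX primary school")
```

Keeping the code table and the evaluation factor as two separate maps mirrors the text, where the accident factor and the evaluation factor are described as distinct mapping relation tables.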
CN202111303071.5A 2021-11-04 2021-11-04 Alarm condition analysis method based on natural language processing and association rule Pending CN114003683A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111303071.5A CN114003683A (en) 2021-11-04 2021-11-04 Alarm condition analysis method based on natural language processing and association rule

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111303071.5A CN114003683A (en) 2021-11-04 2021-11-04 Alarm condition analysis method based on natural language processing and association rule

Publications (1)

Publication Number Publication Date
CN114003683A true CN114003683A (en) 2022-02-01

Family

ID=79927679

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111303071.5A Pending CN114003683A (en) 2021-11-04 2021-11-04 Alarm condition analysis method based on natural language processing and association rule

Country Status (1)

Country Link
CN (1) CN114003683A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116821286A (en) * 2023-08-23 2023-09-29 北京宝隆泓瑞科技有限公司 Correlation rule analysis method and system for gas pipeline accidents

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination