CN114357110A - Abnormal phrase detection method, system and storage medium based on alarm condition information - Google Patents

Abnormal phrase detection method, system and storage medium based on alarm condition information

Info

Publication number
CN114357110A
Authority
CN
China
Prior art keywords
frequency
word
phrase
abnormal
information
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111566538.5A
Other languages
Chinese (zh)
Inventor
杨博
杜渂
何之栋
梁铮
王聚全
索涛
邱祥平
雷霆
彭明喜
陈健
周赵云
刘琦
郑佳
李帅帅
穆青
侯俊丞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ds Information Technology Co ltd
Original Assignee
Ds Information Technology Co ltd
Application filed by Ds Information Technology Co ltd
Priority to CN202111566538.5A
Publication of CN114357110A

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method, a system and a storage medium for detecting abnormal phrases based on alarm information. The method comprises the following steps: acquiring alarm information and performing word segmentation processing on it to obtain a plurality of high-frequency phrases; receiving a statistical period, and performing time sequence data aggregation on each high-frequency phrase according to the statistical period to obtain time sequence information corresponding to each high-frequency phrase; and performing discrete analysis on the time sequence information corresponding to each high-frequency phrase, and taking each outlier statistical period in that time sequence information as an abnormal period, to obtain the abnormal phrases that have an abnormal period. The method can detect abnormalities of each high-frequency phrase in the alarm information across different time dimensions, which assists alarm research, judgment and decision-making.

Description

Abnormal phrase detection method, system and storage medium based on alarm condition information
Technical Field
The invention relates to the technical field of alarm receiving data processing, in particular to an abnormal phrase detection method and system based on alarm condition information.
Background
In police work, a large amount of long-text descriptive information is generated at every moment during alarm receiving, alarm handling, feedback and similar activities. This text is complex in structure and varied in content. If the high-value information it contains could be accurately extracted and used in alarm analysis, research, judgment and decision-making, the quality and efficiency of public security management by police departments would be greatly improved.
At present, an open-source word segmentation device is generally used to segment the text generated during alarm receiving and handling; after alarm keywords are extracted, the extraction result is sent to alarm receivers and the alarm research department for analysis by the relevant personnel. However, the extraction result of this approach usually contains too many interfering phrases. For example, most alarm receiving and handling records contain keywords such as "alarm", "injury" and "bleeding". These keywords have a high frequency in most statistical periods, contribute nothing to alarm analysis, and act only as interfering words that distort the analysis result. In addition, the high-frequency words are obtained in isolation and are not compared with the same period in the past, so abnormalities of each phrase in different time dimensions cannot be detected, and the result offers little reference value for alarm information analysis.
To solve the technical problems that existing alarm information processing cannot detect abnormalities of phrases in different time dimensions and offers little reference value for alarm information analysis, an abnormal phrase detection method based on alarm information is needed that monitors abnormalities of phrases appearing in the alarm information along the time dimension, and thereby assists alarm research, judgment and decision-making.
Disclosure of Invention
In order to solve the technical problems that the existing warning information processing process cannot detect the abnormal phrases in different time dimensions, and the reference value provided for warning information analysis is not high, the invention provides a warning information-based abnormal phrase detection method, a warning information-based abnormal phrase detection system and a storage medium, and the specific technical scheme is as follows:
the invention provides an abnormal phrase detection method based on alarm condition information, which comprises the following steps:
acquiring warning information and performing word segmentation processing on the warning information to obtain a plurality of high-frequency phrases;
receiving a statistical period, and performing time sequence data aggregation on each high-frequency phrase according to the statistical period to obtain time sequence information corresponding to each high-frequency phrase;
and respectively carrying out discrete analysis on the time sequence information corresponding to each high-frequency phrase, and taking the outlier statistical period in the time sequence information corresponding to each high-frequency phrase as an abnormal period to obtain the abnormal phrase with the abnormal period.
The abnormal phrase detection method based on alarm information provided by the invention can detect abnormalities of the high-frequency phrases in the alarm information in different time dimensions and determine each abnormal phrase and its corresponding abnormal period. Police personnel can then carry out data analysis according to the abnormal phrases and their abnormal periods, which improves the reference value of the abnormal phrases extracted from a large amount of alarm information in practice.
In some embodiments, after obtaining the time series information corresponding to each high frequency phrase, the method further includes:
extracting real-time alarm information in the alarm information within the current statistical period, and acquiring a plurality of current high-frequency phrases in the real-time alarm information and the time sequence information corresponding to the current high-frequency phrases;
and according to the time sequence information corresponding to all the high-frequency phrases, performing discrete analysis on each current high-frequency phrase and the time sequence information corresponding to the current high-frequency phrase to obtain the current high-frequency phrase with the abnormal period.
The abnormal phrase detection method based on alarm information provided by the invention can perform the discrete analysis on the real-time alarm information of the current statistical period, using the alarm information of every statistical period as the basis, and determine only the abnormal phrases in the current statistical period and the time sequence information corresponding to them. Repeated discrete analysis of a large amount of alarm information is not needed, which greatly increases the analysis speed and improves applicability to real-time scene analysis in practice.
In some embodiments, the obtaining of the alert information and the word segmentation processing of the alert information to obtain a plurality of high-frequency phrases specifically includes:
acquiring a police dictionary, a police stop word list and a preset police word segmentation device, and inputting the police dictionary and the police stop word list into the police word segmentation device;
performing word segmentation processing on the warning information through the warning word segmentation device to obtain a plurality of first word segmentation word groups;
importing all the first word segmentation word groups into a big data cluster and a full-text retrieval engine, and carrying out full-quantity high-frequency word statistics to obtain a plurality of second word segmentation word groups;
and taking the second word segmentation word group as the high-frequency word group.
The abnormal phrase detection method based on alarm information provided by the invention performs word segmentation through the police word segmentation device loaded with the police dictionary and the police stop word list, and performs full high-frequency word statistics through the big data cluster and the full-text retrieval engine. It is suitable for big data analysis of massive alarm information to obtain the high-frequency phrases appearing in the alarm information, facilitates subsequent abnormality detection of the high-frequency phrases in different time dimensions, and reduces the amount of data computation, thereby increasing the processing speed.
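The segmentation and high-frequency statistics described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `segment`, `high_frequency_phrases`, `max_len` and `min_count` are hypothetical, the segmenter is a simple greedy longest-match over whitespace-separated tokens, and an in-memory `Counter` stands in for the big data cluster and full-text retrieval engine. The sample records, dictionary and stop word list are invented.

```python
from collections import Counter


def segment(text, dictionary, stop_words, max_len=4):
    """Greedy forward longest-match over whitespace tokens (illustrative only)."""
    tokens = text.lower().split()
    phrases, i = [], 0
    while i < len(tokens):
        # Try the longest candidate starting at position i first;
        # fall back to the single token when no dictionary phrase matches.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if n == 1 or candidate in dictionary:
                break
        i += n
        if candidate not in stop_words:
            phrases.append(candidate)
    return phrases


def high_frequency_phrases(records, dictionary, stop_words, min_count=2):
    """Count every segmented phrase and keep those seen at least min_count times."""
    counts = Counter()
    for text in records:
        counts.update(segment(text, dictionary, stop_words))
    return {phrase: c for phrase, c in counts.items() if c >= min_count}


# Invented sample data standing in for a police dictionary and stop word list.
dictionary = {"rear-end collision", "gambling machine", "ring road"}
stop_words = {"caller", "reports", "on", "near", "the", "a"}
records = [
    "caller reports rear-end collision on ring road",
    "rear-end collision near the bridge",
    "caller reports a gambling machine",
]
result = high_frequency_phrases(records, dictionary, stop_words)
print(result)
```

The sketch ignores punctuation, and the `min_count` cut-off is an assumption: the source does not say how "high-frequency" is decided at this stage.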
In some embodiments, said first set of word segments comprises at least one police data tag, said data tag comprising jurisdictional information, jurisdictional units, and case categories;
after the full high-frequency word statistics is performed to obtain a plurality of second word groups, the method further comprises the following steps:
filtering the plurality of second word segmentation word groups according to the police data labels to obtain a plurality of third word segmentation word groups;
and taking the third word segmentation word group as the high-frequency word group.
The abnormal phrase detection method based on alarm information provided by the invention filters the word segmentation phrases according to the police data labels, which further improves the reference value of the segmentation data in line with the user's analysis requirements, reduces the amount of data computation and increases the computation speed.
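The label-based filtering can be sketched as below. This is a hedged illustration: the `TaggedPhrase` record and the `filter_by_labels` helper are hypothetical names, and the three fields simply mirror the jurisdiction information, jurisdiction unit and case category labels named above. How labels are actually attached to phrases is not specified in the source.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TaggedPhrase:
    phrase: str
    jurisdiction: str    # jurisdiction information, e.g. a city
    unit: str            # jurisdiction unit, e.g. a department
    case_category: str   # case category


def filter_by_labels(phrases, **wanted):
    """Keep only the phrases whose labels equal every requested label value."""
    return [
        p for p in phrases
        if all(getattr(p, name) == value for name, value in wanted.items())
    ]


# Invented sample data.
tagged = [
    TaggedPhrase("gambling machine", "City A", "public order", "gambling"),
    TaggedPhrase("collision", "City A", "traffic management", "traffic incident"),
    TaggedPhrase("collision", "City B", "traffic management", "traffic incident"),
]
kept = filter_by_labels(
    tagged,
    jurisdiction="City A",
    unit="traffic management",
    case_category="traffic incident",
)
print([p.phrase for p in kept])
```

Passing only some of the labels (for example just `case_category`) filters on that subset, which fits a user who narrows the analysis by one dimension at a time.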
In some embodiments, after obtaining the plurality of third word segmentation word groups, the method further includes:
presetting a first sequence number threshold;
sequencing all the third word segmentation word groups from large to small according to word frequency number, and integrating to generate a word frequency list;
taking each third word segmentation word group whose sequence number in the word frequency list is smaller than the first sequence number threshold as a fourth word segmentation word group;
and taking the fourth word segmentation word group as the high-frequency word group.
The abnormal phrase detection method based on alarm information provided by the invention further filters the word segmentation phrases according to word frequency number, so that abnormality analysis in different time dimensions is performed only on the phrases with large word frequency numbers. This further reduces the amount of data in the alarm data analysis, improves the reference value of the data, and improves the efficiency with which a user analyzes data based on the filtered high-frequency phrases.
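A minimal sketch of the rank-based filtering follows. It assumes sequence numbers are 1-based positions in the descending word frequency list, which the source does not state explicitly; `top_ranked` and the sample counts are hypothetical.

```python
def top_ranked(freq, rank_threshold):
    """Return phrases whose 1-based rank by descending frequency is below rank_threshold."""
    # Ties are broken alphabetically so the result is deterministic.
    ranked = sorted(freq.items(), key=lambda kv: (-kv[1], kv[0]))
    return [
        phrase
        for rank, (phrase, _count) in enumerate(ranked, start=1)
        if rank < rank_threshold
    ]


# Invented word frequency numbers for five phrases.
freq = {
    "accident": 120,
    "alarm": 300,
    "collision": 80,
    "rear-end collision": 60,
    "crash": 10,
}
selected = top_ranked(freq, rank_threshold=5)
print(selected)
```

With a threshold of 5, the four most frequent phrases are kept and the least frequent one is dropped.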
In some embodiments, after obtaining the alert information and performing word segmentation processing on the alert information to obtain a plurality of high frequency word groups, before performing discrete analysis on the time sequence information corresponding to each of the high frequency word groups, the method further includes:
receiving the statistical period and the period interval;
counting the word frequency number of each high-frequency word group in each counting period in the period interval, and taking the word frequency number of each high-frequency word group in each counting period in the period interval as the time sequence information corresponding to the high-frequency word group.
The abnormal phrase detection method based on alarm information provided by the invention discloses specific steps of the time sequence data aggregation: the word frequency number of each high-frequency phrase in each statistical period within the period interval is counted, and these counts are integrated into the time sequence information of that phrase, so that discrete analysis can conveniently be carried out on the time sequence information of each high-frequency phrase.
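The aggregation step can be sketched as follows, assuming a statistical period of one day and a period interval given by a start date and a number of days. The function name `aggregate_daily` and the `(date, phrase)` input shape are assumptions made for illustration; the sample occurrences are invented.

```python
from collections import Counter
from datetime import date, timedelta


def aggregate_daily(occurrences, phrases, start, days):
    """Build {phrase: [count on day 0, count on day 1, ...]} for the period interval.

    occurrences is an iterable of (date, phrase) pairs, one per phrase occurrence.
    Occurrences outside the interval are simply never looked up.
    """
    counts = Counter((d, p) for d, p in occurrences)
    periods = [start + timedelta(days=i) for i in range(days)]
    return {p: [counts[(d, p)] for d in periods] for p in phrases}


occurrences = [
    (date(2021, 12, 1), "collision"),
    (date(2021, 12, 1), "collision"),
    (date(2021, 12, 3), "collision"),
    (date(2021, 12, 2), "alarm"),
    (date(2021, 11, 30), "collision"),  # falls before the interval
]
series = aggregate_daily(occurrences, ["collision", "alarm"], date(2021, 12, 1), 3)
print(series)
```

Periods with no occurrence are reported as 0, so every phrase gets a series of the same length, which keeps the later dispersion calculation straightforward.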
In some embodiments, the discrete analysis of the time series information corresponding to each high-frequency phrase specifically includes:
acquiring a preset outlier detection model;
detecting the dispersion of the word frequency number in each statistical period in the time sequence information corresponding to each high-frequency word group through the outlier detection model;
and judging the statistical period of which the dispersion is greater than a preset dispersion threshold value as the statistical period of the outlier.
The invention provides an abnormal phrase detection method based on alarm information, and discloses a specific scheme for carrying out discrete analysis on time sequence information of a high-frequency phrase.
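One way to read the outlier detection step is sketched below. The source does not define "dispersion", so this sketch assumes it is the absolute deviation of a period's word frequency number from the series mean, measured in population standard deviations, with a threshold of 1 playing the role of a 1-sigma rule. The function name `outlier_periods` and the sample series are hypothetical.

```python
from statistics import mean, pstdev


def outlier_periods(series, threshold=1.0):
    """Return the 0-based indices of periods whose dispersion exceeds the threshold."""
    mu = mean(series)
    sigma = pstdev(series)
    if sigma == 0:
        # A perfectly flat series has no outlier period.
        return []
    return [i for i, x in enumerate(series) if abs(x - mu) / sigma > threshold]


# Invented 14-day series with a burst on days 7 to 10 (indices 6 to 9).
burst = [5, 5, 5, 5, 5, 5, 20, 20, 20, 20, 5, 5, 5, 5]
flat = [30] * 14
print(outlier_periods(burst))
print(outlier_periods(flat))
```

Because the deviation is taken as an absolute value, the sketch flags unusual drops as well as unusual spikes. A phrase that is frequent in every period, such as a generic interfering word, yields a flat series and is never flagged.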
In some embodiments, after obtaining the abnormal phrase having the abnormal period, the method further includes:
acquiring a preset confidence threshold and a preset second sorting sequence number threshold;
calculating the confidence corresponding to each abnormal phrase according to the confidence threshold;
and sequencing all the abnormal phrases according to the confidence degree from high to low, and outputting the abnormal phrases of which the sequencing sequence numbers are smaller than the second sequencing sequence number threshold value in the abnormal phrases.
According to the abnormal phrase detection method based on the warning information, the confidence of each abnormal phrase is calculated and sorted according to the confidence, and the abnormal phrases are screened according to the confidence, so that the abnormal phrases fed back to the user are more accurate, and the user can conveniently analyze data according to the abnormal phrases and the corresponding abnormal periods of the abnormal phrases.
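The screening step can be sketched as follows. The source does not say how confidence is computed, so this sketch takes the confidence values as given inputs and only shows one plausible reading of the thresholding and ranking: drop phrases below the confidence threshold, sort the rest from high to low, and output those whose 1-based rank is smaller than the second sorting sequence number threshold. The name `rank_abnormal` and the sample values are hypothetical.

```python
def rank_abnormal(confidences, confidence_threshold, rank_threshold):
    """Screen abnormal phrases by confidence, then keep the top-ranked ones."""
    kept = [(p, c) for p, c in confidences.items() if c >= confidence_threshold]
    # Highest confidence first; ties broken alphabetically for determinism.
    kept.sort(key=lambda pc: (-pc[1], pc[0]))
    return [p for rank, (p, _c) in enumerate(kept, start=1) if rank < rank_threshold]


# Invented confidence values for three abnormal phrases.
confidences = {"collision": 0.9, "rear-end collision": 0.7, "accident": 0.2}
output = rank_abnormal(confidences, confidence_threshold=0.5, rank_threshold=2)
print(output)
```

With a rank threshold of 2, only the single most confident abnormal phrase is returned to the user.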
According to another aspect of the present invention, the present invention further provides an abnormal phrase detection system based on alert information, including:
the word segmentation processing module is used for acquiring warning information and a preset warning word segmentation device, and carrying out word segmentation processing on the warning information according to the warning word segmentation device to obtain a plurality of high-frequency word groups;
the time sequence processing module is connected with the word segmentation processing module and used for receiving statistical period information and carrying out time sequence data aggregation on each high-frequency word group according to the statistical period information to obtain time sequence information corresponding to each high-frequency word group;
and the abnormality processing module is connected with the time sequence processing module and is used for inputting each high-frequency phrase and the time sequence information corresponding to the high-frequency phrase into a preset abnormality detection model to obtain an abnormal phrase with an abnormal period.
According to another aspect of the present invention, the present invention further provides a storage medium in which at least one instruction is stored, the instruction being loaded and executed by a processor to implement the above abnormal phrase detection method based on alert information.
The invention provides an abnormal phrase detection method, system and storage medium based on alarm information, which at least comprises the following technical effects:
(1) the method can detect abnormalities of the high-frequency phrases in the alarm information in different time dimensions and determine each abnormal phrase and its corresponding abnormal period, so that police personnel can carry out data analysis according to the abnormal phrases and their abnormal periods, which improves the reference value of the abnormal phrases extracted from a large amount of alarm information in practice;
(2) only abnormal phrases and time sequence information corresponding to the abnormal phrases in the alarm information in the current statistical period are judged, repeated discrete analysis on a large amount of alarm information is not needed, the analysis speed is greatly improved, and the applicability of real-time scene analysis in the practical application is improved;
(3) the full-scale high-frequency word statistics is carried out through the large data cluster and the full-text retrieval engine, the method is suitable for carrying out large data analysis on massive warning information to obtain high-frequency word groups appearing in the warning information, so that the high-frequency word groups can be conveniently subjected to abnormal detection on different time dimensions in the follow-up process, and the processing speed is increased by reducing the data calculation amount;
(4) filtering the word segmentation phrases for multiple times according to actual requirements, further improving the reference value of the word segmentation data by combining with user analysis requirements, reducing the data computation amount and improving the computation speed;
(5) the abnormal phrases are screened according to confidence, so that the abnormal phrases fed back to the user are more accurate, and the user can conveniently analyze data according to the abnormal phrases and their corresponding abnormal periods.
Drawings
In order to illustrate the technical solutions in the embodiments of the present invention more clearly, the drawings needed for the description of the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without inventive effort.
FIG. 1 is a flow chart of abnormal phrase detection based on alert information according to the present invention;
FIG. 2 is a flow chart of an abnormal phrase detection method based on alert information according to the present invention;
FIG. 3 is a flow chart of word segmentation processing in an abnormal word group detection method based on alert information according to the present invention;
FIG. 4 is another flowchart of an abnormal phrase detection method based on alert information according to the present invention;
FIG. 5 is a flowchart of abnormal phrase screening in the abnormal phrase detection method based on alert information according to the present invention;
FIG. 6 is another flowchart of an abnormal phrase detection method based on alert information according to the present invention;
FIG. 7 is a diagram illustrating an example of an abnormal phrase detection system based on alert information according to the present invention.
Reference numbers in the figures: word segmentation processing module-10, time sequence processing module-20 and abnormality processing module-30.
Detailed Description
In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular system structures, techniques, etc. in order to provide a thorough understanding of the embodiments of the present application. However, it will be apparent to one skilled in the art that the present application may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted so as not to obscure the description of the present application with unnecessary detail.
It will be understood that the terms "comprises" and/or "comprising," when used in this specification and the appended claims, specify the presence of stated features, integers, steps, operations, elements, and/or components, but do not preclude the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
For the sake of simplicity, the drawings only schematically show the parts relevant to the present invention, and they do not represent the actual structure as a product. In addition, in order to make the drawings concise and understandable, components having the same structure or function in some of the drawings are only schematically depicted, or only one of them is labeled. In this document, "one" means not only "only one" but also a case of "more than one".
It should be further understood that the term "and/or" as used in this specification and the appended claims refers to and includes any and all possible combinations of one or more of the associated listed items.
In addition, in the description of the present application, the terms "first", "second", and the like are used only for distinguishing the description, and are not intended to indicate or imply relative importance.
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the following description will be made with reference to the accompanying drawings. It is obvious that the drawings in the following description are only some examples of the invention, and that for a person skilled in the art, other drawings and embodiments can be derived from them without inventive effort.
At present, when a large amount of police information is screened in practice, an open-source word segmentation device is usually used to segment the text, and after police keywords are extracted the extraction result is sent directly to alarm receivers and the alarm research department for analysis by the relevant personnel. In practical work, however, the police information contains a large number of interfering phrases, such as the keywords "alarm", "injury" and "bleeding". These contribute nothing to alarm analysis and act only as interfering words that distort the analysis result.
Therefore, as shown in fig. 6, the abnormal phrase detection method based on alert information provided by the present application extracts high-frequency phrases from the alert information and examines their abnormality in different time dimensions. The user can set a statistical period and a period interval. For example, with a statistical period of one month and a period interval of the last year, the word frequency number of each high-frequency phrase in each month of the last year is counted, and the counts are integrated to generate the time series information of that phrase. Discrete analysis is then performed on the time series information corresponding to each high-frequency phrase, each outlier statistical period is determined to be an abnormal period, and the abnormal phrases having an abnormal period, together with their time series information, are fed back to the user. The method effectively eliminates interfering phrases among the high-frequency phrases while letting the user perform targeted analysis of the high-frequency phrases whose word frequency is abnormal in some statistical period. This greatly speeds up the user's analysis of valuable data within massive alert data, supports targeted analysis of abnormal conditions in the alert data, and meets the needs of alert analysis in practice.
Illustratively, the alarm data is segmented into words such as "gambling machine", "crash", "accident", "collision", "rear-end collision" and "alarm". After high-frequency word statistics are run on the seven segmented words, "gambling machine", "crash", "accident", "collision", "rear-end collision" and "alarm" are judged to be high-frequency words. The words are then screened according to three data labels (jurisdiction information: city A; jurisdiction unit: traffic management department; case category: traffic incident), which removes the high-frequency word "gambling machine". The remaining five words are sorted by word frequency number from large to small and integrated into a word frequency list; the first four words in the list are taken as the high-frequency phrases, and "crash", which has the smallest word frequency number, is removed. The daily word frequency number of each of the remaining four words over the past 14 days is counted and integrated into the time sequence information of that word. A 1-sigma model then computes the dispersion of the word frequency number in each statistical period of the time sequence information corresponding to each high-frequency phrase, and each statistical period whose dispersion is greater than a preset dispersion threshold is judged to be an outlier statistical period.
The statistical periods of the 7th, 8th, 9th and 10th days are judged to be abnormal periods for "collision" and "rear-end collision", while "accident" and "alarm" have no abnormal period, so "collision" and "rear-end collision" are judged to be abnormal phrases. The confidence of each is calculated, the two phrases are sorted by confidence from high to low, and the abnormal phrase with the highest confidence is output. For example, if "collision" has the higher confidence, it is output to staff for analysis, and they find that during the 7th to 10th days of the past two weeks the construction of an isolation belt at a certain intersection obstructed drivers' line of sight when turning, which increased the number of vehicle collisions. This example illustrates that the abnormal phrase detection method based on alert information provided by the application greatly reduces the amount of data that police officers must handle in high-frequency word analysis, and that targeted analysis of phrases with abnormal fluctuation makes it easier to identify the cause of an abnormal situation in police work.
In one embodiment, as shown in fig. 1, the present invention provides a method for detecting an abnormal phrase based on alert information, including the steps of:
S100, acquiring the warning information and performing word segmentation processing on the warning information to obtain a plurality of high-frequency phrases.
Specifically, the alert information to be analyzed can be input by a user, or the alert system can be accessed through an alert interface, the alert information in the alert system is received in real time, word segmentation processing is performed according to a word segmentation device or other word segmentation models, and then high-frequency word statistics is performed on all the segmented words to obtain a plurality of high-frequency word groups in the alert information.
S200, receiving the statistical period, and performing time sequence data aggregation on each high-frequency phrase according to the statistical period to obtain time sequence information corresponding to each high-frequency phrase.
Specifically, the statistical period is usually a time measurement unit, such as 3 hours, 12 hours, days, weeks, months, quarters, years, and the like, and in the statistical alert data, the word frequency of the high-frequency word group in each statistical period is counted, and the data is integrated to generate time series information corresponding to each high-frequency word group.
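Assigning each record to a statistical period can be sketched as below. The helper `period_key` and its period names are hypothetical; the sketch only shows that different time units map a timestamp to different bucket keys, and that counting occurrences per key gives the word frequency number for each statistical period.

```python
from datetime import datetime


def period_key(ts, period):
    """Map a timestamp to a hashable key identifying its statistical period."""
    if period == "3h":
        return (ts.year, ts.month, ts.day, ts.hour // 3)
    if period == "12h":
        return (ts.year, ts.month, ts.day, ts.hour // 12)
    if period == "day":
        return (ts.year, ts.month, ts.day)
    if period == "week":
        iso = ts.isocalendar()  # (ISO year, ISO week number, ISO weekday)
        return (iso[0], iso[1])
    if period == "month":
        return (ts.year, ts.month)
    if period == "quarter":
        return (ts.year, (ts.month - 1) // 3 + 1)
    if period == "year":
        return (ts.year,)
    raise ValueError(f"unsupported statistical period: {period}")


ts = datetime(2021, 12, 21, 14, 30)
for name in ("3h", "day", "week", "month", "quarter"):
    print(name, period_key(ts, name))
```

Weeks follow the ISO 8601 calendar here, so a week near the turn of the year may belong to the adjacent ISO year. Whether that convention suits a given deployment is a design choice the source leaves open.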
S300, respectively carrying out discrete analysis on the time sequence information corresponding to each high-frequency phrase, and taking the outlier statistical period in the time sequence information corresponding to each high-frequency phrase as an abnormal period to obtain the abnormal phrase with the abnormal period.
Specifically, discrete analysis is performed on the time sequence information corresponding to each high-frequency phrase: the word frequency number of the phrase in each statistical period is compared with its word frequency numbers over all statistical periods to determine how far it deviates. When the dispersion of a certain statistical period is large, that period is judged to be an outlier period (that is, an abnormal period) and the high-frequency phrase to which it belongs is judged to be an abnormal phrase, yielding each abnormal phrase and its corresponding abnormal periods.
The abnormal phrase detection method based on the alert information provided by the embodiment can detect the abnormality of the high-frequency phrase in the alert information in different time dimensions, and judge the abnormal phrase and the abnormal period corresponding to the abnormal phrase, so that the alert personnel can conveniently perform data analysis according to the abnormal phrase and the abnormal period corresponding to the abnormal phrase, and the reference value of extracting the abnormal phrase from a large amount of alert information in the practical process is improved.
In one embodiment, after step S300 performs discrete analysis on the time series information corresponding to each high-frequency phrase and takes each outlier statistical period in that time series information as an abnormal period to obtain the abnormal phrases having an abnormal period, the method further includes the step of:
screening the abnormal phrases to remove the non-abnormal phrases among them, and optimizing the police dictionary and the police stop word list according to the abnormal phrases and the non-abnormal phrases.
Specifically, the screening and removal of non-abnormal phrases from the abnormal phrases can be done automatically by a machine-learning screening model, or manually. After screening, the non-abnormal phrases are input into the word segmentation device to optimize it, so that these phrases are no longer extracted in subsequent processing. This reduces the amount of data computation and increases the computation speed during word segmentation, and also lowers the probability that interfering words appear among the abnormal phrases.
In an embodiment, as shown in fig. 2, after step S200 of receiving the statistical period and performing time series data aggregation on each high-frequency phrase according to the statistical period to obtain the time series information corresponding to each high-frequency phrase, the method further includes the steps of:
S400, extracting the real-time alert information within the current statistical period from the alert information, and acquiring a plurality of current high-frequency phrases in the real-time alert information and their corresponding time series information.
Specifically, only the real-time alarm information in the current statistical period in the alarm information is extracted, and time sequence data aggregation is performed on each high-frequency phrase in the current statistical period.
S500, according to the time sequence information corresponding to all the high-frequency phrases, performing discrete analysis on each current high-frequency phrase and the time sequence information corresponding to the current high-frequency phrase to obtain the current high-frequency phrase with abnormal periods.
Specifically, discrete analysis is performed on the time series information of each high-frequency phrase in the current statistical period against the time series information of all high-frequency phrases; the dispersion of the remaining periods does not need to be analyzed again. Alternatively, the time series information of each high-frequency phrase stored in the historical analysis records can be used directly as the data basis for the discrete analysis of the current statistical period.
The abnormal phrase detection method based on alert information provided by this embodiment can, on the basis of the alert information of every statistical period, perform discrete analysis on the real-time alert information of the current statistical period only, and determine only the abnormal phrases of the current statistical period and their corresponding time series information. Repeated discrete analysis of a large amount of alert information is not needed, so the analysis is considerably faster and better suited to real-time scenarios in practice.
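A minimal sketch of this incremental check follows. It assumes the historical per-period counts of each phrase are already stored, and it uses a deviation of k standard deviations from the historical mean as the dispersion test. The source does not fix the test; the names `current_period_is_abnormal` and `detect_current` and the parameter `k` are illustrative assumptions.

```python
from statistics import mean, pstdev


def current_period_is_abnormal(history, current, k=3.0):
    """Flag the current count if it lies more than k standard deviations
    from the mean of the historical counts (assumed dispersion test)."""
    if len(history) < 2:
        return False  # not enough history to judge
    mu = mean(history)
    sigma = pstdev(history)
    if sigma == 0:
        return current != mu
    return abs(current - mu) > k * sigma


def detect_current(history_by_phrase, current_counts, k=3.0):
    """Return the current high-frequency phrases whose count is abnormal."""
    return [
        phrase
        for phrase, count in current_counts.items()
        if current_period_is_abnormal(history_by_phrase.get(phrase, []), count, k)
    ]


history = {
    "gambling machine": [4, 5, 6, 5, 4, 6],
    "rear-end collision": [10, 12, 11, 9, 10, 11],
}
today = {"gambling machine": 30, "rear-end collision": 11}
print(detect_current(history, today))
```

Only the current period is scored here; the historical periods are read but never re-analyzed, which is the saving the embodiment describes.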
In one embodiment, as shown in fig. 3, the step S100 of obtaining the warning information and performing word segmentation processing on the warning information to obtain a plurality of high-frequency phrases specifically includes:
S110, acquiring a police dictionary, a police stop word list and a preset police word segmentation device, and inputting the police dictionary and the police stop word list into the police word segmentation device.
Illustratively, the police dictionary and the police stop word list are set based on the policing scenario and optimized according to the results of each data analysis. The police dictionary includes, for example, two categories, gambling and traffic accidents. The gambling dictionary includes: gambling money, lotteries, gambling materials, gambling tools, banners, lotteries, lottery tickets, lotteries, festivals, quints, points, plays lotteries, places to buy lotteries, purchases on the web, plays lotteries, gambling machines, fishing machines, cigarette grasping machines, slot machines, gambling equipment, game machines, fruit machines, lotus machines, and the like. The traffic accident dictionary includes: rear-end collisions, squeegees, grabs, scrapes, rubs, bumps, crashes, bumps, and the like.
The police word segmentation device supports various general word segmentation devices and user-defined word segmentation devices, and comprises word segmentation devices such as the shortest-path word segmentation device, the top-speed dictionary word segmentation device, the index word segmentation device, the CRF word segmentation device, the NLP word segmentation device, the deep learning word segmentation device and the like.
S120, the alarm situation information is subjected to word segmentation through the alarm word segmentation device to obtain a plurality of first word segmentation word groups.
S130, all the first word segmentation word groups are led into a big data cluster and a full-text retrieval engine, and full-quantity high-frequency word statistics is carried out to obtain a plurality of second word segmentation word groups.
S140, filtering the plurality of second word segmentation word groups according to the police data tags to obtain a plurality of third word segmentation word groups.
Specifically, each first word segmentation word group carries at least one police data tag, and the data tag comprises jurisdiction information, jurisdiction units and case categories.
S150, presetting a first sequence number threshold.
Specifically, the first sequence number threshold specifies how many high-frequency phrases undergo the subsequent time series data aggregation; for example, if 100 representative phrases are required, the value is set to 100.
S160, sorting the third word segmentation word groups from large to small according to the word frequency number, and integrating to generate a word frequency list.
Specifically, the word frequency list can be directly output to a user for analysis by the user, or each high-frequency phrase in the word frequency list can be subjected to subsequent time series data aggregation processing, and then an abnormal phrase is output.
S170, taking each third word segmentation phrase whose word frequency sequence number in the word frequency list is smaller than the first sequence number threshold as a fourth word segmentation phrase, and taking the fourth word segmentation phrases as the high-frequency phrases.
Specifically, during steps S110 to S170, the second word segmentation phrases or the third word segmentation phrases may also be output as the high-frequency phrases according to user requirements.
The abnormal phrase detection method based on alert information provided by this embodiment filters the word segmentation phrases several times according to word frequency number and performs abnormality analysis in different time dimensions only on the phrases with large word frequency numbers. This further reduces the amount of data in the alert data analysis, improves the reference value of the data, and improves the efficiency with which the user can analyze the filtered high-frequency phrases.
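Steps S110 to S170 can be sketched roughly as follows. This is an illustration under several assumptions, not the segmenter the patent relies on: segmentation is approximated by greedy longest-match over whitespace tokens against the police dictionary, each alert record is a dict with hypothetical `text` and `case_type` fields, and the tag filter is applied before counting (which gives the same counts as filtering afterwards when tags are attached per record). Punctuation handling is omitted.

```python
from collections import Counter


def segment(text, dictionary, stop_words, max_len=3):
    """Greedy longest-match segmentation over whitespace tokens.

    Multi-word dictionary entries (up to max_len tokens) are kept as one
    phrase; stop words are dropped.
    """
    tokens = text.lower().split()
    phrases, i = [], 0
    while i < len(tokens):
        # Try the longest candidate first, fall back to a single token.
        for n in range(min(max_len, len(tokens) - i), 0, -1):
            candidate = " ".join(tokens[i:i + n])
            if n == 1 or candidate in dictionary:
                break
        i += n
        if candidate not in stop_words:
            phrases.append(candidate)
    return phrases


def high_frequency_phrases(records, dictionary, stop_words,
                           case_type=None, top_n=100):
    """Count phrases over the (optionally tag-filtered) records and
    return the top_n (phrase, count) pairs, most frequent first."""
    counts = Counter()
    for record in records:
        if case_type is not None and record["case_type"] != case_type:
            continue
        counts.update(segment(record["text"], dictionary, stop_words))
    return counts.most_common(top_n)


records = [
    {"text": "Caller reports a slot machine in the shop", "case_type": "gambling"},
    {"text": "Slot machine and lottery tickets found", "case_type": "gambling"},
    {"text": "Rear-end collision on the ring road", "case_type": "traffic"},
]
dictionary = {"slot machine", "lottery tickets", "rear-end collision"}
stop_words = {"a", "in", "the", "and", "on", "caller", "reports", "found"}
print(high_frequency_phrases(records, dictionary, stop_words,
                             case_type="gambling", top_n=2))
```

In a real deployment the counting would run in the big data cluster and full-text retrieval engine mentioned in S130; the in-memory `Counter` only stands in for that step.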
In one embodiment, as shown in fig. 4, the abnormal phrase detection method based on alert information further includes the steps of:
S100, acquiring the alert information and performing word segmentation processing on the alert information to obtain a plurality of high-frequency phrases.
Specifically, the alert information to be analyzed can be input by a user, or the alert system can be accessed through an alert interface, the alert information in the alert system is received in real time, word segmentation processing is performed according to a word segmentation device or other word segmentation models, and then high-frequency word statistics is performed on all the segmented words to obtain a plurality of high-frequency word groups in the alert information.
S210, receiving the statistical period and the period interval.
S220, counting the word frequency number of each high-frequency word group in each counting period in the period interval, and taking the word frequency number of each high-frequency word group in each counting period in the period interval as the time sequence information corresponding to the high-frequency word group.
Specifically, according to a time interval set by the user, a statistical period is extracted at every preset time interval within the period interval, the word frequency number of the high-frequency phrase is extracted once per statistical period, and the time series information corresponding to each high-frequency phrase is computed period by period. Alternatively, the time series can be divided into frames of a unit length set by the user, and the time series information computed within each frame. The word frequency number of each high-frequency phrase in the same period of earlier cycles (historical same-period data) can also be computed over all statistical periods in the period interval, according to a time span specified by the user.
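As an illustration of S210 and S220, the sketch below buckets phrase occurrences into fixed-length statistical periods inside a period interval. The input shape (pairs of timestamp and phrase) and the function name `aggregate_time_series` are assumptions made for the example.

```python
import math
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate_time_series(occurrences, start, end, period):
    """Count phrase occurrences per statistical period.

    occurrences: iterable of (timestamp, phrase) pairs
    start, end:  the period interval, half-open [start, end)
    period:      length of one statistical period (timedelta)
    Returns {phrase: [count in period 0, count in period 1, ...]}.
    """
    n_periods = math.ceil((end - start) / period)
    series = defaultdict(lambda: [0] * n_periods)
    for ts, phrase in occurrences:
        if start <= ts < end:
            # Integer division of two timedeltas gives the period index.
            series[phrase][(ts - start) // period] += 1
    return dict(series)


occurrences = [
    (datetime(2021, 12, 1, 8), "slot machine"),
    (datetime(2021, 12, 1, 21), "slot machine"),
    (datetime(2021, 12, 3, 10), "slot machine"),
    (datetime(2021, 12, 2, 9), "rear-end collision"),
]
print(aggregate_time_series(
    occurrences,
    start=datetime(2021, 12, 1),
    end=datetime(2021, 12, 4),
    period=timedelta(days=1),
))
```

Periods with no occurrences stay at zero, so every phrase gets a series of equal length, which simplifies the later dispersion analysis.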
S310, obtaining a preset outlier detection model.
Specifically, the outlier detection model supports general outlier detection algorithms and user-defined outlier detection algorithms, including the 1-sigma rule, the 3-sigma rule, the GESD outlier detection method, the local outlier factor (LOF) detection method, and the like.
S320, detecting the dispersion of the word frequency in each statistical period in the time sequence information corresponding to each high-frequency word group through the outlier detection model.
S330, determining each statistical period whose dispersion is greater than the preset dispersion threshold to be an outlier statistical period.
S340, taking the outlier statistical period in the time series information corresponding to each high-frequency phrase as an abnormal period, and obtaining the abnormal phrase with the abnormal period.
Specifically, discrete analysis is performed on the time series information corresponding to each high-frequency phrase: the word frequency number of the phrase in each statistical period is compared with its word frequency numbers over all statistical periods to determine how far it deviates. When the dispersion of a statistical period is large, that period is determined to be an outlier period and the corresponding high-frequency phrase is determined to be an abnormal phrase, which yields each abnormal phrase together with its abnormal period.
The abnormal phrase detection method based on alert information provided by this embodiment discloses the specific steps of time series data aggregation: the word frequency number corresponding to each statistical period in the period interval is counted, and these counts are integrated to generate the time series information of each high-frequency phrase. It also discloses a specific scheme for the discrete analysis of that time series information: whether a statistical period is an outlier statistical period is determined by comparing the dispersion of the word frequency number in that period with the dispersion threshold.
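One way to realize S310 to S340 is the 3-sigma rule named in the text. The sketch below assumes that the "dispersion" of a period is the absolute z-score of its word frequency number within the phrase's own series; the source does not define dispersion precisely, so this definition and the function names are illustrative.

```python
from statistics import mean, pstdev


def outlier_periods(counts, threshold=3.0):
    """Return the indices of periods whose |z-score| exceeds threshold."""
    mu, sigma = mean(counts), pstdev(counts)
    if sigma == 0:
        return []  # a constant series has no outliers
    return [i for i, c in enumerate(counts) if abs(c - mu) / sigma > threshold]


def abnormal_phrases(series_by_phrase, threshold=3.0):
    """Map each abnormal phrase to the indices of its abnormal periods."""
    result = {}
    for phrase, counts in series_by_phrase.items():
        periods = outlier_periods(counts, threshold)
        if periods:
            result[phrase] = periods
    return result


series = {
    "slot machine": [5] * 19 + [50],
    "rear-end collision": [10, 11, 9, 10, 12, 10, 11, 9, 10, 11,
                           10, 9, 11, 10, 12, 10, 9, 11, 10, 10],
}
print(abnormal_phrases(series))
```

Because the candidate period is included in the mean and standard deviation, the largest possible z-score in a series of n values is (n - 1) / sqrt(n). With a threshold of 3 this means a series needs at least 11 periods before any period can be flagged; shorter series call for a lower threshold or a method such as GESD.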
In an embodiment, as shown in fig. 5, after step S300 of performing discrete analysis on the time series information corresponding to each high-frequency phrase and taking the outlier statistical period in that time series information as an abnormal period to obtain the abnormal phrases having abnormal periods, the method further includes the steps of:
S610, acquiring a preset confidence threshold and a preset second sorting sequence number threshold.
S620, calculating the confidence corresponding to each abnormal phrase according to the confidence threshold.
S630, sorting the abnormal phrases according to the confidence degree from high to low, and outputting the abnormal phrases with the sorting sequence number smaller than the second sorting sequence number threshold value in the abnormal phrases.
According to the abnormal phrase detection method based on the warning information, the confidence of each abnormal phrase is calculated and sorted according to the confidence, and the abnormal phrases are screened according to the confidence, so that the abnormal phrases fed back to the user are more accurate, and the user can conveniently perform data analysis according to the abnormal phrases and the corresponding abnormal periods of the abnormal phrases.
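Steps S610 to S630 can be sketched as below. The source does not state how the confidence is computed, so the example takes the per-phrase confidence values as given and assumes the confidence threshold acts as a lower bound before ranking; both points, and the name `rank_abnormal_phrases`, are assumptions.

```python
def rank_abnormal_phrases(confidences, confidence_threshold, top_k):
    """Keep phrases at or above the confidence threshold, sort them by
    confidence from high to low, and return the first top_k entries.

    confidences: {phrase: confidence in [0, 1]} (assumed to be supplied
    by the outlier detection stage).
    """
    kept = [(p, c) for p, c in confidences.items() if c >= confidence_threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    return kept[:top_k]


scores = {
    "slot machine": 0.9,
    "shop": 0.4,
    "lottery tickets": 0.75,
    "rear-end collision": 0.8,
}
print(rank_abnormal_phrases(scores, confidence_threshold=0.5, top_k=2))
```

Here `top_k` plays the role of the second sorting sequence number threshold: only the highest-ranked phrases are returned to the user.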
In an embodiment, as shown in fig. 7, the present invention further provides an abnormal phrase detection system based on alert information, which includes a word segmentation processing module 10, a timing sequence processing module 20, and an abnormal processing module 30.
The word segmentation processing module 10 is configured to obtain the warning information and a preset warning word segmentation device, and perform word segmentation processing on the warning information according to the warning word segmentation device to obtain a plurality of high-frequency word groups.
Specifically, the alert information to be analyzed can be input by a user, or the alert system can be accessed through an alert interface, the alert information in the alert system is received in real time, word segmentation processing is performed according to a word segmentation device or other word segmentation models, and then high-frequency word statistics is performed on all the segmented words to obtain a plurality of high-frequency word groups in the alert information.
The time sequence processing module 20 is connected to the word segmentation processing module 10, and is configured to receive the statistical period information, and perform time sequence data aggregation on each high-frequency word group according to the statistical period information to obtain time sequence information corresponding to each high-frequency word group.
Specifically, the statistical period is usually a unit of time, such as 3 hours, 12 hours, a day, a week, a month, a quarter or a year. When statistics are compiled on the alert data, the word frequency number of each high-frequency phrase in each statistical period is counted, and these counts are integrated to generate the time series information corresponding to each high-frequency phrase.
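For calendar-based statistical periods (day, week, month, quarter, year), one simple approach is to map each timestamp to a period key and count per key. The helper below is a hypothetical sketch; the key formats are arbitrary choices, weeks follow the ISO 8601 week numbering provided by `datetime.isocalendar()`, and fixed-length periods such as 3 hours are not covered.

```python
from collections import Counter
from datetime import datetime


def period_key(ts, unit):
    """Map a timestamp to the label of the calendar period it falls in."""
    if unit == "day":
        return ts.strftime("%Y-%m-%d")
    if unit == "week":
        year, week, _ = ts.isocalendar()  # ISO year and ISO week number
        return f"{year}-W{week:02d}"
    if unit == "month":
        return ts.strftime("%Y-%m")
    if unit == "quarter":
        return f"{ts.year}-Q{(ts.month - 1) // 3 + 1}"
    if unit == "year":
        return str(ts.year)
    raise ValueError(f"unsupported unit: {unit}")


def count_by_period(timestamps, unit):
    """Word frequency number of one phrase per calendar period."""
    return Counter(period_key(ts, unit) for ts in timestamps)


hits = [datetime(2021, 11, 3), datetime(2021, 12, 20, 9, 30), datetime(2021, 12, 21)]
print(count_by_period(hits, "month"))
```

Note that the ISO year returned by `isocalendar()` can differ from the calendar year for dates near January 1, which is why the week key uses it instead of `ts.year`.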
The abnormal processing module 30 is connected to the time sequence processing module 20, and is configured to input each high-frequency phrase and its corresponding time series information into a preset abnormality detection model, so as to obtain the abnormal phrases having abnormal periods.
Specifically, discrete analysis is performed on the time series information corresponding to each high-frequency phrase: the word frequency number of the phrase in each statistical period is compared with its word frequency numbers over all statistical periods to determine how far it deviates. When the dispersion of a statistical period is large, that period is determined to be an outlier period and the corresponding high-frequency phrase is determined to be an abnormal phrase, which yields each abnormal phrase together with its abnormal period.
The abnormal phrase detection system based on alert information provided by this embodiment can detect abnormalities of the high-frequency phrases in the alert information across different time dimensions and determine each abnormal phrase together with its abnormal period. This makes it convenient for police personnel to perform data analysis on the abnormal phrases and their abnormal periods, and improves the practical reference value of abnormal phrases extracted from a large amount of alert information.
In an embodiment, the present invention further provides a storage medium, where at least one instruction is stored in the storage medium, and the instruction is loaded and executed by a processor to implement the operations performed by the above-mentioned abnormal phrase detection method based on alert information. For example, the storage medium may be a read-only memory (ROM), a Random Access Memory (RAM), a compact disc read-only memory (CD-ROM), a magnetic tape, a floppy disk, an optical data storage device, and the like.
The modules and steps described above may be implemented in program code executable by a computing device, so that they are executed by the computing device; alternatively, they may each be made as an individual integrated circuit module, or a plurality of the modules or steps may be made as a single integrated circuit module. Thus, the present invention is not limited to any specific combination of hardware and software.
In the above embodiments, the descriptions of the respective embodiments have respective emphasis, and reference may be made to the related descriptions of other embodiments for parts that are not described or recited in detail in a certain embodiment.
Those of ordinary skill in the art will appreciate that the various illustrative elements and steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
In the embodiments provided in the present application, it should be understood that the disclosed abnormal phrase detection method, system and storage medium based on alert information may be implemented in other ways. The embodiments described above are merely illustrative. For example, the division into modules or units is only a division by logical function, and other divisions are possible in actual implementation: a plurality of units or modules may be combined or integrated into another system, or some features may be omitted or not executed. In addition, the communication links shown or discussed may be through interfaces, devices or units, or integrated circuits, and may be electrical, mechanical or in other forms.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit. The integrated unit can be realized in a form of hardware, and can also be realized in a form of a software functional unit.
It should be noted that the above embodiments are only preferred embodiments of the present invention. For those skilled in the art, various modifications and refinements can be made without departing from the principle of the present invention, and these modifications and refinements should also be regarded as falling within the protection scope of the present invention.

Claims (10)

1. An abnormal phrase detection method based on alarm information is characterized by comprising the following steps:
acquiring warning information and performing word segmentation processing on the warning information to obtain a plurality of high-frequency phrases;
receiving a statistical period, and performing time sequence data aggregation on each high-frequency phrase according to the statistical period to obtain time sequence information corresponding to each high-frequency phrase;
and respectively carrying out discrete analysis on the time sequence information corresponding to each high-frequency phrase, and taking the outlier statistical period in the time sequence information corresponding to each high-frequency phrase as an abnormal period to obtain the abnormal phrase with the abnormal period.
2. The method for detecting abnormal phrases based on alert information according to claim 1, wherein after obtaining the time series information corresponding to each high frequency phrase, the method further comprises:
extracting real-time alarm information in the alarm information within the current statistical period, and acquiring a plurality of current high-frequency phrases in the real-time alarm information and the time sequence information corresponding to the current high-frequency phrases;
and according to the time sequence information corresponding to all the high-frequency phrases, performing discrete analysis on each current high-frequency phrase and the time sequence information corresponding to the current high-frequency phrase to obtain the current high-frequency phrase with the abnormal period.
3. The method for detecting the abnormal phrase based on the alert information according to claim 1, wherein the obtaining of the alert information and the word segmentation processing of the alert information are performed to obtain a plurality of high-frequency phrases, and specifically comprises:
acquiring a police dictionary, a police stop word list and a preset police word segmentation device, and inputting the police dictionary and the police stop word list into the police word segmentation device;
performing word segmentation processing on the warning information through the warning word segmentation device to obtain a plurality of first word segmentation word groups;
importing all the first word segmentation word groups into a big data cluster and a full-text retrieval engine, and carrying out full-quantity high-frequency word statistics to obtain a plurality of second word segmentation word groups;
and taking the second word segmentation word group as the high-frequency word group.
4. The abnormal phrase detection method based on alert information as recited in claim 3, wherein
the first word segmentation group comprises at least one police data label, and the data label comprises jurisdiction information, jurisdiction units and case types;
after the full high-frequency word statistics is performed to obtain a plurality of second word groups, the method further comprises the following steps:
filtering the plurality of second word segmentation word groups according to the police data labels to obtain a plurality of third word segmentation word groups;
and taking the third word segmentation word group as the high-frequency word group.
5. The abnormal phrase detection method based on the alert information as recited in claim 4, wherein after obtaining the plurality of third segmentation word groups, the method further comprises:
presetting a first sequence number threshold;
sequencing all the third word segmentation word groups from large to small according to word frequency number, and integrating to generate a word frequency list;
counting the third word segmentation phrase with the word frequency number sequence number smaller than the first sequence number threshold value in the word frequency list as a fourth word segmentation phrase;
and taking the fourth word segmentation word group as the high-frequency word group.
6. The method according to any one of claims 1 to 5, wherein after the obtaining of the alert information and the word segmentation processing of the alert information to obtain a plurality of high-frequency word groups, before the discrete analysis of the time series information corresponding to each high-frequency word group, the method further comprises:
receiving the statistical period and the period interval;
counting the word frequency number of each high-frequency word group in each counting period in the period interval, and taking the word frequency number of each high-frequency word group in each counting period in the period interval as the time sequence information corresponding to the high-frequency word group.
7. The abnormal phrase detection method based on the alert information according to claim 1, wherein the discrete analysis is performed on the time series information corresponding to each high frequency phrase, specifically comprising:
acquiring a preset outlier detection model;
detecting the dispersion of the word frequency number in each statistical period in the time sequence information corresponding to each high-frequency word group through the outlier detection model;
and judging the statistical period of which the dispersion is greater than a preset dispersion threshold value as the statistical period of the outlier.
8. The method according to claim 7, further comprising, after obtaining the abnormal phrase having the abnormal period, the steps of:
acquiring a preset confidence threshold and a preset second sorting sequence number threshold;
calculating the confidence corresponding to each abnormal phrase according to the confidence threshold;
and sequencing all the abnormal phrases according to the confidence degree from high to low, and outputting the abnormal phrases of which the sequencing sequence numbers are smaller than the second sequencing sequence number threshold value in the abnormal phrases.
9. An abnormal phrase detection system based on alert information, characterized by comprising:
the word segmentation processing module is used for acquiring warning information and a preset warning word segmentation device, and carrying out word segmentation processing on the warning information according to the warning word segmentation device to obtain a plurality of high-frequency word groups;
the time sequence processing module is connected with the word segmentation processing module and used for receiving statistical period information and carrying out time sequence data aggregation on each high-frequency word group according to the statistical period information to obtain time sequence information corresponding to each high-frequency word group;
and the abnormality processing module is connected with the time sequence processing module and is used for inputting each high-frequency phrase and the time sequence information corresponding to the high-frequency phrase into a preset abnormality detection model to obtain an abnormal phrase with an abnormal period.
10. A storage medium, wherein at least one instruction is stored in the storage medium, and the instruction is loaded and executed by a processor to implement the alert information based abnormal phrase detection method according to any one of claims 1 to 8.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111566538.5A CN114357110A (en) 2021-12-20 2021-12-20 Abnormal phrase detection method, system and storage medium based on alarm condition information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111566538.5A CN114357110A (en) 2021-12-20 2021-12-20 Abnormal phrase detection method, system and storage medium based on alarm condition information

Publications (1)

Publication Number Publication Date
CN114357110A true CN114357110A (en) 2022-04-15

Family

ID=81100791

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111566538.5A Pending CN114357110A (en) 2021-12-20 2021-12-20 Abnormal phrase detection method, system and storage medium based on alarm condition information

Country Status (1)

Country Link
CN (1) CN114357110A (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115473789A (en) * 2022-09-16 2022-12-13 深信服科技股份有限公司 Alarm processing method and related equipment
CN115473789B (en) * 2022-09-16 2024-02-27 深信服科技股份有限公司 Alarm processing method and related equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination