CN116976339B

CN116976339B - Special condition analysis method, equipment and medium for expressway

Info

Publication number: CN116976339B
Application number: CN202311210811.XA
Authority: CN
Inventors: 万青松; 房宏基; 席永轲; 迟猛; 程卫平; 尹淑婷
Original assignee: Shandong High Speed Information Group Co ltd
Current assignee: Shandong High Speed Information Group Co ltd
Priority date: 2023-09-20
Filing date: 2023-09-20
Publication date: 2023-12-22
Anticipated expiration: 2043-09-20
Also published as: CN116976339A

Abstract

The application discloses a special condition analysis method, equipment and medium for expressways. The method comprises: acquiring historical special condition data and obtaining corresponding text vector data; clustering the text vector data with a clustering algorithm based on local density; performing support calculation on the clustered text vector data with a parallel association rule algorithm, and generating, from the resulting frequent item sets, association rule data that meets a confidence threshold, so as to obtain a plurality of special condition data categories; extracting core feature words for each special condition data category to obtain a category model corresponding to that category; and acquiring new special condition data and determining a set of special condition data similar to the new special condition data. Through fast clustering, association analysis, core feature word extraction and category model construction, historical special condition data similar to the new special condition data can be found quickly, so that staff can handle the new special condition quickly.

Description

Special condition analysis method, equipment and medium for expressway

Technical Field

The application relates to the field of traffic control systems, and in particular to a special condition analysis method, equipment and medium for expressways.

Background

With the continuous development of social services and the transportation industry, the problems arising in expressway special conditions ("special condition" is short for special situation, such as toll-related special conditions, accidents, traffic jams and sudden weather changes) are becoming more diverse, and as expressway mileage keeps growing, the number of special conditions received keeps increasing. At present, expressway special conditions are mainly handled manually: when a special condition occurs, the staff concerned receive the user report by telephone, interphone or monitoring system and take corresponding measures.

However, this approach requires staff to be highly familiar with the business, and new staff often need extensive training before they can do the job. Meanwhile, as the volume of received special condition information keeps growing, problems such as low working efficiency gradually emerge among business personnel.

Disclosure of Invention

In order to solve the above problems, the present application proposes a special condition analysis method for expressways, comprising:

acquiring historical special condition data, and preprocessing the historical special condition data to obtain corresponding text vector data;

clustering the text vector data based on a local-density clustering algorithm;

based on a parallel association rule algorithm, performing support calculation on the clustered text vector data to obtain a frequent item set, and generating, based on the frequent item set, association rule data that meets a confidence threshold, so as to classify the data according to the association rule data and obtain a plurality of special condition data categories;

extracting core feature words for each special condition data category, and obtaining a category model corresponding to the special condition data category based on the weight value corresponding to the core feature words;

and acquiring new special condition data, and analyzing the new special condition data based on the category model to determine a similar special condition data set with the new special condition data.

On the other hand, the application also provides a special condition analysis device for the expressway, which comprises:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the special condition analysis method for expressways described in the above example.

In another aspect, the present application also proposes a non-volatile computer storage medium storing computer-executable instructions configured to perform the special condition analysis method for expressways described in the above example.

The special condition analysis method for the expressway provided by the application can bring the following beneficial effects:

Through fast clustering, association analysis, core feature word extraction and category model construction, historical special condition data similar to the new special condition data can be found quickly, so that staff can handle the new special condition quickly. Because unsupervised training is used, similar special conditions can be queried accurately without manual labeling, even as the special condition data keeps expanding and similar special conditions keep increasing.

Drawings

The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiments of the application and together with the description serve to explain the application and do not constitute an undue limitation to the application. In the drawings:

FIG. 1 is a flow chart of a special condition analysis method for expressways according to an embodiment of the present application;

FIG. 2 is a schematic diagram of a special condition analysis device for expressways according to an embodiment of the present application.

Detailed Description

For the purposes, technical solutions and advantages of the present application, the technical solutions of the present application will be clearly and completely described below with reference to specific embodiments of the present application and corresponding drawings. It will be apparent that the described embodiments are only some, but not all, of the embodiments of the present application. All other embodiments, which can be made by one of ordinary skill in the art without undue burden from the present disclosure, are within the scope of the present disclosure.

The following describes in detail the technical solutions provided by the embodiments of the present application with reference to the accompanying drawings.

Analysis of expressway special condition data shows that some special conditions are similar to one another and that similar special conditions are handled in roughly the same way. If the handling records of similar special conditions are provided to business personnel for reference, their working efficiency can be greatly improved, and so can the service quality of the expressway operating unit.

On this basis, semi-intelligent screening of similar special conditions has been proposed, which combines manual labeling and statistics with the knowledge and experience of business personnel. For example, given a large amount of historical special condition data, each historical record is manually labeled with a category (such as accident, congestion or toll), and the label is stored in a database as a parameter. When a new special condition record is obtained, the historical labels are matched through a database query to screen out the special conditions similar to the new one.

However, as expressway special conditions diversify, similar special conditions multiply, new words keep appearing and the data becomes noisier. As a result, this screening method has low accuracy and high time complexity, and the similar special conditions it returns may be of little reference value to business personnel handling a case. As the special condition workload keeps growing, handling efficiency and service quality struggle to meet the needs of the traveling public.

Therefore, as shown in FIG. 1, an embodiment of the present application provides a special condition analysis method for expressways, including:

S101: Acquire historical special condition data, and preprocess the historical special condition data to obtain corresponding text vector data.

Specifically, the preprocessing may include: removing preset special symbols and punctuation marks from the historical special condition data; segmenting the remaining text data (also called word segmentation) into words and phrases; restoring the words or phrases obtained by segmentation to their original word forms; and training text word vectors so as to normalize the text in its original word forms and obtain the corresponding text vector data. The text data may also be spliced and corrected.

Preprocessing cleans, converts and organizes the historical special condition data into normalized text data suitable for the subsequent text clustering and mining tasks.
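The preprocessing of S101 can be illustrated with a minimal Python sketch. It is not the implementation of the application: the symbol pattern `SYMBOLS`, the whitespace-based segmentation and the normalized bag-of-words vector are all assumptions made for illustration, standing in for the preset symbol list, a dedicated word segmenter with word-form restoration, and trained word vectors respectively.

```python
import re
from collections import Counter

# Hypothetical "preset" special symbols and punctuation: anything that is
# neither a word character nor whitespace.
SYMBOLS = re.compile(r"[^\w\s]")


def preprocess(record):
    """Remove symbols, lowercase, and split the record into tokens."""
    cleaned = SYMBOLS.sub(" ", record.lower())
    return cleaned.split()


def build_vocabulary(token_lists):
    """Map every distinct token to a fixed vector position."""
    vocab = sorted({tok for toks in token_lists for tok in toks})
    return {tok: i for i, tok in enumerate(vocab)}


def to_vector(tokens, vocab):
    """Bag-of-words vector scaled to unit length (stand-in for word vectors)."""
    vec = [0.0] * len(vocab)
    for tok, count in Counter(tokens).items():
        if tok in vocab:
            vec[vocab[tok]] = float(count)
    norm = sum(v * v for v in vec) ** 0.5
    return [v / norm for v in vec] if norm else vec
```

For text without spaces between words, such as Chinese, `preprocess` would need a real word segmenter in place of `split()`.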

S102: Cluster the text vector data based on a local-density clustering algorithm.

The normalized text vector data is clustered with a fast clustering algorithm based on local density, which yields an overall classification of the special condition data.

Specifically, the local density describes how densely data is gathered around a data node. The relative distance describes the distance from a data node to the other data nodes that have a greater local density. If both the local density and the relative distance of a data node are large, meaning that it has many data nodes around it and is far from any other data node that has more data nodes around it, the data node is regarded as a cluster center.

Each piece of text vector data is taken as a data node, and the coordinate data corresponding to the data node is determined, that is, the coordinates given by the direction and length of the corresponding vector. The local density of the data node is then determined from the coordinate data and a preset dc value.

A plurality of data nodes whose local density is higher than a preset density threshold are determined and taken as cluster centers. The relative distances between these data nodes and the other data nodes are determined, and if a relative distance is higher than a preset distance threshold, the other data node is also determined to be a cluster center. Each cluster center and the data nodes surrounding it form a category.

The preset distance threshold, which may also be called the cut-off distance, is dynamically updated based on the total data amount (obtained by multiplying the average distance between all data nodes by the corresponding weight and then summing); in general, the larger the data amount, the larger the preset distance threshold.
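The local density and relative distance described above can be sketched as follows. This is a simplified reading rather than the implementation of the application: the function names, the neighbor-counting definition of local density, the rule of assigning each node to its nearest center, and the fixed thresholds `rho_min` and `delta_min` are assumptions, and the dynamic update of the distance threshold is not modeled.

```python
import math


def pairwise_distances(points):
    """Euclidean distance between every pair of data nodes."""
    n = len(points)
    return [[math.dist(points[i], points[j]) for j in range(n)] for i in range(n)]


def local_density(dist, dc):
    """Number of other nodes closer than the cut-off value dc."""
    return [
        sum(1 for j, d in enumerate(row) if j != i and d < dc)
        for i, row in enumerate(dist)
    ]


def relative_distance(dist, rho):
    """Distance to the nearest node with a higher local density.

    A node with no denser node gets its largest distance to any node.
    """
    n = len(rho)
    delta = []
    for i in range(n):
        higher = [dist[i][j] for j in range(n) if rho[j] > rho[i]]
        delta.append(min(higher) if higher else max(dist[i]))
    return delta


def density_peak_clusters(points, dc, rho_min, delta_min):
    """Pick centers with high density and high relative distance, then
    attach every node to its nearest center (a simplification)."""
    dist = pairwise_distances(points)
    rho = local_density(dist, dc)
    delta = relative_distance(dist, rho)
    centers = [
        i for i in range(len(points))
        if rho[i] > rho_min and delta[i] > delta_min
    ]
    if not centers:
        return centers, [-1] * len(points)
    labels = [min(centers, key=lambda c: dist[i][c]) for i in range(len(points))]
    return centers, labels
```

With two well-separated groups of points, the densest node of each group is selected as a center and the remaining nodes are labeled with the index of that center.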

S103: Based on a parallel association rule algorithm, perform support calculation on the clustered text vector data to obtain a frequent item set, and generate, based on the frequent item set, association rule data that meets a confidence threshold, so as to classify the data according to the association rule data and obtain a plurality of special condition data categories.

The clustered special condition data is grouped with a parallel Apriori association rule algorithm and divided into different special condition category sets according to the strong associations among the text data.

Specifically, a dynamically set support threshold is determined, and each keyword in the keyword set corresponding to the text vector data is checked against the support threshold. If the support of a keyword is not less than the support threshold, the keyword is taken as a frequent keyword in the frequent item set. The frequent item set obtained so far is then used iteratively to obtain new frequent keywords from the support of the remaining keywords, until no new frequent keyword is generated in the keyword set.

In the iteration, the keywords whose support is greater than or equal to the support threshold are first retained to obtain the frequent 1-item sets. The frequent (n-1)-item sets obtained in the previous round are then used to continue the iteration: the support of the keywords in the resulting item sets is calculated, and those that meet the support threshold are retained as new frequent item sets, until no new frequent item set is generated.
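The iteration above follows the general shape of the Apriori algorithm, which can be sketched in a few lines. The sketch below is sequential, not parallel, and uses fixed thresholds instead of dynamically set ones; the function names and the representation of each record as a set of keywords are assumptions made for illustration.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of records that contain every keyword of the itemset."""
    return sum(1 for t in transactions if itemset <= t) / len(transactions)


def frequent_itemsets(records, min_support):
    """Level-wise search: frequent (k-1)-item sets generate k-item candidates."""
    transactions = [frozenset(r) for r in records]
    current = {frozenset([kw]) for t in transactions for kw in t}
    frequent = {}
    k = 1
    while current:
        kept = {}
        for candidate in current:
            s = support(candidate, transactions)
            if s >= min_support:
                kept[candidate] = s
        frequent.update(kept)
        k += 1
        # Join step: unions of two frequent (k-1)-item sets that have size k.
        current = {a | b for a in kept for b in kept if len(a | b) == k}
    return frequent


def association_rules(frequent, min_confidence):
    """Rules lhs -> rhs whose confidence meets the confidence threshold."""
    rules = []
    for itemset, s in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(sorted(itemset), r):
                lhs = frozenset(lhs)
                confidence = s / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules
```

Every subset of a frequent item set is itself frequent, so `frequent[lhs]` is always present when a rule is evaluated. A parallel variant would typically distribute the support counting across partitions of the records.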

S104: Extract core feature words for each special condition data category, and obtain a category model corresponding to the special condition data category based on the weight values corresponding to the core feature words.

Keywords are extracted from the data of each special condition data category with the TF-IDF algorithm and screened per category according to a dynamic threshold; the LDA algorithm is then applied to train on the data of each category and build the corresponding category model.

Specifically, for each special condition data category, the term frequency (TF) value of each keyword contained in the category is determined, and the inverse document frequency (IDF) value of each keyword is determined over the total corpus formed by all special condition data categories. The weight value of each keyword is determined from its TF value and IDF value; for example, the weight value may be taken as the product TF × IDF.

Keywords whose weight value is higher than a preset dynamic weight threshold are taken as core feature words. That is, a dynamic screening threshold is set according to the text characteristics, the keywords of the historical special condition data of each category are screened, and only the core feature words are retained.

For each special condition data category, the category model corresponding to the category is obtained from the core feature words contained in the category and their weight values. The category model may be an LDA model, expressed in the form of weighted core feature words. For example, a category model may be: 0.029*"emergency lane" + 0.023*"stop" + 0.020*"penalty" + 0.016*"congestion" + 0.014*"drive-away" + 0.014*"warning board". The category models are integrated into a total model library to obtain a total category model.
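The weighting and screening of keywords can be sketched as below. The sketch assumes one common TF-IDF variant, in which each category is treated as one document, TF is the relative frequency of a keyword within the category and IDF is log(N / df); the application does not specify the exact formula, and the function names and the fixed threshold are illustrative.

```python
import math
from collections import Counter


def tf_idf_weights(categories):
    """categories: dict mapping a category name to its list of keyword tokens."""
    n = len(categories)
    # Document frequency: in how many categories each keyword occurs.
    df = Counter()
    for tokens in categories.values():
        df.update(set(tokens))
    weights = {}
    for name, tokens in categories.items():
        counts = Counter(tokens)
        total = len(tokens)
        weights[name] = {
            word: (count / total) * math.log(n / df[word])
            for word, count in counts.items()
        }
    return weights


def core_feature_words(word_weights, threshold):
    """Keep only keywords whose weight is higher than the threshold."""
    return {w: v for w, v in word_weights.items() if v > threshold}
```

With this variant, a keyword that occurs in every category receives a weight of zero and is never selected as a core feature word.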

S105: Acquire new special condition data, and analyze the new special condition data based on the category model to determine a similar special condition data set for the new special condition data.

After a piece of new special condition data is acquired, candidate categories are screened by keyword fitting combined with duplicate elimination, historical special conditions are screened by a time threshold and the candidate categories, and the similarity fitting degree of the screened historical special conditions is calculated, so that the set of special conditions similar to the new data is predicted quickly and accurately.

Specifically, the new special condition data is analyzed by each category model in turn, so that keyword fitting and duplicate elimination are performed on the new special condition data and the candidate special condition data category is predicted. Keyword fitting and duplicate elimination means extracting keywords from the new special condition data, fitting the keywords against one another, and removing duplicate keywords so that only suitable keywords remain.

Based on the predicted candidate special condition data category and the corresponding time threshold, the historical special condition data in the candidate category is screened. For example, only the historical special condition data within the most recent time threshold is selected from the candidate category.

For the screened historical special condition data, the fitting degree with the new special condition data is calculated and the results are sorted, and the several top-ranked historical records are selected as the similar special condition data set of the new special condition data. The fitting degree may be calculated, for example, as (number of matching words / word segmentation length of the target special condition data), where the number of matching words is the number of keywords shared by the new special condition data and the screened historical record, and the target special condition data is the new special condition data.
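One possible reading of the candidate category prediction is sketched below. It replaces LDA inference with a simple sum of the core feature word weights matched by the deduplicated keywords, which is a swapped-in simplification, not the method of the application. The function names, the `model_library` layout and the weights of the "toll" model are hypothetical; the "emergency_lane" weights reuse the example model given earlier.

```python
def dedupe_keywords(tokens):
    """Remove repeated keywords while keeping their first-seen order."""
    seen = set()
    unique = []
    for tok in tokens:
        if tok not in seen:
            seen.add(tok)
            unique.append(tok)
    return unique


def candidate_categories(tokens, model_library, top_n=1):
    """Score each category model by the weights of the keywords it matches
    and return the names of the best-scoring categories."""
    keywords = dedupe_keywords(tokens)
    scores = {
        name: sum(model.get(kw, 0.0) for kw in keywords)
        for name, model in model_library.items()
    }
    matched = [name for name, score in scores.items() if score > 0]
    matched.sort(key=lambda name: scores[name], reverse=True)
    return matched[:top_n]


# Hypothetical total model library: category name -> {core word: weight}.
model_library = {
    "emergency_lane": {"emergency lane": 0.029, "stop": 0.023, "penalty": 0.020},
    "toll": {"toll": 0.031, "card": 0.018},
}
```

Categories whose models share no keyword with the new record are excluded, so an unrelated record yields an empty candidate list.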
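The screening and ranking steps can be sketched as follows, using the fitting degree formula given above. The record layout (dictionaries with `id`, `category`, `time` and `keywords`), the function names and the parameters `window` and `top_k` are assumptions made for illustration.

```python
from datetime import datetime, timedelta


def fitting_degree(new_keywords, hist_keywords):
    """Shared keywords divided by the token count of the new record."""
    if not new_keywords:
        return 0.0
    matches = len(set(new_keywords) & set(hist_keywords))
    return matches / len(new_keywords)


def similar_records(new_keywords, history, candidates, now, window, top_k=3):
    """Filter history by candidate category and time threshold, then return
    the ids of the top_k records ranked by fitting degree."""
    recent = [
        h for h in history
        if h["category"] in candidates and now - h["time"] <= window
    ]
    ranked = sorted(
        recent,
        key=lambda h: fitting_degree(new_keywords, h["keywords"]),
        reverse=True,
    )
    return [h["id"] for h in ranked[:top_k]]
```

A call such as `similar_records(keywords, history, {"emergency_lane"}, datetime.now(), timedelta(days=90))` would return up to three record ids from the last 90 days, best match first.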

As shown in fig. 2, the embodiment of the present application further provides a special condition analysis device for an expressway, including:

at least one processor; and

a memory communicatively coupled to the at least one processor, the processor and the memory communicating via a bus; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the special condition analysis method for expressways according to any one of the above embodiments.

The embodiments also provide a non-volatile computer storage medium storing computer-executable instructions configured to perform the special condition analysis method for expressways according to any one of the above embodiments.

All embodiments in the application are described in a progressive manner; for identical or similar parts, the embodiments may refer to one another, and each embodiment focuses on its differences from the others. In particular, the device and medium embodiments are described relatively briefly because they are substantially similar to the method embodiments; for relevant details, refer to the description of the method embodiments.

The devices and media provided in the embodiments of the present application are in one-to-one correspondence with the methods, so that the devices and media also have similar beneficial technical effects as the corresponding methods, and since the beneficial technical effects of the methods have been described in detail above, the beneficial technical effects of the devices and media are not described in detail herein.

It will be appreciated by those skilled in the art that embodiments of the present application may be provided as a method, system, or computer program product. Accordingly, the present application may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present application may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present application is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the application. It will be understood that each flow and/or block of the flowchart illustrations and/or block diagrams, and combinations of flows and/or blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

In one typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.

The memory may include volatile memory in a computer-readable medium, such as random access memory (RAM), and/or non-volatile memory, such as read-only memory (ROM) or flash memory (flash RAM). Memory is an example of computer-readable media.

Computer-readable media, including both permanent and non-permanent, removable and non-removable media, may implement information storage by any method or technology. The information may be computer-readable instructions, data structures, modules of a program, or other data. Examples of storage media for a computer include, but are not limited to, phase change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. Computer-readable media, as defined herein, does not include transitory computer-readable media (transmission media), such as modulated data signals and carrier waves.

It should also be noted that the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a …" does not exclude the presence of other like elements in the process, method, article or apparatus that comprises the element.

The foregoing is merely exemplary of the present application and is not intended to limit the present application. Various modifications and changes may be made to the present application by those skilled in the art. Any modifications, equivalent substitutions, improvements, etc. which are within the spirit and principles of the present application are intended to be included within the scope of the claims of the present application.

Claims

1. A special condition analysis method for an expressway, comprising:

based on a parallel association rule algorithm, performing support calculation on the clustered text vector data to obtain a frequent item set, and generating, based on the frequent item set, association rule data that meets a confidence threshold, so as to classify the data according to the association rule data and obtain a plurality of special condition data categories;

acquiring new special condition data, and analyzing the new special condition data based on the category model to determine a similar special condition data set for the new special condition data;

extracting core feature words for each special condition data category, and obtaining a category model corresponding to the special condition data category based on the weight value corresponding to the core feature words, wherein the method specifically comprises the following steps:

determining a term frequency (TF) value of each keyword contained in each special condition data category, and determining an inverse document frequency (IDF) value of each keyword over all special condition data categories;

determining a weight value corresponding to each keyword according to the TF value and the IDF value;

taking the keywords whose weight values are higher than a preset dynamic weight threshold as core feature words;

for each special condition data category, obtaining a category model corresponding to the special condition data category according to the core feature words contained in the category and the weight values of the core feature words;

wherein obtaining the category model corresponding to the special condition data category according to the core feature words contained in the category and the weight values of the core feature words specifically comprises:

obtaining, according to the core feature words contained in the category and the weight values of the core feature words, the category model corresponding to the special condition data category in the form of weight values of core feature words;

and integrating the category model into a total model library to obtain a category total model.

2. The method according to claim 1, wherein preprocessing the historical special condition data specifically comprises:

removing preset special symbols and punctuation marks from the historical special condition data;

segmenting the remaining text data, and restoring the words or phrases obtained by segmentation to their original word forms;

training text word vectors to normalize the text in its original word forms, so as to obtain the corresponding text vector data.

3. The method according to claim 2, wherein clustering the text vector data based on a local-density clustering algorithm specifically comprises:

for each piece of text vector data, taking the text vector data as a data node, and determining coordinate data corresponding to the data node;

determining the local density corresponding to the data node from the coordinate data and a preset dc value;

determining a plurality of data nodes whose local density is higher than a preset density threshold, and taking the plurality of data nodes as cluster centers;

and determining relative distances between the data nodes and other data nodes, and determining the other data nodes as cluster centers if the relative distances are higher than a preset distance threshold.

4. The method according to claim 3, wherein the preset distance threshold is dynamically updated, and determining the preset distance threshold comprises:

obtaining a total data amount from the average distance between all the data nodes and the corresponding weight, and dynamically updating the preset distance threshold according to the total data amount.

5. The method according to claim 1, wherein performing support calculation on the clustered text vector data based on a parallel association rule algorithm to obtain a frequent item set specifically comprises:

determining a dynamically set support threshold, and performing threshold judgment on keywords in a keyword set corresponding to the text vector data based on the support threshold;

if the support degree corresponding to the keyword is not less than the support degree threshold, the keyword is used as a frequent keyword in a frequent item set;

and iteratively using the obtained frequent item set to obtain new frequent keywords through the support degree of the residual keywords until no new frequent keywords are generated in the keyword set.

6. The method according to claim 1, wherein analyzing the new special condition data based on the category model to determine a similar special condition data set for the new special condition data specifically comprises:

analyzing the new special condition data through each category model respectively, so as to perform keyword fitting and duplicate elimination on the new special condition data and predict a candidate special condition data category;

screening historical special condition data in the candidate special condition data category based on the predicted candidate special condition data category and a corresponding time threshold;

and for the historical special condition data obtained by screening, calculating the fitting degree between the historical special condition data and the new special condition data, sorting the results, and selecting the several top-ranked pieces of historical special condition data as the similar special condition data set of the new special condition data.

7. A special condition analysis device for an expressway, comprising:

at least one processor; and

a memory communicatively coupled to the at least one processor; wherein

the memory stores instructions executable by the at least one processor to enable the at least one processor to perform the special condition analysis method for an expressway according to any one of claims 1 to 6.

8. A non-transitory computer storage medium storing computer-executable instructions, the computer-executable instructions configured to perform the special condition analysis method for an expressway according to any one of claims 1 to 6.