CN112214990A - Method and device for extracting key words of rail transit maintenance work order - Google Patents
Method and device for extracting keywords of rail transit maintenance work order
- Publication number
- CN112214990A (application CN202011015127.2A)
- Authority
- CN
- China
- Prior art keywords
- word
- maintenance work
- work order
- corpus
- rail transit
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Probability & Statistics with Applications (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Computational Linguistics (AREA)
- General Health & Medical Sciences (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The embodiment of the invention provides a method and a device for extracting keywords of a rail transit maintenance work order. First, a maintenance work order corpus is obtained, and the product of the word frequency and the inverse document frequency of each word in the corpus is calculated; then, the actual weight of each word in the maintenance work order corpus is determined based on the initial weight of each word and the product of the word frequency and the inverse document frequency of each word; finally, the keywords of the rail transit maintenance work order are determined according to the actual weight of each word in the maintenance work order corpus. The rail transit maintenance work order data is thereby fully utilized: keywords are determined by processing the work order data, the work orders can then be classified by keyword, and the problems recorded in the work orders and their causes can be identified more quickly.
Description
Technical Field
The invention relates to the technical field of rail transit, in particular to a method and a device for extracting keywords of a rail transit maintenance work order.
Background
Rail transit technology is developing rapidly. From a maintenance perspective, operation and maintenance are key links in the safe operation of a subway system. In rail transit maintenance, work orders are generally recorded manually, and the number of accumulated work orders grows over long-term subway operation and maintenance. Work order data records the cause of each fault, a description of the fault, and its solution, and therefore carries a large amount of information.
The data produced by track equipment has the characteristics of big data. The data is varied: different track equipment generates many kinds of data with complex content at different stages of its life cycle. Data formats are varied, including traditional paper records such as manual line inspection logbooks, track inspection instrument data, vehicle on-board controller (VOBC) data, driving record data, and so on. Data structures are varied, including structured data, semi-structured data, and unstructured data. The data scale is huge: taking track inspection vehicle detection data as an example, main lines are inspected twice per month, the corresponding waveform data and over-limit data are stored in a database, and the data volume for one year can reach 9 TB.
However, work order data is not currently used effectively, and if such a large amount of data is not used effectively, it provides no help in determining the cause of a fault.
Disclosure of Invention
The embodiment of the invention provides a method and a device for extracting keywords of a rail transit maintenance work order, which are used to overcome the defects in the prior art.
The embodiment of the invention provides a method for extracting keywords of a rail transit maintenance work order, which comprises the following steps:
obtaining a maintenance work order corpus, calculating the word frequency and the inverse document frequency of each word in the corpus, and determining the product of the word frequency and the inverse document frequency of each word; the maintenance work order corpus is a set containing the words appearing in a specified number of rail transit maintenance work orders;
determining the actual weight of each word in the maintenance work order corpus based on the initial weight of each word in the maintenance work order corpus and the product of the word frequency and the inverse document frequency of each word; the initial weight is determined based on the importance of each word in the maintenance work order corpus;
and determining the keywords of the rail transit maintenance work order based on the actual weight of each word in the maintenance work order corpus.
The rail transit maintenance work order keyword extraction method according to one embodiment of the invention further comprises the following steps:
receiving a keyword to be queried, and determining all candidate rail transit maintenance work orders containing the keyword to be queried;
and calculating the weight of each candidate rail transit maintenance work order, and taking the candidate rail transit maintenance work order with the higher weight as the rail transit maintenance work order corresponding to the keyword to be queried.
According to the rail transit maintenance work order keyword extraction method provided by the embodiment of the invention, determining the actual weight of each word in the maintenance work order corpus based on the initial weight of each word in the maintenance work order corpus and the product of the word frequency and the inverse document frequency of each word specifically comprises the following steps:
determining the actual weight based on the initial weight and the product of the word frequency and the inverse document frequency of each word, in combination with a tuning parameter.
According to the rail transit maintenance work order keyword extraction method provided by the embodiment of the invention, determining the actual weight based on the initial weight and the product of the word frequency and the inverse document frequency of each word, in combination with a tuning parameter, specifically comprises the following steps:
determining an intermediate weight based on the initial weight and the product of the word frequency and the inverse document frequency of each word, in combination with a tuning parameter;
and normalizing the intermediate weight based on a min-max normalization method to obtain the actual weight.
According to the rail transit maintenance work order keyword extraction method provided by the embodiment of the invention, determining the intermediate weight based on the initial weight and the product of the word frequency and the inverse document frequency of each word, in combination with a tuning parameter, specifically comprises the following steps:
determining the intermediate weight by the following formula:
wherein $q_i$ is the intermediate weight of the i-th word, TF is the word frequency of the i-th word, IDF is the inverse document frequency of the i-th word, k is the tuning parameter, and the remaining term of the formula is the initial weight of the i-th word.
According to the rail transit maintenance work order keyword extraction method provided by the embodiment of the invention, the intermediate weight is normalized based on the maximum and minimum normalization method to obtain the actual weight, and the method specifically comprises the following steps:
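determining the actual weight by the following formula:
$$q_{i\text{-final}} = \frac{q_i - q_{\min}}{q_{\max} - q_{\min}}$$
wherein $q_{i\text{-final}}$ is the actual weight of the i-th word, $q_i$ is the intermediate weight of the i-th word, $q_{\min}$ is the minimum of all intermediate weights, and $q_{\max}$ is the maximum of all intermediate weights.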
According to the rail transit maintenance work order keyword extraction method provided by the embodiment of the invention, obtaining the maintenance work order corpus specifically comprises the following steps:
for each rail transit maintenance work order in the specified number of rail transit maintenance work orders, processing the rail transit maintenance work order based on a sentence cleaning method and the jieba word segmentation method to obtain the maintenance work order corpus.
The embodiment of the invention also provides a device for extracting keywords of a rail transit maintenance work order, which comprises: a maintenance work order corpus acquisition module, an actual weight determining module, and a keyword determining module. Wherein,
the maintenance work order corpus acquisition module is used for obtaining a maintenance work order corpus, calculating the word frequency and the inverse document frequency of each word in the corpus, and determining the product of the word frequency and the inverse document frequency of each word; the maintenance work order corpus is a set containing the words appearing in a specified number of rail transit maintenance work orders;
the actual weight determining module is used for determining the actual weight of each word in the maintenance work order corpus based on the initial weight of each word in the maintenance work order corpus and the product of the word frequency and the inverse document frequency of each word; the initial weight is determined based on the importance of each word in the maintenance work order corpus;
and the keyword determining module is used for determining the keywords of the rail transit maintenance work order based on the actual weight of each word in the maintenance work order corpus.
The embodiment of the invention also provides an electronic device, which comprises a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of any one of the above rail transit maintenance work order keyword extraction methods.
An embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of any one of the above rail transit maintenance work order keyword extraction methods.
In the rail transit maintenance work order keyword extraction method and device provided by the embodiment of the invention, a maintenance work order corpus is first obtained, and the product of the word frequency and the inverse document frequency of each word in the corpus is calculated; then, the actual weight of each word in the maintenance work order corpus is determined based on the initial weight of each word and the product of the word frequency and the inverse document frequency of each word; finally, the keywords of the rail transit maintenance work order are determined according to the actual weight of each word in the maintenance work order corpus. The rail transit maintenance work order data is thereby fully utilized: keywords are determined by processing the work order data, the work orders can then be classified by keyword, and the problems recorded in the work orders and their causes can be identified more quickly.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a method for extracting keywords from a rail transit maintenance work order according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a device for extracting keywords from a rail transit maintenance work order according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a method for extracting keywords from a rail transit maintenance work order provided in an embodiment of the present invention. As shown in Fig. 1, the method includes:
S1, obtaining a maintenance work order corpus, calculating the word frequency and the inverse document frequency of each word in the corpus, and determining the product of the word frequency and the inverse document frequency of each word; the maintenance work order corpus is a set containing the words appearing in a specified number of rail transit maintenance work orders;
S2, determining the actual weight of each word in the maintenance work order corpus based on the initial weight of each word in the maintenance work order corpus and the product of the word frequency and the inverse document frequency of each word; the initial weight is determined based on the importance of each word in the maintenance work order corpus;
S3, determining the keywords of the rail transit maintenance work order based on the actual weight of each word in the maintenance work order corpus.
Specifically, the method for extracting keywords of a rail transit maintenance work order provided by the embodiment of the invention aims to determine the keywords of rail transit maintenance work orders from a large volume of work order data.
Step S1 is performed first: a maintenance work order corpus is obtained, which contains all words appearing in the specified number of rail transit maintenance work orders. The specified number can be set as required, and a large number may be used in the embodiment of the present invention. A word may be a single segmented word, or a new word formed by fusing several adjacent words with similar word frequencies. For example, the words "Xizhimen", "subway station", and "platform" may be fused into the new word "Xizhimen subway station platform".
The word frequency (TF) and the inverse document frequency (IDF) of each word in the maintenance work order corpus are then calculated. The word frequency is the frequency with which a word in the maintenance work order corpus appears in a given rail transit maintenance work order. In view of the characteristics of rail transit vocabulary, adjacent words in the corpus are allowed to be fused to form new words in the corpus. For example, "Xizhimen subway station platform" may be segmented into three words, namely "Xizhimen", "subway station", and "platform". Taken individually, however, these three words cannot effectively represent a location at a rail transit subway station. With the fusion method, taking the word "subway station" as an example, the words before and after it are fused with it to form the new word "Xizhimen subway station platform", which increases the amount of information the word carries. The word frequency $TF_i$ of the i-th word in the maintenance work order corpus is calculated as:
$$TF_i = \frac{n_i}{\sum_k n_k}$$
that is, the number of times $n_i$ the i-th word appears in a work order divided by the total number of words in that work order.
For example, if the word "Xizhimen subway station" appears 3 times in a rail transit maintenance work order that contains 1000 words in total, the word frequency of "Xizhimen subway station" is 0.003.
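The worked example above (3 occurrences in a 1000-word work order) can be checked with a minimal Python sketch. The function name `term_frequencies` and the placeholder tokens are assumptions made for illustration and do not appear in the source.

```python
from collections import Counter


def term_frequencies(tokens):
    """Return {word: occurrences / total words} for one work order."""
    total = len(tokens)
    if total == 0:
        return {}
    counts = Counter(tokens)
    return {word: n / total for word, n in counts.items()}


# Hypothetical work order: 1000 tokens, 3 of which are the fused station word.
tokens = ["fused_station_word"] * 3 + ["other_word"] * 997
tf = term_frequencies(tokens)
print(tf["fused_station_word"])  # 0.003
```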
When calculating word frequencies, it can be seen that some common words are of little use for extracting topic keywords yet may have a high word frequency, while some words that occur infrequently can express the topic of a document. This runs counter to the purpose of keyword extraction, which is that the more strongly a word in the rail transit vocabulary reflects the topic of a work order, the larger its weight should be, and the less strongly, the smaller. If a large number of work orders form a work order library, some words may appear in only a few work orders, yet such words effectively reflect the topic of those work orders and should be given a larger weight. For this reason, the inverse document frequency of each word in the rail transit work order library should be calculated.
The inverse document frequency $IDF_i$ is calculated as:
$$IDF_i = \log\frac{N}{1 + |\{d : t_i \in d\}|}$$
where $N$ is the total number of rail transit maintenance work orders and $|\{d : t_i \in d\}|$ is the number of work orders that contain the i-th word.
The more rail transit maintenance work orders contain a given word, the smaller the inverse document frequency of the word; the fewer rail transit maintenance work orders contain it, the larger the inverse document frequency. The 1 added to the denominator covers the case in which a word in the maintenance work order corpus does not appear in any rail transit maintenance work order, which would otherwise make the denominator zero and the inverse document frequency undefined.
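The behaviour described above, including the +1 in the denominator, can be sketched in Python as follows. The natural logarithm, the function name, and the sample work orders are assumptions for illustration; the source text does not state the logarithm base.

```python
import math


def inverse_document_frequencies(work_orders):
    """work_orders: list of token lists. Returns {word: log(N / (1 + df))}."""
    n_orders = len(work_orders)
    doc_freq = {}
    for tokens in work_orders:
        for word in set(tokens):  # count each work order at most once per word
            doc_freq[word] = doc_freq.get(word, 0) + 1
    # The +1 keeps the denominator non-zero even for a word with df == 0.
    return {w: math.log(n_orders / (1 + df)) for w, df in doc_freq.items()}


# Hypothetical, already segmented work orders.
orders = [
    ["switch", "fault", "replaced"],
    ["door", "fault", "adjusted"],
    ["signal", "fault", "reset"],
    ["door", "sensor", "cleaned"],
]
idf = inverse_document_frequencies(orders)
print(idf["switch"] > idf["door"] > idf["fault"])  # True
```

A word found in fewer work orders ("switch") receives a larger value than one found in most of them ("fault").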
After the word frequency and the inverse document frequency of every word in the rail transit work order library have been calculated, the word frequency-inverse document frequency (TF-IDF) of each word is calculated. TF-IDF combines the frequency of a word within a rail transit maintenance work order with the inverse document frequency of the word across the whole rail transit work order library, so that common words are filtered out and important words are retained as keywords.
The TF-IDF of the i-th word is calculated as:
$$\text{TF-IDF}_i = TF_i \times IDF_i$$
As the above formula shows, no word is weighted when $\text{TF-IDF}_i$ is calculated; that is, apart from their word frequency and inverse document frequency, all words are treated as equally important. In the scenario of extracting keywords from rail transit maintenance work orders, however, different words clearly differ in importance. For example, the word "point switch" is clearly more specific than the word "subway station", and auxiliary information should be added to give proper nouns of this kind a higher weight so that the extracted keywords are more reasonable. The TF-IDF calculation therefore needs to be improved, and the above formula adapted to the actual application scenario, so that it better suits rail transit maintenance work orders.
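As an illustration of how the unweighted product filters out common words, the following Python sketch computes TF-IDF per work order. The natural logarithm, the +1 denominator placement, the function name, and the sample data are assumptions for illustration only.

```python
import math
from collections import Counter


def tf_idf(work_orders):
    """Return one {word: TF * IDF} dict per work order (list of token lists)."""
    n_orders = len(work_orders)
    # Number of work orders containing each word.
    doc_freq = Counter(word for tokens in work_orders for word in set(tokens))
    scores = []
    for tokens in work_orders:
        counts = Counter(tokens)
        total = len(tokens)
        scores.append({
            word: (n / total) * math.log(n_orders / (1 + doc_freq[word]))
            for word, n in counts.items()
        })
    return scores


# Hypothetical, already segmented work orders.
orders = [
    ["switch", "fault", "replaced"],
    ["door", "fault", "adjusted"],
    ["signal", "fault", "reset"],
    ["door", "sensor", "cleaned"],
]
scores = tf_idf(orders)
print(scores[0]["fault"], scores[0]["switch"] > 0)  # 0.0 True
```

Note that every word of equal frequency and document count receives the same score here, which is the limitation the paragraph above describes.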
Therefore, in the embodiment of the present invention, when step S2 is executed, an initial weight is introduced for each word in the maintenance work order corpus, and the initial weight is determined based on the importance of each word in the maintenance work order corpus. The initial weight of each word is combined with its TF-IDF to obtain the actual weight of each word in the maintenance work order corpus.
Finally, step S3 is executed, and the keywords of the rail transit maintenance work order are determined according to the actual weight of each word in the maintenance work order corpus. Specifically, the words may be sorted by actual weight, and several words with the largest actual weights selected as the keywords of the rail transit maintenance work order.
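Step S3 amounts to a top-k selection over the actual weights. A minimal Python sketch follows; the function name, the value of k, and the weight values are hypothetical.

```python
import heapq


def top_keywords(actual_weights, k=3):
    """Return the k words with the largest actual weights, highest first."""
    return heapq.nlargest(k, actual_weights, key=actual_weights.get)


# Hypothetical actual weights after step S2.
weights = {
    "point switch": 1.0,
    "subway station": 0.35,
    "fault": 0.0,
    "track circuit": 0.8,
}
print(top_keywords(weights, k=2))  # ['point switch', 'track circuit']
```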
In the rail transit maintenance work order keyword extraction method provided by the embodiment of the invention, a maintenance work order corpus is first obtained, and the product of the word frequency and the inverse document frequency of each word in the corpus is calculated; then, the actual weight of each word in the maintenance work order corpus is determined based on the initial weight of each word and the product of the word frequency and the inverse document frequency of each word; finally, the keywords of the rail transit maintenance work order are determined according to the actual weight of each word in the maintenance work order corpus. The rail transit maintenance work order data is thereby fully utilized: keywords are determined by processing the work order data, the work orders can then be classified by keyword, and the problems recorded in the work orders and their causes can be identified more quickly.
On the basis of the above embodiment, the method for extracting keywords of a rail transit maintenance work order provided in the embodiment of the present invention further includes:
receiving a keyword to be queried, and determining all candidate rail transit maintenance work orders containing the keyword to be queried;
and calculating the weight of each candidate rail transit maintenance work order, and taking the candidate rail transit maintenance work order with the higher weight as the rail transit maintenance work order corresponding to the keyword to be queried.
Specifically, on the basis of determining the keywords of the rail transit maintenance work orders, the method provided by the embodiment of the invention can determine the matching work orders from a keyword entered by maintenance personnel.
First, a keyword to be queried is received; the keyword to be queried is a keyword entered by a maintenance worker. Among the specified number of rail transit maintenance work orders, those containing the keyword to be queried are determined and taken as candidate rail transit maintenance work orders.
Then the weight of each candidate rail transit maintenance work order is calculated. Specifically, the word frequency of the keyword to be queried in each candidate rail transit maintenance work order is determined and taken as the weight of that candidate work order.
Finally, the candidate rail transit maintenance work order with the higher weight is taken as the rail transit maintenance work order corresponding to the keyword to be queried and is pushed to the maintenance personnel, thereby realizing keyword query matching.
In the embodiment of the invention, determining the rail transit maintenance work order corresponding to the keyword to be queried helps maintenance personnel quickly determine the cause of a problem.
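The query matching described above can be sketched in Python as follows. The function name, the work order identifiers, and the `top_n` parameter are hypothetical; the ranking criterion (word frequency of the queried keyword in each candidate) follows the description.

```python
def match_work_orders(query, work_orders, top_n=1):
    """Rank work orders containing `query` by the query's word frequency.

    work_orders: {order_id: list of segmented words}.
    """
    candidates = []
    for order_id, tokens in work_orders.items():
        count = tokens.count(query)
        if count:  # only work orders containing the keyword are candidates
            candidates.append((count / len(tokens), order_id))
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    return [order_id for _, order_id in candidates[:top_n]]


# Hypothetical segmented work orders keyed by hypothetical identifiers.
orders = {
    "WO-001": ["point switch", "fault", "point switch", "replaced"],
    "WO-002": ["door", "fault", "adjusted"],
    "WO-003": ["point switch", "inspected", "normal", "lubricated", "logged"],
}
print(match_work_orders("point switch", orders))  # ['WO-001']
```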
On the basis of the above embodiment, in the method for extracting keywords of a rail transit maintenance work order provided in the embodiment of the present invention, determining the actual weight of each word in the maintenance work order corpus based on the initial weight of each word in the maintenance work order corpus and the product of the word frequency and the inverse document frequency of each word specifically includes:
determining the actual weight based on the initial weight and the product of the word frequency and the inverse document frequency of each word, in combination with a tuning parameter.
Specifically, in the embodiment of the present invention, when determining the actual weight of each word in the maintenance work order corpus, the intermediate weight may be determined by the following formula based on the initial weight and the product of the word frequency and the inverse document frequency of each word, in combination with the tuning parameter:
wherein $q_i$ is the intermediate weight of the i-th word, TF is the word frequency of the i-th word, IDF is the inverse document frequency of the i-th word, k is the tuning parameter, and the remaining term of the formula is the initial weight of the i-th word.
This intermediate weight is then taken as the actual weight of the i-th word.
On the basis of the above embodiment, in the rail transit maintenance work order keyword extraction method provided by the embodiment of the invention, the initial weight may be determined by having experts score each word according to its importance; that is, the initial weight of each word in the maintenance work order corpus is determined by an expert scoring method. The scores of all words form a weight matrix W, whose specific form is as follows:
where m is the number of experts, n is the number of all words in the maintenance work order corpus, and $w_{ij}$ is the weight score given by the j-th expert to the i-th word. In this way, the weight of keywords in the rail transit field, especially those related to maintenance, can be effectively increased.
The initial weight of the jth word in the maintenance work order corpus may be expressed as:
In the embodiment of the invention, words specific to rail transit in the maintenance work order corpus are given higher weight scores, and words not specific to rail transit are given lower weight scores.
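The expression for the initial weight is not reproduced in this text. As a sketch only, the following Python code assumes the initial weight of a word is the arithmetic mean of the experts' scores for that word; this aggregation rule, the 0 to 1 score scale, and all names are assumptions, not the source's formula.

```python
def initial_weights(score_matrix):
    """score_matrix[i][j] is expert j's score for word i.

    Assumption: the initial weight of word i is the mean of its row.
    """
    return [sum(row) / len(row) for row in score_matrix]


words = ["point switch", "subway station"]
# Hypothetical scores from m = 3 experts for n = 2 words.
W = [
    [0.9, 1.0, 0.8],  # rail-transit-specific word, scored high
    [0.3, 0.2, 0.4],  # less specific word, scored low
]
w0 = dict(zip(words, initial_weights(W)))
print(w0["point switch"] > w0["subway station"])  # True
```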
On the basis of the foregoing embodiment, in the method for extracting keywords of a rail transit maintenance work order provided in the embodiment of the present invention, determining the actual weight based on the initial weight and the product of the word frequency and the inverse document frequency of each word, in combination with a tuning parameter, specifically includes:
determining an intermediate weight based on the initial weight and the product of the word frequency and the inverse document frequency of each word, in combination with a tuning parameter;
and normalizing the intermediate weight based on a min-max normalization method to obtain the actual weight.
Specifically, when the actual weight is determined in the embodiment of the present invention, the intermediate weight, once determined, may be normalized by the min-max normalization method to obtain the final actual weight.
The actual weight may specifically be determined by the following formula:
$$q_{i\text{-final}} = \frac{q_i - q_{\min}}{q_{\max} - q_{\min}}$$
wherein $q_{i\text{-final}}$ is the actual weight of the i-th word, $q_i$ is the intermediate weight of the i-th word, $q_{\min}$ is the minimum of all intermediate weights, and $q_{\max}$ is the maximum of all intermediate weights.
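Min-max normalization maps the smallest intermediate weight to 0 and the largest to 1. A minimal Python sketch follows; the function name and the sample intermediate weights are hypothetical, and returning 0.0 when all weights are equal is a convention assumed here, since the formula is undefined in that case.

```python
def min_max_normalize(intermediate):
    """Map intermediate weights {word: q_i} onto [0, 1]."""
    q_min = min(intermediate.values())
    q_max = max(intermediate.values())
    span = q_max - q_min
    if span == 0:
        # All weights equal: the formula divides by zero, so return 0.0
        # for every word (an assumed convention, not from the source).
        return {word: 0.0 for word in intermediate}
    return {word: (q - q_min) / span for word, q in intermediate.items()}


# Hypothetical intermediate weights.
q = {"point switch": 0.42, "subway station": 0.12, "fault": 0.02}
actual = min_max_normalize(q)
print(actual["point switch"], actual["fault"])  # 1.0 0.0
```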
On the basis of the foregoing embodiment, in the method for extracting keywords of a rail transit maintenance work order provided in the embodiment of the present invention, obtaining the maintenance work order corpus specifically includes:
for each rail transit maintenance work order in the specified number of rail transit maintenance work orders, processing the rail transit maintenance work order based on a sentence cleaning method and the jieba word segmentation method to obtain the maintenance work order corpus.
Specifically, when the maintenance work order corpus is obtained, each of the specified number of rail transit maintenance work orders may be used as a sample, and a sentence cleaning method is used to remove special characters such as periods, colons, and commas. Common words such as "is" and "has" are also removed. The jieba word segmentation method is then called to convert the rail transit maintenance work order data from unstructured text into standardized individual words for storage, thereby constructing the maintenance work order corpus.
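The cleaning and segmentation steps above might look like the following Python sketch. The stop-word list, the function name, and the sample sentence are hypothetical. To keep the sketch free of third-party dependencies, the default segmenter is whitespace splitting; for Chinese work orders a segmenter such as `jieba.lcut` could be passed instead, assuming the jieba package is installed.

```python
import re

# Hypothetical stop-word list; a real deployment would use a fuller one.
STOP_WORDS = {"is", "has"}


def build_corpus(work_orders, segment=str.split):
    """Clean and segment raw work order texts into lists of words.

    segment: callable taking a string and returning a list of words
    (whitespace split by default; jieba.lcut is one possible substitute).
    """
    corpus = []
    for text in work_orders:
        # Sentence cleaning: replace punctuation and other special characters.
        cleaned = re.sub(r"[^\w\s]", " ", text)
        tokens = [t for t in segment(cleaned) if t.strip() and t not in STOP_WORDS]
        corpus.append(tokens)
    return corpus


corpus = build_corpus(["Point switch: fault is found, relay replaced."])
print(corpus[0])  # ['Point', 'switch', 'fault', 'found', 'relay', 'replaced']
```

Keeping one word list per work order, rather than a single flat set, preserves the per-work-order counts that the word frequency calculation needs.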
Fig. 2 is a schematic structural diagram of a rail transit maintenance work order keyword extraction device provided in an embodiment of the present invention. As shown in Fig. 2, the device includes a maintenance work order corpus acquisition module 21, an actual weight determination module 22, and a keyword determination module 23, wherein:
the maintenance work order corpus acquisition module 21 is configured to acquire a maintenance work order corpus, calculate the word frequency and the inverse document frequency of each word in the corpus, and determine the product of the word frequency and the inverse document frequency of each word; the maintenance work order corpus is a set containing the words appearing in a specified number of rail transit maintenance work orders;
the actual weight determination module 22 is configured to determine the actual weight of each word in the maintenance work order corpus based on the initial weight of each word in the maintenance work order corpus and the product of the word frequency and the inverse document frequency of each word; the initial weight is determined based on the importance of each word in the maintenance work order corpus;
the keyword determination module 23 is configured to determine the rail transit maintenance work order keywords based on the actual weight of each word in the maintenance work order corpus.
Specifically, the functions of the modules in the rail transit maintenance work order keyword extraction device provided in this embodiment of the present invention correspond one-to-one to the steps of the above method embodiments, and they achieve the same effects.
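The three modules can be pictured as three small functions, sketched here in Python. All names are hypothetical. Two points are assumptions rather than the method of this embodiment: the TF-IDF variant (word frequency taken over the whole corpus, inverse document frequency as the natural logarithm of the number of work orders divided by the number of work orders containing the word), and the rule that combines the initial weight with the TF-IDF product through a tuning parameter `alpha`, which stands in for a formula not reproduced in this passage.

```python
import math
from collections import Counter
from typing import Dict, List, Optional


def tf_idf_products(corpus: List[List[str]]) -> Dict[str, float]:
    """Module 21 (sketch): word -> word frequency * inverse document frequency."""
    n_orders = len(corpus)
    counts = Counter(w for order in corpus for w in order)
    total = sum(counts.values())
    doc_freq = Counter(w for order in corpus for w in set(order))
    return {
        w: (counts[w] / total) * math.log(n_orders / doc_freq[w])
        for w in counts
    }


def actual_weights(
    products: Dict[str, float],
    initial: Dict[str, float],
    alpha: float = 0.5,
) -> Dict[str, float]:
    """Module 22 (sketch): combine initial weight and TF-IDF product.

    Hypothetical rule: alpha * initial + (1 - alpha) * tf_idf, with a default
    initial weight of 1.0 for words that have no assigned importance.
    """
    return {
        w: alpha * initial.get(w, 1.0) + (1 - alpha) * p
        for w, p in products.items()
    }


def extract_keywords(
    corpus: List[List[str]],
    initial: Optional[Dict[str, float]] = None,
    alpha: float = 0.5,
    top_k: int = 5,
) -> List[str]:
    """Module 23 (sketch): return the top_k words by actual weight."""
    weights = actual_weights(tf_idf_products(corpus), initial or {}, alpha)
    # Sort by descending weight, then alphabetically so ties are deterministic.
    ranked = sorted(weights, key=lambda w: (-weights[w], w))
    return ranked[:top_k]
```

With this split, a word that appears in every work order has a TF-IDF product of zero and can reach the keyword list only through a high initial weight, which is the role the importance-based initial weight appears to play.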
Fig. 3 illustrates the physical structure of an electronic device. As shown in Fig. 3, the electronic device may include a processor 310, a communication interface 320, a memory 330, and a communication bus 340, wherein the processor 310, the communication interface 320, and the memory 330 communicate with one another via the communication bus 340. The processor 310 may call logic instructions in the memory 330 to perform the rail transit maintenance work order keyword extraction method, which includes: obtaining a maintenance work order corpus, calculating the word frequency and the inverse document frequency of each word in the corpus, and determining the product of the word frequency and the inverse document frequency of each word, where the maintenance work order corpus is a set containing the words appearing in a specified number of rail transit maintenance work orders; determining the actual weight of each word in the maintenance work order corpus based on the initial weight of each word in the maintenance work order corpus and the product of the word frequency and the inverse document frequency of each word, where the initial weight is determined based on the importance of each word in the maintenance work order corpus; and determining the rail transit maintenance work order keywords based on the actual weight of each word in the maintenance work order corpus.
In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and, when sold or used as independent products, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the methods according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
In another aspect, an embodiment of the present invention further provides a computer program product. The computer program product includes a computer program stored on a non-transitory computer-readable storage medium, and the computer program includes program instructions. When the program instructions are executed by a computer, the computer is capable of executing the rail transit maintenance work order keyword extraction method provided by the above method embodiments, which includes: obtaining a maintenance work order corpus, calculating the word frequency and the inverse document frequency of each word in the corpus, and determining the product of the word frequency and the inverse document frequency of each word, where the maintenance work order corpus is a set containing the words appearing in a specified number of rail transit maintenance work orders; determining the actual weight of each word in the maintenance work order corpus based on the initial weight of each word in the maintenance work order corpus and the product of the word frequency and the inverse document frequency of each word, where the initial weight is determined based on the importance of each word in the maintenance work order corpus; and determining the rail transit maintenance work order keywords based on the actual weight of each word in the maintenance work order corpus.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored. When executed by a processor, the computer program performs the rail transit maintenance work order keyword extraction method provided in the foregoing embodiments, which includes: obtaining a maintenance work order corpus, calculating the word frequency and the inverse document frequency of each word in the corpus, and determining the product of the word frequency and the inverse document frequency of each word, where the maintenance work order corpus is a set containing the words appearing in a specified number of rail transit maintenance work orders; determining the actual weight of each word in the maintenance work order corpus based on the initial weight of each word in the maintenance work order corpus and the product of the word frequency and the inverse document frequency of each word, where the initial weight is determined based on the importance of each word in the maintenance work order corpus; and determining the rail transit maintenance work order keywords based on the actual weight of each word in the maintenance work order corpus.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.
Claims (10)
1. A rail transit maintenance work order keyword extraction method, characterized by comprising:
obtaining a maintenance work order corpus, calculating the word frequency and the inverse document frequency of each word in the corpus, and determining the product of the word frequency and the inverse document frequency of each word; the maintenance work order corpus is a set containing the words appearing in a specified number of rail transit maintenance work orders;
determining the actual weight of each word in the maintenance work order corpus based on the initial weight of each word in the maintenance work order corpus and the product of the word frequency and the inverse document frequency of each word; the initial weight is determined based on the importance of each word in the maintenance work order corpus;
and determining the rail transit maintenance work order keywords based on the actual weight of each word in the maintenance work order corpus.
2. The rail transit maintenance work order keyword extraction method according to claim 1, further comprising:
receiving a keyword to be queried, and determining all candidate rail transit maintenance work orders that contain the keyword to be queried;
and calculating the weight of each candidate rail transit maintenance work order, and taking the candidate rail transit maintenance work order with the higher weight as the rail transit maintenance work order corresponding to the keyword to be queried.
3. The rail transit maintenance work order keyword extraction method according to claim 1, wherein determining the actual weight of each word in the maintenance work order corpus based on the initial weight of each word in the maintenance work order corpus and the product of the word frequency and the inverse document frequency of each word specifically comprises:
determining the actual weight based on the initial weight and the product of the word frequency and the inverse document frequency of each word, in combination with a tuning parameter.
4. The rail transit maintenance work order keyword extraction method according to claim 3, wherein determining the actual weight based on the initial weight and the product of the word frequency and the inverse document frequency of each word, in combination with a tuning parameter, specifically comprises:
determining an intermediate weight based on the initial weight and the product of the word frequency and the inverse document frequency of each word, in combination with the tuning parameter;
and normalizing the intermediate weight based on a maximum-minimum normalization method to obtain the actual weight.
5. The rail transit maintenance work order keyword extraction method according to claim 4, wherein determining the intermediate weight based on the initial weight and the product of the word frequency and the inverse document frequency of each word, in combination with the tuning parameter, specifically comprises:
determining the intermediate weight by the following formula:
6. The rail transit maintenance work order keyword extraction method according to claim 4, wherein the normalizing the intermediate weight based on the maximum and minimum normalization method to obtain the actual weight specifically comprises:
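determining the actual weight by the following formula:
$$q_{i\text{-final}} = \frac{q_i - q_{\min}}{q_{\max} - q_{\min}}$$
where $q_{i\text{-final}}$ is the actual weight of the $i$-th word, $q_i$ is the intermediate weight of the $i$-th word, $q_{\min}$ is the minimum of all intermediate weights, and $q_{\max}$ is the maximum of all intermediate weights.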
7. The rail transit maintenance work order keyword extraction method according to any one of claims 1 to 6, wherein obtaining the maintenance work order corpus specifically comprises:
for each rail transit maintenance work order in the specified number of rail transit maintenance work orders, processing the work order based on a sentence cleaning method and the jieba word segmentation method to obtain the maintenance work order corpus.
8. A rail transit maintenance work order keyword extraction device, characterized by comprising:
a maintenance work order corpus acquisition module, configured to acquire a maintenance work order corpus, calculate the word frequency and the inverse document frequency of each word in the corpus, and determine the product of the word frequency and the inverse document frequency of each word; the maintenance work order corpus is a set containing the words appearing in a specified number of rail transit maintenance work orders;
an actual weight determination module, configured to determine the actual weight of each word in the maintenance work order corpus based on the initial weight of each word in the maintenance work order corpus and the product of the word frequency and the inverse document frequency of each word; the initial weight is determined based on the importance of each word in the maintenance work order corpus;
and a keyword determination module, configured to determine the rail transit maintenance work order keywords based on the actual weight of each word in the maintenance work order corpus.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the rail transit maintenance work order keyword extraction method according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the rail transit maintenance work order keyword extraction method according to any one of claims 1 to 7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011015127.2A CN112214990A (en) | 2020-09-24 | 2020-09-24 | Method and device for extracting key words of rail transit maintenance work order |
Publications (1)
Publication Number | Publication Date |
---|---|
CN112214990A | 2021-01-12 |
Family
ID=74051876
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202011015127.2A (Pending) CN112214990A (en) | 2020-09-24 | 2020-09-24 | Method and device for extracting key words of rail transit maintenance work order |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112214990A (en) |
2020-09-24: CN application CN202011015127.2A (publication CN112214990A), status: active, Pending
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||