CN112214990A - Method and device for extracting key words of rail transit maintenance work order - Google Patents

Info

Publication number
CN112214990A
Authority
CN
China
Prior art keywords
word
maintenance work
work order
corpus
rail transit
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011015127.2A
Other languages
Chinese (zh)
Inventor
李振
包峰
罗铭
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Traffic Control Technology TCT Co Ltd
Original Assignee
Traffic Control Technology TCT Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Traffic Control Technology TCT Co Ltd
Priority to CN202011015127.2A
Publication of CN112214990A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Probability & Statistics with Applications (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a method and a device for extracting keywords from rail transit maintenance work orders. A maintenance work order corpus is first obtained, and the product of the word frequency and the inverse document frequency of each word in the corpus is calculated. The actual weight of each word in the corpus is then determined from the initial weight of the word and that product. Finally, the keywords of the rail transit maintenance work orders are determined according to the actual weight of each word in the corpus. The method makes full use of rail transit maintenance work order data: keywords are determined by processing the work order data, the work orders can then be classified by keyword, and the problems recorded in a work order and their causes can be identified more quickly.

Description

Method and device for extracting key words of rail transit maintenance work order
Technical Field
The invention relates to the technical field of rail transit, in particular to a method and a device for extracting keywords of a rail transit maintenance work order.
Background
Rail transit technology is developing rapidly. From a maintenance perspective, operation and maintenance are key to the safe operation of a subway system. In rail transit maintenance, work orders are generally recorded manually, and the number of accumulated work orders grows over long-term subway operation and maintenance. Work order data records the cause of each fault, a description of the fault, and its solution, and therefore carries a large amount of information.
The data produced by track equipment has the characteristics of big data. The data is varied: different track equipment generates varied and complex data at different stages of its life cycle. The data formats are diverse, including traditional paper records such as manual line inspection logbooks, track inspection instrument data, vehicle on-board controller (VOBC) and driving record data, and so on. The data structures are diverse, including structured, semi-structured, and unstructured data. The data scale is huge. Taking track inspection vehicle data as an example, main lines are inspected twice per month, the corresponding waveform data and overrun data are stored in a database, and one year of data can reach 9 TB.
However, work order data is not currently used effectively. If such a large amount of data is not used effectively, it does nothing to help determine the causes of faults.
Disclosure of Invention
The embodiment of the invention provides a method and a device for extracting keywords of a rail transit maintenance work order, which are used for overcoming the defects in the prior art.
The embodiment of the invention provides a method for extracting keywords of a rail transit maintenance work order, which comprises the following steps:
obtaining a maintenance work order corpus, calculating the word frequency of each word and the reverse file frequency of each word in the corpus, and determining the product of the word frequency of each word and the reverse file frequency; the maintenance work order corpus is a set containing words appearing in a specified number of rail transit maintenance work orders;
determining the actual weight of each word in the maintenance work order corpus based on the initial weight of each word in the maintenance work order corpus and the product of the word frequency of each word and the reverse file frequency; the initial weight is determined based on the importance of each word in the maintenance work order corpus;
and determining the key words of the rail transit maintenance work order based on the actual weight of each word in the maintenance work order corpus.
The rail transit maintenance work order keyword extraction method according to one embodiment of the invention further comprises the following steps:
receiving a keyword to be inquired, and determining all alternative rail transit maintenance work orders containing the keyword of the rail transit maintenance work order to be inquired;
and calculating the weight of each alternative rail transit maintenance work order, and taking the alternative rail transit maintenance work order with higher weight as the rail transit maintenance work order corresponding to the keyword to be inquired.
According to the rail transit maintenance work order keyword extraction method provided by the embodiment of the invention, the actual weight of each word in the maintenance work order corpus is determined based on the initial weight of each word in the maintenance work order corpus and the product of the word frequency of each word and the reverse file frequency, and the method specifically comprises the following steps:
and determining the actual weight based on the initial weight and the product of the word frequency and the reverse file frequency of each word in combination with a tuning parameter.
According to the rail transit maintenance work order keyword extraction method provided by the embodiment of the invention, the actual weight is determined based on the initial weight and the product of the word frequency and the reverse file frequency of each word by combining with a tuning parameter, and the method specifically comprises the following steps:
determining an intermediate weight based on the initial weight and the product of the word frequency and the reverse file frequency of each word in combination with a tuning parameter;
and based on a maximum and minimum normalization method, performing normalization processing on the intermediate weight to obtain the actual weight.
According to the rail transit maintenance work order keyword extraction method provided by the embodiment of the invention, the intermediate weight is determined based on the initial weight and the product of the word frequency and the reverse file frequency of each word by combining with a tuning parameter, and the method specifically comprises the following steps:
determining the intermediate weight by the following formula:
[Formula image BDA0002698794070000031 not reproduced in this text]
where $q_i$ is the intermediate weight of the i-th word, TF is the word frequency of the i-th word, IDF is the inverse document frequency of the i-th word, k is the tuning parameter, and the symbol shown in image BDA0002698794070000032 is the initial weight of the i-th word.
According to the rail transit maintenance work order keyword extraction method provided by the embodiment of the invention, the intermediate weight is normalized based on the maximum and minimum normalization method to obtain the actual weight, and the method specifically comprises the following steps:
determining the actual weights by:
$$q_{i\text{-final}} = \frac{q_i - q_{\min}}{q_{\max} - q_{\min}}$$
where $q_{i\text{-final}}$ is the actual weight of the i-th word, $q_i$ is the intermediate weight of the i-th word, $q_{\min}$ is the minimum of all intermediate weights, and $q_{\max}$ is the maximum of all intermediate weights.
According to the rail transit maintenance work order keyword extraction method provided by the embodiment of the invention, the obtaining of the maintenance work order corpus specifically comprises the following steps:
for each of the specified number of rail transit maintenance work orders, processing the work order based on a sentence cleaning method and the jieba word segmentation method to obtain the maintenance work order corpus.
An embodiment of the invention also provides a device for extracting keywords from rail transit maintenance work orders, comprising a maintenance work order corpus acquisition module, an actual weight determining module, and a keyword determining module, wherein:
the maintenance work order corpus acquisition module is used for acquiring a maintenance work order corpus, calculating the word frequency of each word and the reverse file frequency of each word in the corpus, and determining the product of the word frequency of each word and the reverse file frequency; the maintenance work order corpus is a set containing words appearing in a specified number of rail transit maintenance work orders;
the actual weight determining module is used for determining the actual weight of each word in the maintenance work order corpus based on the initial weight of each word in the maintenance work order corpus and the product of the word frequency of each word and the reverse file frequency; the initial weight is determined based on the importance of each word in the maintenance work order corpus;
and the keyword determining module is used for determining the rail transit maintenance work order keywords based on the actual weight of each word in the maintenance work order corpus.
The embodiment of the invention also provides electronic equipment which comprises a memory, a processor and a computer program which is stored on the memory and can run on the processor, wherein when the processor executes the program, the steps of any one of the above rail transit maintenance work order keyword extraction methods are realized.
An embodiment of the present invention further provides a non-transitory computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of any one of the above-mentioned track traffic repair worksheet keyword extraction methods.
In the rail transit maintenance work order keyword extraction method and device provided by the embodiments of the invention, a maintenance work order corpus is first obtained, and the product of the word frequency and the inverse document frequency of each word in the corpus is calculated. The actual weight of each word in the corpus is then determined from the initial weight of the word and that product. Finally, the keywords of the rail transit maintenance work orders are determined according to the actual weight of each word in the corpus. This makes full use of rail transit maintenance work order data: keywords are determined by processing the work order data, the work orders can then be classified by keyword, and the problems recorded in a work order and their causes can be identified more quickly.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic flowchart of a method for extracting keywords from a rail transit maintenance work order according to an embodiment of the present invention;
Fig. 2 is a schematic structural diagram of a device for extracting keywords from a rail transit maintenance work order according to an embodiment of the present invention;
Fig. 3 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 is a schematic flow chart of a method for extracting keywords from a rail transit repair work order provided in an embodiment of the present invention, and as shown in fig. 1, the method includes:
S1, obtaining a maintenance work order corpus, calculating the word frequency and the inverse document frequency of each word in the corpus, and determining the product of the word frequency and the inverse document frequency of each word; the maintenance work order corpus is a set containing the words appearing in a specified number of rail transit maintenance work orders;
S2, determining the actual weight of each word in the maintenance work order corpus based on the initial weight of each word in the corpus and the product of the word frequency and the inverse document frequency of each word; the initial weight is determined based on the importance of each word in the maintenance work order corpus;
S3, determining the keywords of the rail transit maintenance work order based on the actual weight of each word in the maintenance work order corpus.
Specifically, the method for extracting the key words of the rail transit maintenance work order provided by the embodiment of the invention aims to determine the key words of the rail transit maintenance work order through big data.
Step S1 is performed first. A maintenance work order corpus is obtained, which contains all words appearing in the specified number of rail transit maintenance work orders. The specified number can be set as required, and a large number may be used in embodiments of the invention. A word in the corpus may be a single segmented word, or a new word formed by fusing several adjacent words with similar word frequencies. For example, the words "Xizhimen", "subway station", and "platform" may be fused into the new word "Xizhimen subway station platform".
The word frequency (TF) and the inverse document frequency (IDF) of each word in the maintenance work order corpus are then calculated. The word frequency is the frequency with which a word in the corpus appears in a given rail transit maintenance work order. Given the characteristics of rail transit vocabulary, adjacent words in the corpus are allowed to be fused to form new words. For example, "Xizhimen subway station platform" may be segmented into three words: "Xizhimen", "subway station", and "platform". Taken individually, however, these three words cannot effectively identify the location of the subway station. With the fusion method, taking "subway station" as an example, the words before and after it are fused with it to form the new word "Xizhimen subway station platform", which increases the information carried by the word. The word frequency $TF_i$ of the i-th word in the maintenance work order corpus is calculated as follows:
$$TF_i = \frac{\text{number of occurrences of the } i\text{-th word in a work order}}{\text{total number of words in that work order}}$$
For example, if the word "Xizhimen subway station" appears 3 times in a rail transit maintenance work order that contains 1000 words in total, its word frequency is 3/1000 = 0.003.
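The word frequency calculation above can be sketched in a few lines of Python. This is a minimal illustration, assuming each work order has already been segmented into a list of words; the function name `term_frequencies` and the sample tokens are hypothetical.

```python
from collections import Counter


def term_frequencies(work_order_tokens):
    """Return {word: occurrences / total words} for one segmented work order."""
    total = len(work_order_tokens)
    if total == 0:
        return {}
    counts = Counter(work_order_tokens)
    return {word: n / total for word, n in counts.items()}


# Hypothetical work order of 1000 words in which one word occurs 3 times,
# mirroring the worked example in the text.
tokens = ["target_word"] * 3 + ["other_word"] * 997
tf = term_frequencies(tokens)
print(tf["target_word"])  # 0.003
```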
When calculating word frequencies, it becomes apparent that some common words are of little use for extracting topic keywords yet may have a high word frequency, while some infrequent words express the topic of a work order well. Relying on word frequency alone therefore departs from the purpose of keyword extraction, which is that the better a word in the rail transit vocabulary reflects the topic of a work order, the larger its weight should be, and the less it does so, the smaller its weight should be. When a large number of work orders form a work order library, some words appear in only a few work orders but effectively reflect the topics of those work orders; such words should be given a larger weight. For this reason, the inverse document frequency of each word in the rail transit work order library is also calculated.
The inverse document frequency $IDF_i$ of the i-th word is calculated as follows:
$$IDF_i = \log\frac{\text{total number of work orders}}{\text{number of work orders containing the } i\text{-th word} + 1}$$
The more rail transit maintenance work orders contain a given word, the smaller its inverse document frequency; the fewer work orders contain it, the larger its inverse document frequency. Adding 1 to the denominator prevents a division by zero when a word in the maintenance work order corpus does not appear in any rail transit maintenance work order.
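A minimal sketch of the inverse document frequency with the "+1" in the denominator follows. The natural logarithm is an assumption (the text does not state the base), and the function name and sample work orders are hypothetical. Note that with this smoothing a word that occurs in every work order receives a slightly negative value.

```python
import math


def inverse_document_frequencies(segmented_work_orders):
    """Return {word: log(N / (df + 1))}, where N is the number of work orders
    and df is the number of work orders that contain the word."""
    total_orders = len(segmented_work_orders)
    containing = {}
    for tokens in segmented_work_orders:
        for word in set(tokens):  # count each work order at most once per word
            containing[word] = containing.get(word, 0) + 1
    return {
        word: math.log(total_orders / (df + 1))
        for word, df in containing.items()
    }


# Hypothetical library of four segmented work orders.
orders = [
    ["point switch", "fault"],
    ["fault", "platform"],
    ["fault"],
    ["fault", "door"],
]
idf = inverse_document_frequencies(orders)
# "fault" appears in all four work orders, "point switch" in only one,
# so "point switch" receives the larger inverse document frequency.
```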
After the word frequency and the inverse document frequency of every word in the rail transit work order library have been calculated, the word frequency-inverse document frequency (TF-IDF) of each word is calculated. TF-IDF combines the frequency of a word within a rail transit maintenance work order with its inverse document frequency across the whole work order library, so that common words are filtered out and important words are retained as keywords.
The TF-IDF of the i-th word is calculated as:
$$\text{TF-IDF}_i = TF_i \times IDF_i$$
As the formula shows, no weighting is applied to any word when $\text{TF-IDF}_i$ is calculated; apart from word frequency and inverse document frequency, all words are treated as equally important. In the setting of rail transit maintenance work order keyword extraction, however, different words clearly differ in importance. For example, the term "point switch" is clearly more specific than "subway station", so auxiliary information should be added to give domain-specific terms a higher weight, making the extracted keywords more reasonable. The TF-IDF calculation therefore needs to be improved and adapted to the actual application scenario so that it better suits rail transit maintenance work orders.
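The plain TF-IDF product discussed above, before any initial weight is introduced, can be sketched as follows. The TF and IDF values are made-up illustrative numbers, not data from the patent, and the function name `tf_idf` is hypothetical.

```python
def tf_idf(tf, idf):
    """Multiply each word's frequency in one work order by its inverse
    document frequency across the work order library."""
    return {word: tf[word] * idf[word] for word in tf if word in idf}


# Hypothetical values for one work order.
tf_values = {"point switch": 0.02, "subway station": 0.05}
idf_values = {"point switch": 2.0, "subway station": 0.5}

scores = tf_idf(tf_values, idf_values)
# Although "subway station" is more frequent in this work order, the rarer
# term "point switch" ends up with the higher TF-IDF score.
```

The sketch treats every word identically, which is exactly the limitation the text describes: nothing in the product reflects domain importance.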
Therefore, in the embodiment of the present invention, when step S2 is executed, an initial weight of each word in the repair work order corpus is introduced, and the initial weight is determined based on the importance of each word in the repair work order corpus. And combining the initial weight of each word with TF-IDF to obtain the actual weight of each word in the maintenance work order corpus.
And finally, executing a step S3, and determining the key words of the rail transit maintenance work order according to the actual weight of each word in the maintenance work order corpus. Specifically, the words can be sorted according to the actual weight of each word in the maintenance work order corpus, and a plurality of words with larger actual weights are selected as the key words of the rail transit maintenance work order.
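Selecting the words with the largest actual weights, as described for step S3, might look like the sketch below. The number of keywords to keep (`k`) and the sample weights are assumptions for illustration; the text only says that several words with larger actual weights are chosen.

```python
import heapq


def top_keywords(actual_weights, k=5):
    """Return the k words with the largest actual weights, largest first."""
    return heapq.nlargest(k, actual_weights, key=actual_weights.get)


# Hypothetical actual weights after normalization.
weights = {"point switch": 1.0, "signal": 0.6, "platform": 0.1, "is": 0.0}
keywords = top_keywords(weights, k=2)
print(keywords)  # ['point switch', 'signal']
```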
In the rail transit maintenance work order keyword extraction method provided by the embodiment of the invention, a maintenance work order corpus is first obtained, and the product of the word frequency and the inverse document frequency of each word in the corpus is calculated. The actual weight of each word in the corpus is then determined from the initial weight of the word and that product. Finally, the keywords of the rail transit maintenance work orders are determined according to the actual weight of each word in the corpus. This makes full use of rail transit maintenance work order data: keywords are determined by processing the work order data, the work orders can then be classified by keyword, and the problems recorded in a work order and their causes can be identified more quickly.
On the basis of the above embodiment, the method for extracting keywords of the rail transit repair work order provided in the embodiment of the present invention further includes:
receiving a keyword to be inquired, and determining all alternative rail transit maintenance work orders containing the keyword of the rail transit maintenance work order to be inquired;
and calculating the weight of each alternative rail transit maintenance work order, and taking the alternative rail transit maintenance work order with higher weight as the rail transit maintenance work order corresponding to the keyword to be inquired.
Specifically, the method for extracting the key words of the rail transit maintenance work order provided by the embodiment of the invention can determine the matched work order through the key words input by the maintenance personnel on the basis of determining the key words of the rail transit maintenance work order.
Firstly, receiving a keyword to be inquired, wherein the keyword to be inquired is a keyword input by a maintenance worker. And determining the rail transit maintenance work orders containing the keywords to be inquired in the rail transit maintenance work orders with the specified number, and taking the rail transit maintenance work orders as alternative rail transit maintenance work orders.
And then calculating the weight of each alternative rail transit maintenance work order, specifically determining the word frequency of the keyword to be inquired in each alternative rail transit maintenance work order, and taking the word frequency as the weight of the alternative rail transit maintenance work order.
And finally, taking the alternative rail transit maintenance work order with higher weight as the rail transit maintenance work order corresponding to the keyword to be inquired, and pushing the alternative rail transit maintenance work order to maintenance personnel to realize keyword inquiry matching.
In the embodiment of the invention, the rail transit maintenance work order corresponding to the keyword to be inquired is determined, so that help can be provided for maintenance personnel to quickly determine the cause of the problem.
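The query matching steps above can be sketched as follows: work orders containing the queried keyword become candidates, each candidate is weighted by the word frequency of the keyword in it, and the highest-weighted candidates are returned. The work order identifiers, the `top_n` cut-off, and the tie-breaking by identifier are assumptions made for this illustration.

```python
def match_work_orders(query, segmented_work_orders, top_n=3):
    """Rank work orders containing `query` by the word frequency of `query`."""
    scored = []
    for order_id, tokens in segmented_work_orders.items():
        if query in tokens:  # candidate work order
            weight = tokens.count(query) / len(tokens)
            scored.append((weight, order_id))
    # Highest weight first; ties broken by identifier for a stable result.
    scored.sort(key=lambda pair: (-pair[0], pair[1]))
    return [order_id for _, order_id in scored[:top_n]]


# Hypothetical segmented work orders keyed by a made-up identifier.
library = {
    "WO-1": ["point switch", "fault", "point switch", "replaced"],
    "WO-2": ["point switch", "inspection", "normal", "closed"],
    "WO-3": ["platform", "door", "fault"],
}
matches = match_work_orders("point switch", library)
print(matches)  # ['WO-1', 'WO-2']
```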
On the basis of the above embodiment, the method for extracting the key words of the rail transit repair work order provided in the embodiment of the present invention determines the actual weight of each word in the repair work order corpus based on the initial weight of each word in the repair work order corpus and the product of the word frequency of each word and the reverse file frequency, and specifically includes:
and determining the actual weight based on the initial weight and the product of the word frequency and the reverse file frequency of each word in combination with a tuning parameter.
Specifically, in the embodiment of the present invention, when determining the actual weight of each word in the repair work order corpus, the intermediate weight may be determined by the following formula based on the initial weight and the product of the word frequency and the inverse file frequency of each word, in combination with the tuning parameter:
[Formula image BDA0002698794070000101 not reproduced in this text]
where $q_i$ is the intermediate weight of the i-th word, TF is the word frequency of the i-th word, IDF is the inverse document frequency of the i-th word, k is the tuning parameter, and the symbol shown in image BDA0002698794070000102 is the initial weight of the i-th word.
This intermediate weight is then taken as the actual weight of the ith word.
On the basis of the above embodiment, in the rail transit maintenance work order keyword extraction method provided by the embodiment of the invention, the initial weight may be determined by experts scoring each word according to its importance; that is, an expert scoring method is used to determine the initial weight of each word in the maintenance work order corpus. The scores of all words form a weight matrix W of the following form:
$$W = \begin{pmatrix} w_{11} & w_{12} & \cdots & w_{1m} \\ w_{21} & w_{22} & \cdots & w_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ w_{n1} & w_{n2} & \cdots & w_{nm} \end{pmatrix}$$
where m is the number of experts, n is the number of words in the maintenance work order corpus, and $w_{ij}$ is the weight score given to the i-th word by the j-th expert. In this way, the weight of keywords in the rail transit field, and in maintenance in particular, can be effectively increased.
The initial weight of the j-th word in the maintenance work order corpus may be expressed as:
[Formula image BDA0002698794070000104 not reproduced in this text]
In the embodiment of the invention, rail transit terms in the maintenance work order corpus are given higher weight scores, and words that are not specific to rail transit are given lower weight scores.
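As a rough illustration of expert scoring, the sketch below collapses a score matrix into one initial weight per word. Two points are assumptions, since the formula image is not available in this text: rows are taken to be words and columns experts, and the aggregation is taken to be the arithmetic mean of the expert scores. The function name and the scores are hypothetical.

```python
def initial_weights(score_matrix):
    """score_matrix[i][j] is the score expert j gave to word i.

    ASSUMPTION: the initial weight of a word is the arithmetic mean of its
    expert scores; the patent's own formula is not reproduced here.
    """
    return [sum(row) / len(row) for row in score_matrix]


words = ["point switch", "is"]
# Hypothetical scores from three experts for each word.
W = [
    [0.9, 0.8, 1.0],  # rail transit term: high scores
    [0.2, 0.3, 0.1],  # common word: low scores
]
w0 = dict(zip(words, initial_weights(W)))
```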
On the basis of the foregoing embodiment, the method for extracting keywords of a track traffic repair work order provided in the embodiment of the present invention determines the actual weight based on the initial weight and a product of a word frequency and a reverse file frequency of each word in combination with a tuning parameter, and specifically includes:
determining an intermediate weight based on the initial weight and the product of the word frequency and the reverse file frequency of each word in combination with a tuning parameter;
and based on a maximum and minimum normalization method, performing normalization processing on the intermediate weight to obtain the actual weight.
Specifically, when the actual weight is determined in the embodiment of the present invention, the intermediate weight may be normalized by combining a maximum and minimum normalization method on the basis of determining the intermediate weight, so as to obtain the final actual weight.
The actual weight may specifically be determined by the following formula:
$$q_{i\text{-final}} = \frac{q_i - q_{\min}}{q_{\max} - q_{\min}}$$
where $q_{i\text{-final}}$ is the actual weight of the i-th word, $q_i$ is the intermediate weight of the i-th word, $q_{\min}$ is the minimum of all intermediate weights, and $q_{\max}$ is the maximum of all intermediate weights.
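The maximum and minimum normalization above maps the intermediate weights onto the interval [0, 1]. A minimal sketch, with made-up intermediate weights, is shown below; returning 0.0 for every word when all intermediate weights are equal is an assumption added to avoid dividing by zero, a case the text does not address.

```python
def min_max_normalize(intermediate_weights):
    """Apply (q - q_min) / (q_max - q_min) to each intermediate weight."""
    q_min = min(intermediate_weights.values())
    q_max = max(intermediate_weights.values())
    span = q_max - q_min
    if span == 0:
        # ASSUMPTION: degenerate case, all weights identical.
        return {word: 0.0 for word in intermediate_weights}
    return {word: (q - q_min) / span for word, q in intermediate_weights.items()}


# Hypothetical intermediate weights.
q = {"point switch": 6.0, "signal": 4.0, "platform": 2.0}
actual = min_max_normalize(q)
print(actual)  # {'point switch': 1.0, 'signal': 0.5, 'platform': 0.0}
```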
On the basis of the foregoing embodiment, in the rail transit maintenance work order keyword extraction method provided by the embodiment of the invention, obtaining the maintenance work order corpus specifically includes:
for each of the specified number of rail transit maintenance work orders, processing the work order based on a sentence cleaning method and the jieba word segmentation method to obtain the maintenance work order corpus.
Specifically, when the maintenance work order corpus is obtained, each of the specified number of rail transit maintenance work orders may be used as a sample. A sentence cleaning method is applied to remove special characters such as periods, colons, and commas, and common words such as "is" and "has" are removed. The jieba word segmentation method is then called to convert the rail transit maintenance work order data from unstructured text into standardized individual words for storage, thereby constructing the maintenance work order corpus.
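The cleaning and segmentation steps above can be sketched with a pluggable segmenter. To keep the example self-contained it defaults to whitespace splitting on an English sample; for Chinese work orders one would presumably pass `jieba.lcut` as the `segment` argument, which assumes the third-party jieba package is installed. The punctuation set and stop-word list are small hypothetical samples, not the lists used in the patent.

```python
import re

# Hypothetical samples of special characters and common words to remove.
PUNCTUATION = re.compile(r"[.,:;!?。，：；！？、]")
STOP_WORDS = {"is", "has"}


def build_corpus(work_orders, segment=str.split):
    """Clean each work order, segment it, and drop common words.

    `segment` is any callable mapping a string to a list of words,
    for example jieba.lcut for Chinese text (assumes jieba is installed).
    """
    corpus = []
    for text in work_orders:
        cleaned = PUNCTUATION.sub(" ", text)  # sentence cleaning
        tokens = [t for t in segment(cleaned) if t.strip() and t not in STOP_WORDS]
        corpus.append(tokens)
    return corpus


corpus = build_corpus(["point switch is stuck: motor has fault."])
print(corpus)  # [['point', 'switch', 'stuck', 'motor', 'fault']]
```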
Fig. 2 is a schematic structural diagram of a rail transit maintenance work order keyword extraction device provided in an embodiment of the present invention. As shown in Fig. 2, the device includes a maintenance work order corpus acquisition module 21, an actual weight determination module 22, and a keyword determination module 23, wherein:
the maintenance work order corpus acquisition module 21 is configured to acquire a maintenance work order corpus, calculate the word frequency of each word and the inverse document frequency of each word in the corpus, and determine the product of the word frequency of each word and the inverse document frequency; the maintenance work order corpus is a set containing words appearing in a specified number of rail transit maintenance work orders;
the actual weight determination module 22 is configured to determine the actual weight of each word in the maintenance work order corpus based on the initial weight of each word in the maintenance work order corpus and the product of the word frequency of each word and the inverse document frequency; the initial weight is determined based on the importance of each word in the maintenance work order corpus;
the keyword determination module 23 is configured to determine the rail transit maintenance work order keywords based on the actual weight of each word in the maintenance work order corpus.
Specifically, the functions of the modules in the rail transit maintenance work order keyword extraction device provided in the embodiment of the present invention correspond one-to-one to the operation flows of the steps in the above method embodiments, and the implementation effects are likewise consistent.
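As an illustration of the quantity computed by the corpus acquisition module, the product of word frequency and inverse document frequency might be sketched as below. The function name `tf_idf` and the exact TF and IDF definitions are assumptions made for this sketch (common textbook forms), not definitions taken from the embodiment.

```python
import math
from collections import Counter
from typing import Dict, List


def tf_idf(corpus: List[List[str]]) -> Dict[str, float]:
    """Return TF * IDF for every word in a corpus of segmented work orders.

    Assumed definitions (not confirmed by the source text):
      TF  = occurrences of the word / total number of words in the corpus
      IDF = log(number of work orders / number of work orders containing the word)
    """
    total_tokens = sum(len(order) for order in corpus)
    # How many times each word occurs across all work orders.
    term_counts = Counter(token for order in corpus for token in order)
    # In how many work orders each word occurs at least once.
    doc_counts = Counter(token for order in corpus for token in set(order))
    n_orders = len(corpus)
    return {
        word: (count / total_tokens) * math.log(n_orders / doc_counts[word])
        for word, count in term_counts.items()
    }
```

Under these assumed definitions, a word that appears in every work order receives a product of zero, which is why very common words contribute little to keyword selection.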
Fig. 3 illustrates a schematic diagram of the physical structure of an electronic device. As shown in Fig. 3, the electronic device may include a processor 310, a communication interface 320, a memory 330, and a communication bus 340, wherein the processor 310, the communication interface 320, and the memory 330 communicate with each other via the communication bus 340. The processor 310 may call logic instructions in the memory 330 to perform a rail transit maintenance work order keyword extraction method, including: obtaining a maintenance work order corpus, calculating the word frequency of each word and the inverse document frequency of each word in the corpus, and determining the product of the word frequency of each word and the inverse document frequency; the maintenance work order corpus is a set containing words appearing in a specified number of rail transit maintenance work orders; determining the actual weight of each word in the maintenance work order corpus based on the initial weight of each word in the maintenance work order corpus and the product of the word frequency of each word and the inverse document frequency; the initial weight is determined based on the importance of each word in the maintenance work order corpus; and determining the rail transit maintenance work order keywords based on the actual weight of each word in the maintenance work order corpus.
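The weighting and keyword selection steps recapped above might look roughly like the following sketch. The claims refine the weighting step with a tuning parameter k and max-min normalization, but the formula that combines the initial weight with the TF-IDF product is given only as an image, so the combination used here, `tfidf * (1 + k * initial)`, is purely a hypothetical stand-in. The names `actual_weights` and `top_keywords` are likewise illustrative.

```python
from typing import Dict, List


def actual_weights(
    tfidf: Dict[str, float],
    initial: Dict[str, float],
    k: float = 1.0,
) -> Dict[str, float]:
    """Combine TF-IDF products with initial weights, then max-min normalize."""
    # HYPOTHETICAL combination: the source formula is not reproduced in the text.
    intermediate = {
        word: score * (1.0 + k * initial.get(word, 0.0))
        for word, score in tfidf.items()
    }
    if not intermediate:
        return {}
    q_min = min(intermediate.values())
    q_max = max(intermediate.values())
    span = q_max - q_min
    if span == 0:
        # All intermediate weights are equal; avoid dividing by zero.
        return {word: 0.0 for word in intermediate}
    # Max-min normalization maps the intermediate weights onto [0, 1].
    return {word: (q - q_min) / span for word, q in intermediate.items()}


def top_keywords(weights: Dict[str, float], n: int) -> List[str]:
    """Return the n words with the largest actual weight (ties broken by word)."""
    ranked = sorted(weights.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:n]]
```

Selecting the top n words is one plausible reading of "determining the keywords based on the actual weight"; a fixed weight threshold would be an equally reasonable variant.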
In addition, the logic instructions in the memory 330 may be implemented in the form of software functional units and, when sold or used as independent products, may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention may be embodied in the form of a software product, which is stored in a storage medium and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present invention. The aforementioned storage medium includes a USB flash drive, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, and various other media capable of storing program code.
In another aspect, an embodiment of the present invention further provides a computer program product, where the computer program product includes a computer program stored on a non-transitory computer-readable storage medium, the computer program includes program instructions, and when the program instructions are executed by a computer, the computer is capable of executing the rail transit maintenance work order keyword extraction method provided by the above method embodiments, including: obtaining a maintenance work order corpus, calculating the word frequency of each word and the inverse document frequency of each word in the corpus, and determining the product of the word frequency of each word and the inverse document frequency; the maintenance work order corpus is a set containing words appearing in a specified number of rail transit maintenance work orders; determining the actual weight of each word in the maintenance work order corpus based on the initial weight of each word in the maintenance work order corpus and the product of the word frequency of each word and the inverse document frequency; the initial weight is determined based on the importance of each word in the maintenance work order corpus; and determining the rail transit maintenance work order keywords based on the actual weight of each word in the maintenance work order corpus.
In another aspect, an embodiment of the present invention further provides a non-transitory computer-readable storage medium on which a computer program is stored, where the computer program, when executed by a processor, implements the rail transit maintenance work order keyword extraction method provided in the foregoing embodiments, including: obtaining a maintenance work order corpus, calculating the word frequency of each word and the inverse document frequency of each word in the corpus, and determining the product of the word frequency of each word and the inverse document frequency; the maintenance work order corpus is a set containing words appearing in a specified number of rail transit maintenance work orders; determining the actual weight of each word in the maintenance work order corpus based on the initial weight of each word in the maintenance work order corpus and the product of the word frequency of each word and the inverse document frequency; the initial weight is determined based on the importance of each word in the maintenance work order corpus; and determining the rail transit maintenance work order keywords based on the actual weight of each word in the maintenance work order corpus.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A rail transit maintenance work order keyword extraction method is characterized by comprising the following steps:
obtaining a maintenance work order corpus, calculating the word frequency of each word and the inverse document frequency of each word in the corpus, and determining the product of the word frequency of each word and the inverse document frequency; the maintenance work order corpus is a set containing words appearing in a specified number of rail transit maintenance work orders;
determining the actual weight of each word in the maintenance work order corpus based on the initial weight of each word in the maintenance work order corpus and the product of the word frequency of each word and the inverse document frequency; the initial weight is determined based on the importance of each word in the maintenance work order corpus;
and determining the key words of the rail transit maintenance work order based on the actual weight of each word in the maintenance work order corpus.
2. The rail transit maintenance work order keyword extraction method according to claim 1, further comprising:
receiving a keyword to be queried, and determining all candidate rail transit maintenance work orders containing the keyword to be queried;
and calculating the weight of each candidate rail transit maintenance work order, and taking the candidate rail transit maintenance work order with the higher weight as the rail transit maintenance work order corresponding to the keyword to be queried.
3. The rail transit maintenance work order keyword extraction method according to claim 1, wherein the determining the actual weight of each word in the maintenance work order corpus based on the initial weight of each word in the maintenance work order corpus and the product of the word frequency of each word and the inverse document frequency specifically comprises:
determining the actual weight based on the initial weight and the product of the word frequency and the inverse document frequency of each word, in combination with a tuning parameter.
4. The rail transit maintenance work order keyword extraction method according to claim 3, wherein the determining the actual weight based on the initial weight and the product of the word frequency and the inverse document frequency of each word, in combination with a tuning parameter, specifically comprises:
determining an intermediate weight based on the initial weight and the product of the word frequency and the inverse document frequency of each word, in combination with the tuning parameter;
and normalizing the intermediate weight based on a max-min normalization method to obtain the actual weight.
5. The rail transit maintenance work order keyword extraction method according to claim 4, wherein the determining of the intermediate weight based on the initial weight and the product of the word frequency and the inverse document frequency of each word, in combination with a tuning parameter, specifically comprises:
determining the intermediate weight by the following formula:
Figure FDA0002698794060000021
wherein q_i is the intermediate weight of the i-th word, TF is the word frequency of the i-th word, IDF is the inverse document frequency of the i-th word, k is the tuning parameter, and
Figure FDA0002698794060000022
is the initial weight of the i-th word.
6. The rail transit maintenance work order keyword extraction method according to claim 4, wherein the normalizing the intermediate weight based on the maximum and minimum normalization method to obtain the actual weight specifically comprises:
determining the actual weights by:
$$q_{i\text{-final}} = \frac{q_i - q_{\min}}{q_{\max} - q_{\min}}$$
wherein $q_{i\text{-final}}$ is the actual weight of the i-th word, $q_i$ is the intermediate weight of the i-th word, $q_{\min}$ is the minimum of all intermediate weights, and $q_{\max}$ is the maximum of all intermediate weights.
7. The rail transit maintenance work order keyword extraction method according to any one of claims 1 to 6, wherein the obtaining of the maintenance work order corpus specifically includes:
for each rail transit maintenance work order in the specified number of rail transit maintenance work orders, processing the work order based on a sentence cleaning method and the jieba word segmentation method to obtain the maintenance work order corpus.
8. A rail transit maintenance work order keyword extraction device, characterized by comprising:
a maintenance work order corpus acquisition module, configured to acquire a maintenance work order corpus, calculate the word frequency of each word and the inverse document frequency of each word in the corpus, and determine the product of the word frequency of each word and the inverse document frequency; the maintenance work order corpus is a set containing words appearing in a specified number of rail transit maintenance work orders;
an actual weight determination module, configured to determine the actual weight of each word in the maintenance work order corpus based on the initial weight of each word in the maintenance work order corpus and the product of the word frequency of each word and the inverse document frequency; the initial weight is determined based on the importance of each word in the maintenance work order corpus;
and a keyword determination module, configured to determine the rail transit maintenance work order keywords based on the actual weight of each word in the maintenance work order corpus.
9. An electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the steps of the rail transit maintenance work order keyword extraction method according to any one of claims 1 to 7.
10. A non-transitory computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the steps of the rail transit maintenance work order keyword extraction method according to any one of claims 1 to 7.
CN202011015127.2A 2020-09-24 2020-09-24 Method and device for extracting key words of rail transit maintenance work order Pending CN112214990A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011015127.2A CN112214990A (en) 2020-09-24 2020-09-24 Method and device for extracting key words of rail transit maintenance work order

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011015127.2A CN112214990A (en) 2020-09-24 2020-09-24 Method and device for extracting key words of rail transit maintenance work order

Publications (1)

Publication Number Publication Date
CN112214990A (en) 2021-01-12

Family

ID=74051876

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011015127.2A Pending CN112214990A (en) 2020-09-24 2020-09-24 Method and device for extracting key words of rail transit maintenance work order

Country Status (1)

Country Link
CN (1) CN112214990A (en)

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination