CN110059319B - Pipe gallery fault analysis method based on keyword co-occurrence - Google Patents

Publication number
CN110059319B
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910323713.4A
Other languages
Chinese (zh)
Other versions
CN110059319A (en)
Inventor
孙华
华罗懿
蒋峥嵘
唐聪
金鸣
朱统权
陆理平
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanghai Chemical Industry Park Public Pipe Rack Co ltd
Original Assignee
Shanghai Chemical Industry Park Public Pipe Rack Co ltd

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/338 Presentation of query results
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking

Abstract

The invention discloses a pipe gallery fault analysis method based on keyword co-occurrence, comprising the following steps. Step one: construct a custom dictionary that builds on a general dictionary and covers the proper nouns and technical terms of the pipe gallery industry. Step two: after the fault information is obtained, preprocess it. Step three: after a word set is obtained, select keywords from the word set. Step four: after the keywords are visualized as a word cloud, form a keyword co-occurrence matrix and visualize it. Fault keywords are found by means of the custom dictionary and word frequency statistics and are displayed through keyword word cloud visualization; a keyword co-occurrence network is then constructed and its layout is optimized with a graph layout algorithm, so that the fault information can be analyzed and displayed visually.

Description

Pipe gallery fault analysis method based on keyword co-occurrence
Technical Field
The invention relates to a pipe gallery fault analysis method based on keyword co-occurrence.
Background
Text analysis refers to the representation of text and the selection of its feature items, and is a basic problem in text mining and information retrieval. It converts unstructured raw text into structured information that a computer can recognize and process, so that a mathematical model can be built to describe and stand in for the text, with the ultimate aim of mining useful information from large volumes of text. Keyword co-occurrence network analysis, also called co-word analysis, is a text content analysis technique that reveals the relationships between keywords by analyzing how words or noun phrases co-occur within the same text topic: the higher the co-occurrence frequency, the stronger the correlation between two keywords.
A graph layout algorithm addresses problems such as cluttered layout and poor readability in large node-link graphs. The force-directed layout algorithm is a classical approach. Its main idea is to treat the whole topological graph as a physical system in which attractive forces act between adjacent vertices and repulsive forces act between nonadjacent vertices; in each iteration the resultant force on every vertex is computed and the vertex is moved accordingly, until the whole system reaches an energy minimum. Traditional force-directed layout algorithms produce layouts that meet aesthetic criteria such as more uniform edge lengths and vertex distribution; a representative example is the FR (Fruchterman-Reingold) algorithm of Fruchterman et al.
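The force-directed idea described above can be sketched in a few lines. The following is a minimal illustration, not the patented implementation: it follows the usual Fruchterman-Reingold formulation, in which repulsion k²/d is applied between every pair of vertices and attraction d²/k along each edge, with a "temperature" that caps the per-iteration movement. The function name, the unit-square frame, the linear cooling schedule, and the default parameters are all assumptions made for the example.

```python
import math
import random


def fruchterman_reingold(nodes, edges, iterations=50, width=1.0, height=1.0, seed=0):
    """Return {node: (x, y)} from a basic Fruchterman-Reingold layout (illustrative)."""
    nodes = list(nodes)
    rng = random.Random(seed)  # fixed seed so the layout is reproducible
    pos = {v: [rng.uniform(0, width), rng.uniform(0, height)] for v in nodes}
    if len(nodes) < 2:
        return {v: (p[0], p[1]) for v, p in pos.items()}
    k = math.sqrt(width * height / len(nodes))  # ideal distance between vertices
    temperature = width / 10.0                  # caps how far a vertex moves per step
    cooling = temperature / (iterations + 1)    # assumed linear cooling schedule

    def push(disp, u, v, force_of):
        # Apply a force along the line from v to u; a positive value pushes them apart.
        dx = pos[u][0] - pos[v][0]
        dy = pos[u][1] - pos[v][1]
        d = max(math.hypot(dx, dy), 1e-9)
        f = force_of(d)
        disp[u][0] += dx / d * f
        disp[u][1] += dy / d * f
        disp[v][0] -= dx / d * f
        disp[v][1] -= dy / d * f

    for _ in range(iterations):
        disp = {v: [0.0, 0.0] for v in nodes}
        for i, u in enumerate(nodes):
            for v in nodes[i + 1:]:
                push(disp, u, v, lambda d: k * k / d)   # repulsion between all pairs
        for u, v in edges:
            push(disp, u, v, lambda d: -(d * d) / k)    # attraction along edges only
        for v in nodes:
            dx, dy = disp[v]
            d = max(math.hypot(dx, dy), 1e-9)
            step = min(d, temperature)                  # limit movement by temperature
            pos[v][0] = min(width, max(0.0, pos[v][0] + dx / d * step))
            pos[v][1] = min(height, max(0.0, pos[v][1] + dy / d * step))
        temperature -= cooling
    return {v: (p[0], p[1]) for v, p in pos.items()}


# Hypothetical keywords and edge, chosen only to exercise the function.
layout = fruchterman_reingold(["pipeline", "rust", "valve"], [("pipeline", "rust")])
```

The result maps each node to a coordinate inside the frame; a plotting library of choice can then draw the nodes and edges at those positions.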
At present, pipe gallery fault information is not used effectively. Administrators usually cannot quickly determine the type, location, and other details of a pipe gallery fault, and so cannot carry out targeted maintenance. Troubleshooting a pipe gallery therefore usually requires a large number of staff to investigate step by step, which is time-consuming and laborious, greatly reduces work efficiency, and raises the enterprise's costs.
Disclosure of Invention
The invention aims to solve the above problems by providing a pipe gallery fault analysis method based on keyword co-occurrence that can analyze and visually display fault information.
The purpose of the invention is achieved as follows:
The pipe gallery fault analysis method based on keyword co-occurrence of the invention comprises the following steps:
Step one: construct a custom dictionary that builds on a general dictionary and covers the proper nouns and technical terms of the pipe gallery industry;
Step two: after the fault information is obtained, preprocess it; the fault information preprocessing comprises the following steps:
S2.1: Fault information deduplication: if the fault information contains both Chinese information and English information, compare the Chinese information with the English information to remove duplicates, and record the deduplicated fault information;
S2.2: Word set generation: using the custom dictionary, segment the deduplicated fault information recorded in S2.1 with a word segmentation algorithm, remove stop words, and generate a word set from the segmented, stop-word-free fault information;
Step three: after the word set is obtained, select keywords from it; the keyword selection comprises the following steps:
S3.1: Word frequency statistics: for the word set generated in S2.2, count each word and its word frequency;
S3.2: Synonym merging: merge synonyms among the words counted in S3.1, and sum the word frequencies of the merged synonyms;
S3.3: Keyword selection: from the synonym-merged word set of S3.2, select the top M words by word frequency as keywords, where M is a positive integer less than or equal to the number of keywords;
S3.4: Keyword word cloud visualization: visualize the keywords selected in S3.3 as a word cloud;
Step four: after the keywords are visualized as a word cloud, form a keyword co-occurrence matrix and visualize it, comprising the following steps:
S4.1: Co-occurrence matrix initialization: let N denote the number of keywords selected in S3.3, construct an (N + 1) × (N + 1) keyword co-occurrence matrix, and initialize its first row and first column with the keywords selected in S3.3;
S4.2: Keyword co-occurrence frequency statistics: pair each keyword in the first row of the keyword co-occurrence matrix of S4.1 with each keyword in the first column to form keyword pairs, count how often each keyword pair occurs in the synonym-merged word set of S3.2, and enter each count in the corresponding cell of the keyword co-occurrence matrix of S4.1;
S4.3: Keyword co-occurrence matrix visualization: visualize the keyword co-occurrence matrix populated in S4.2 by forming a keyword co-occurrence network; the network comprises a plurality of nodes, each node representing a keyword selected in S3.3; the nodes are connected by undirected edges that express the relationships between the keyword pairs of S4.2; the color depth and thickness of each undirected edge are directly proportional to the occurrence frequency of its keyword pair; and the size of each node is directly proportional to its node degree;
Step five: finally, judge and check the pipe gallery faults one by one according to the node sizes and/or the color depth and thickness of the undirected edges.
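Steps two and three (S2.2 through S3.3) can be sketched as a small pipeline. This is an illustration under stated assumptions rather than the method as implemented by the inventors: the text does not name a word segmentation algorithm, so forward maximum matching against the custom dictionary is used here as a stand-in, and the dictionary entries, stop words, synonym table, and fault records are hypothetical sample data (English glosses are given in the comments).

```python
from collections import Counter


def segment(text, dictionary):
    """Forward maximum matching: at each position take the longest dictionary entry,
    falling back to a single character when nothing matches (assumed algorithm)."""
    max_len = max(map(len, dictionary))
    tokens, i = [], 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            piece = text[i:i + size]
            if size == 1 or piece in dictionary:
                tokens.append(piece)
                i += size
                break
    return tokens


def select_keywords(records, dictionary, stop_words, synonyms, m):
    # S2.2: segment each record with the custom dictionary and drop stop words.
    word_sets = [[w for w in segment(r, dictionary) if w not in stop_words]
                 for r in records]
    # S3.1: count every word and its frequency.
    raw = Counter(w for words in word_sets for w in words)
    # S3.2: merge synonyms onto one canonical word and add their frequencies.
    merged = Counter()
    for word, freq in raw.items():
        merged[synonyms.get(word, word)] += freq
    # S3.3: keep the top M words by frequency as keywords.
    return merged.most_common(m)


# Hypothetical sample data.
dictionary = {"管道", "锈蚀", "腐蚀", "阀门", "泄漏"}  # pipeline, rust, corrosion, valve, leak
stop_words = {"有", "的"}                             # "has", possessive particle
synonyms = {"腐蚀": "锈蚀"}                           # count "corrosion" as "rust"
records = ["管道有锈蚀", "阀门泄漏", "管道的腐蚀"]
top = select_keywords(records, dictionary, stop_words, synonyms, m=2)
```

With this sample data, "管道" (pipeline) and "锈蚀" (rust) are selected, each with a merged frequency of 2. In practice a dedicated segmentation library that accepts a user dictionary would normally replace the toy `segment` function.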
In the above pipe gallery fault analysis method based on keyword co-occurrence, when the keyword word cloud visualization is performed in S3.4, the display size of each keyword is directly proportional to its word frequency.
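As a small illustration of this proportionality, the sketch below maps word frequencies to font sizes. The function name and the choice to give the most frequent keyword a fixed maximum size are assumptions; the text only requires that size be directly proportional to frequency.

```python
def font_sizes(keyword_freqs, max_size=72.0):
    """Scale font size linearly with word frequency (size = max_size * f / f_max)."""
    top = max(keyword_freqs.values())
    return {word: max_size * freq / top for word, freq in keyword_freqs.items()}


# Hypothetical frequencies.
sizes = font_sizes({"pipeline": 4, "rust": 2})
# sizes == {"pipeline": 72.0, "rust": 36.0}
```

A keyword that occurs half as often is drawn at half the size, which is what "directly proportional" means here.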
In the above pipe gallery fault analysis method based on keyword co-occurrence, the layout of the keyword co-occurrence network is further optimized by a graph layout algorithm.
In the above pipe gallery fault analysis method based on keyword co-occurrence, the graph layout algorithm is a force-directed algorithm.
In the above pipe gallery fault analysis method based on keyword co-occurrence, the force-directed algorithm is the Fruchterman-Reingold algorithm.
The pipe gallery fault analysis method based on keyword co-occurrence of the invention finds fault keywords by means of a custom dictionary and word frequency statistics and displays them through keyword word cloud visualization; it then constructs a keyword co-occurrence network and optimizes its layout with a graph layout algorithm, so as to analyze and visually display the fault information.
Drawings
FIG. 1 is a flow chart of a keyword co-occurrence based pipe gallery fault analysis method of the present invention;
FIG. 2 is a schematic diagram of a keyword co-occurrence network formed after a keyword co-occurrence matrix is subjected to a visualization step in S4.3 of the present invention;
FIG. 3 is a schematic diagram of the keyword co-occurrence network of the present invention after layout optimization.
Detailed Description
The invention will be further explained with reference to the drawings.
Referring to FIG. 1 to FIG. 3, the pipe gallery fault analysis method based on keyword co-occurrence of the invention comprises the following steps:
Step one: construct a custom dictionary that builds on a general dictionary and covers the proper nouns and technical terms of the pipe gallery industry;
Step two: after the fault information is obtained, preprocess it; the fault information preprocessing comprises the following steps:
S2.1: Fault information deduplication: if the fault information contains both Chinese information and English information, compare the Chinese information with the English information to remove duplicates, and record the deduplicated fault information;
S2.2: Word set generation: using the custom dictionary, segment the deduplicated fault information of S2.1 with a word segmentation algorithm, remove stop words, and generate a word set from the segmented, stop-word-free fault information;
Step three: after the word set is obtained, select keywords from it; the keyword selection comprises the following steps:
S3.1: Word frequency statistics: for the word set generated in S2.2, count each word and its word frequency;
S3.2: Synonym merging: merge synonyms among the words counted in S3.1, and sum the word frequencies of the merged synonyms;
S3.3: Keyword selection: from the synonym-merged word set of S3.2, select the top M words by word frequency as keywords, where M is a positive integer less than or equal to the number of keywords;
S3.4: Keyword word cloud visualization: visualize the keywords selected in S3.3 as a word cloud, in which the display size of each keyword is directly proportional to its word frequency;
Step four: after the keywords are visualized as a word cloud, form a keyword co-occurrence matrix and visualize it, comprising the following steps:
S4.1: Co-occurrence matrix initialization: let N denote the number of keywords selected in S3.3, construct an (N + 1) × (N + 1) keyword co-occurrence matrix, and initialize its first row and first column with the keywords selected in S3.3;
S4.2: Keyword co-occurrence frequency statistics: pair each keyword in the first row of the keyword co-occurrence matrix of S4.1 with each keyword in the first column to form keyword pairs, count how often each keyword pair occurs in the synonym-merged word set of S3.2, and enter each count in the corresponding cell of the keyword co-occurrence matrix of S4.1;
S4.3: Keyword co-occurrence matrix visualization: visualize the keyword co-occurrence matrix populated in S4.2 by forming a keyword co-occurrence network; the network comprises a plurality of nodes, each node representing a keyword selected in S3.3; the nodes are connected by undirected edges that express the relationships between the keyword pairs of S4.2; the color depth and thickness of each undirected edge are directly proportional to the occurrence frequency of its keyword pair; the size of each node is directly proportional to its node degree; after the keyword co-occurrence network is formed, its layout is optimized with the Fruchterman-Reingold algorithm;
Finally, judge the pipe gallery faults one by one according to the node sizes and/or the color depth and thickness of the undirected edges. When judging, all nodes can be sorted by size and selected from largest to smallest. The largest node is selected first, such as "pipeline" in FIG. 2; the undirected edges connected to that node are then sorted by color depth and line thickness and selected from darkest to lightest and from thickest to thinnest. The undirected edge with the darkest color and thickest line is selected first, and the keyword pair it connects is checked for a fault, such as the pair "pipeline" and "rust" in FIG. 2. If troubleshooting shows that this keyword pair is not a fault point, the remaining keyword pairs are checked step by step in the same way.
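The matrix construction of S4.1 and S4.2 can be sketched as follows. One point is an assumption: the text does not say in what unit a keyword pair "occurs", so the sketch counts a pair once for every fault record whose word set contains both keywords. The function name and the sample word sets are hypothetical.

```python
from itertools import combinations


def cooccurrence_matrix(word_sets, keywords):
    """Build the (N + 1) x (N + 1) matrix: row 0 and column 0 hold the keywords,
    the remaining cells hold symmetric co-occurrence counts."""
    n = len(keywords)
    index = {kw: i + 1 for i, kw in enumerate(keywords)}
    matrix = [[0] * (n + 1) for _ in range(n + 1)]
    matrix[0][0] = ""  # unused corner cell
    for kw, i in index.items():
        matrix[0][i] = kw  # S4.1: first row
        matrix[i][0] = kw  # S4.1: first column
    for words in word_sets:
        # S4.2: every unordered pair of distinct keywords in one record co-occurs once.
        present = sorted({w for w in words if w in index}, key=index.get)
        for a, b in combinations(present, 2):
            matrix[index[a]][index[b]] += 1
            matrix[index[b]][index[a]] += 1
    return matrix


# Hypothetical word sets, one per fault record, after synonym merging.
word_sets = [["pipeline", "rust"], ["valve", "leak"], ["pipeline", "rust", "leak"]]
matrix = cooccurrence_matrix(word_sets, ["pipeline", "rust", "leak"])
# matrix == [["", "pipeline", "rust", "leak"],
#            ["pipeline", 0, 2, 1],
#            ["rust", 2, 0, 1],
#            ["leak", 1, 1, 0]]
```

Each nonzero cell above the diagonal corresponds to one undirected edge of the keyword co-occurrence network, and its value is the frequency that drives the edge's color depth and thickness.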
Fault keywords are found by means of a custom dictionary and word frequency statistics and are displayed through keyword word cloud visualization; a keyword co-occurrence network is then constructed, its layout is optimized with a graph layout algorithm, and the fault information is displayed visually. Faults are judged according to the node sizes and/or the color depth and line thickness of the undirected edges, and subsequent troubleshooting is carried out accordingly.
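The checking order described for the final step can be expressed as a sort over the network. The sketch below is illustrative only: it takes a labelled co-occurrence matrix, derives each node's degree (which the method uses for node size) and each edge's frequency (used for color depth and thickness), then lists keyword pairs starting from the largest node and its heaviest edge. The function name and the sample matrix are hypothetical, and skipping pairs that have already been listed is an added assumption.

```python
def triage_order(matrix):
    """Return (degree, order): node degrees, and keyword pairs in checking order."""
    labels = matrix[0][1:]
    n = len(labels)
    weights = {}
    degree = {kw: 0 for kw in labels}
    for i in range(n):
        for j in range(i + 1, n):
            w = matrix[i + 1][j + 1]
            if w > 0:  # a nonzero count means an undirected edge exists
                weights[(labels[i], labels[j])] = w
                degree[labels[i]] += 1
                degree[labels[j]] += 1
    order, seen = [], set()
    # Largest node first; within a node, heaviest (darkest, thickest) edge first.
    for node in sorted(labels, key=lambda kw: -degree[kw]):
        incident = [(pair, w) for pair, w in weights.items() if node in pair]
        for pair, w in sorted(incident, key=lambda item: -item[1]):
            if pair not in seen:
                seen.add(pair)
                order.append((pair, w))
    return degree, order


# Hypothetical labelled co-occurrence matrix.
matrix = [["", "pipeline", "rust", "leak", "valve"],
          ["pipeline", 0, 3, 1, 0],
          ["rust", 3, 0, 0, 0],
          ["leak", 1, 0, 0, 2],
          ["valve", 0, 0, 2, 0]]
degree, order = triage_order(matrix)
# order == [(("pipeline", "rust"), 3), (("pipeline", "leak"), 1), (("leak", "valve"), 2)]
```

In this sample the pair "pipeline" and "rust" is checked first, mirroring the FIG. 2 example in the description; ties between equally sized nodes are broken by their order in the matrix.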
The above embodiments are provided only to illustrate the invention, not to limit it. Those skilled in the art can make various changes and modifications without departing from the spirit and scope of the invention; all equivalent technical solutions therefore also fall within the scope of the invention, which is defined by the claims.

Claims (5)

1. A pipe gallery fault analysis method based on keyword co-occurrence, characterized by comprising the following steps:
Step one: constructing a custom dictionary that builds on a general dictionary and covers the proper nouns and technical terms of the pipe gallery industry;
Step two: after fault information is obtained, preprocessing the fault information, wherein the fault information preprocessing comprises the following steps:
S2.1: Fault information deduplication: if the fault information contains both Chinese information and English information, comparing the Chinese information with the English information to remove duplicates, and recording the deduplicated fault information;
S2.2: Word set generation: using the custom dictionary, segmenting the deduplicated fault information recorded in S2.1 with a word segmentation algorithm, removing stop words, and generating a word set from the segmented, stop-word-free fault information;
Step three: after the word set is obtained, selecting keywords from the word set, wherein the keyword selection comprises the following steps:
S3.1: Word frequency statistics: for the word set generated in S2.2, counting each word and its word frequency;
S3.2: Synonym merging: merging synonyms among the words counted in S3.1, and summing the word frequencies of the merged synonyms;
S3.3: Keyword selection: from the synonym-merged word set of S3.2, selecting the top M words by word frequency as keywords, wherein M is a positive integer less than or equal to the number of keywords;
S3.4: Keyword word cloud visualization: visualizing the keywords selected in S3.3 as a word cloud;
Step four: after the keywords are visualized as a word cloud, forming a keyword co-occurrence matrix and visualizing it, comprising the following steps:
S4.1: Co-occurrence matrix initialization: letting N denote the number of keywords selected in S3.3, constructing an (N + 1) × (N + 1) keyword co-occurrence matrix, and initializing its first row and first column with the keywords selected in S3.3;
S4.2: Keyword co-occurrence frequency statistics: pairing each keyword in the first row of the keyword co-occurrence matrix of S4.1 with each keyword in the first column to form keyword pairs, counting how often each keyword pair occurs in the synonym-merged word set of S3.2, and entering each count in the corresponding cell of the keyword co-occurrence matrix of S4.1;
S4.3: Keyword co-occurrence matrix visualization: visualizing the keyword co-occurrence matrix populated in S4.2 by forming a keyword co-occurrence network, wherein the keyword co-occurrence network comprises a plurality of nodes, each node represents a keyword selected in S3.3, the nodes are connected by undirected edges that express the relationships between the keyword pairs of S4.2, the color depth and thickness of each undirected edge are directly proportional to the occurrence frequency of its keyword pair, and the size of each node is directly proportional to its node degree;
Step five: finally, judging and checking the pipe gallery faults one by one according to the node sizes and/or the color depth and thickness of the undirected edges.
2. The pipe gallery fault analysis method based on keyword co-occurrence according to claim 1, wherein when the keyword word cloud visualization is performed in S3.4, the display size of each keyword is directly proportional to its word frequency.
3. The pipe gallery fault analysis method based on keyword co-occurrence according to claim 2, wherein the layout of the keyword co-occurrence network is further optimized by a graph layout algorithm.
4. The pipe gallery fault analysis method based on keyword co-occurrence according to claim 3, wherein the graph layout algorithm is a force-directed algorithm.
5. The pipe gallery fault analysis method based on keyword co-occurrence according to claim 4, wherein the force-directed algorithm is the Fruchterman-Reingold algorithm.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910323713.4A CN110059319B (en) 2019-04-22 2019-04-22 Pipe gallery fault analysis method based on keyword co-occurrence

Publications (2)

Publication Number Publication Date
CN110059319A (en) 2019-07-26
CN110059319B (en) 2022-11-18
