CN110647524B - Novel database completion method for power supply rail transit operation and maintenance system
- Publication number
- CN110647524B (application number CN201910934906.3A)
- Authority
- CN
- China
- Prior art keywords
- database
- rail transit
- relation
- node
- power supply
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G06F16/215—Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
- G06F16/2379—Updates performed during online database operations; commit processing
- G06F16/2465—Query processing support for facilitating data mining operations in structured databases
- G06F16/9024—Graphs; Linked lists
- G06F2216/03—Data mining
- G06Q50/40—
Abstract
The invention discloses a database completion method for a novel power supply rail transit operation and maintenance system. First, link prediction is applied to the database to identify the abnormal and unknown relations it contains; next, the confidence of each node relation tuple is calculated; finally, relations are judged and added to the database, making the rail transit operation and maintenance database more complete. For database completion in specialized domains, the method offers completeness and controllability and improves precision.
Description
Technical Field
The invention relates to database optimization methods in the computer field for data mining and data graphs, and in particular to a database completion method for a novel power supply rail transit operation and maintenance system.
Background
In recent years, the optimized design and operational control of novel power supply rail transit systems have been widely developed and adopted. Using a rail transit operation and maintenance system database to support this optimized design and operational control has become a popular and important application. Such a database provides effective decision support for the rail transit operation and maintenance system and helps the personnel involved to optimize it.
However, the rail transit operation and maintenance system database is never complete at any given moment: it may contain erroneous and incomplete relations, and it must be adjusted continually as rail transit service requirements change. This makes maintaining the database in real time a challenge. To reduce the cost of maintaining it, researching a database completion method for the novel power supply rail transit operation and maintenance system is an important task.
Existing link prediction techniques focus too heavily on the quality of a single completion pass and have high time overhead, so they are unsuitable for completing the rail transit operation and maintenance system database. Because the novel power supply rail transit system database changes frequently, existing research methods do not suit it.
Disclosure of Invention
The object of the invention is to provide a database completion method for a novel power supply rail transit operation and maintenance system. By incorporating active learning, the method continuously provides real-time completion for the rail transit operation and maintenance system database, addresses the completion needs caused by rail transit service adjustments and inherent database errors, and completes the database quickly.
The technical scheme adopted by the invention for solving the technical problems is as follows:
A clustering method and a graph structure mining method are designed and combined with active learning, so that abnormal and unknown relations in the rail transit database can be identified quickly and effectively. After each completion iteration, the updated database makes the combined clustering and graph mining screening method more accurate.
The invention has the beneficial effects that:
(1) a database clustering method and a graph structure mining method quickly find and identify a preliminary screening result of possible abnormal/unknown relations in the novel power supply rail transit operation and maintenance system;
(2) the preliminary screening result is corrected by a database self-learning method to obtain possible normal relations, and the confidence of each possible normal relation is calculated;
(3) according to the confidence, those abnormal and unknown relations in the preliminary screening result that are judged to be normal relations are returned to the power supply rail transit operation and maintenance database, which is updated iteratively; the updated database is more complete than the original one;
(4) the updated power supply rail transit operation and maintenance database is returned to step (1), and steps (1) to (3) are repeated until step (3) judges no further normal relation, at which point iteration stops; the power supply rail transit operation and maintenance database is then fully updated and completed, giving the power supply rail transit data real-time completeness.
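The iterative procedure in items (1) to (4) can be pictured as a loop that repeats screening and scoring until no new normal relation is accepted. The following is a minimal Python sketch under stated assumptions, not the patented implementation: relations are assumed to be `(head, relation, tail)` triples, the screening step and the confidence model are hypothetical callables (`screen`, `confidence`), and `max_iters` is an added safety cap that the source does not mention.

```python
from typing import Callable, Hashable, List, Set, Tuple

# A relation is assumed to be a (head, relation, tail) triple.
Relation = Tuple[Hashable, str, Hashable]


def complete_database(
    db: Set[Relation],
    screen: Callable[[Set[Relation]], List[Relation]],
    confidence: Callable[[Set[Relation], Relation], float],
    threshold: float,
    max_iters: int = 100,  # hypothetical safety cap, not from the source
) -> Set[Relation]:
    """Repeat screening, scoring and insertion until no new normal relation is found."""
    db = set(db)  # work on a copy so the caller's set is not mutated
    for _ in range(max_iters):
        # Step (1): preliminary screening of abnormal/unknown relations.
        candidates = screen(db)
        # Steps (2)-(3): keep new candidates whose confidence exceeds the threshold.
        accepted = {
            rel for rel in candidates
            if rel not in db and confidence(db, rel) > threshold
        }
        # Step (4): stop once no new normal relation is judged.
        if not accepted:
            break
        db |= accepted
    return db


# Toy usage with hypothetical candidates and confidence scores.
scores = {("r1", "feeds", "t2"): 0.9, ("r1", "feeds", "t3"): 0.2}
start = {("r1", "feeds", "t1")}
result = complete_database(
    start,
    screen=lambda db: [rel for rel in scores if rel not in db],
    confidence=lambda db, rel: scores[rel],
    threshold=0.5,
)
```

Passing the screening and scoring stages in as functions keeps the loop independent of whichever clustering, graph mining, or embedding model is plugged in.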
Step (1) is specifically as follows:
1.1) a novel power supply rail transit operation and maintenance system database is built from the raw data collected while the novel power supply rail transit system operates, and is stored on the server of that database. The database contains nodes, which represent entities such as rails and trains together with their associated physical quantities, and edges between nodes, which represent the relations between them. A database clustering method divides the nodes into different node sets; the confidence of each node set is calculated, and a confidence network between the node sets is established;
1.2) a graph structure mining method calculates, for every pair of nodes in the database, the probability that their relation is an abnormal relation and the probability that it is an unknown relation, yielding two probability distributions;
1.3) using the confidence network from 1.1) and the two probability distributions from 1.2), the total weight is calculated with the following formula:
where W_ij is the total weight between node i and node j, and the remaining terms of the formula are, respectively, the probability that the relation between node i and node j is abnormal, the probability that it is unknown, and the confidence value between node i and node j;
finally, the relations between the k node pairs with the highest total weight are selected as the preliminary screening result.
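The selection in 1.3) reduces to ranking node pairs by their total weight and keeping the top k. Because the formula for W_ij is not reproduced in this text, the sketch below takes the combination rule as a parameter; `example_combine` is a hypothetical placeholder for illustration only and is not the patent's formula.

```python
import heapq
from typing import Callable, Dict, Hashable, List, Tuple

Pair = Tuple[Hashable, Hashable]


def top_k_pairs(
    p_abnormal: Dict[Pair, float],
    p_unknown: Dict[Pair, float],
    conf: Dict[Pair, float],
    combine: Callable[[float, float, float], float],
    k: int,
) -> List[Pair]:
    """Return the k node pairs with the highest total weight W_ij."""
    weights = {
        pair: combine(p_abnormal[pair], p_unknown[pair], conf[pair])
        for pair in conf
    }
    # nlargest iterates over the dict keys and ranks them by their weight.
    return heapq.nlargest(k, weights, key=weights.__getitem__)


# Hypothetical combination rule, used only for illustration.
def example_combine(pa: float, pu: float, c: float) -> float:
    return (pa + pu) * c


p_abn = {("a", "b"): 0.8, ("a", "c"): 0.1, ("b", "c"): 0.3}
p_unk = {("a", "b"): 0.1, ("a", "c"): 0.1, ("b", "c"): 0.6}
conf = {("a", "b"): 0.9, ("a", "c"): 0.5, ("b", "c"): 0.8}
screened = top_k_pairs(p_abn, p_unk, conf, example_combine, k=2)
```

Using `heapq.nlargest` avoids sorting every pair when k is much smaller than the number of node pairs.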
In step 1.1), after the confidence of each node set is obtained, node sets with higher confidence are merged by a greedy method; specifically, node sets whose confidence exceeds a first confidence threshold are merged.
Step (3) is specifically as follows: each possible normal relation is judged. A possible normal relation whose confidence is not higher than a second confidence threshold is treated as an abnormal relation; one whose confidence is higher than the second confidence threshold is treated as a normal relation and returned to the rail transit operation and maintenance system database to complete it.
By adopting active learning, the invention passes the updated rail transit operation and maintenance system database through steps (1) and (2) again to obtain a more complete result. The confidence network and the probability distributions jointly screen the abnormal and unknown relations in the database, giving more accurate, higher-precision identification and completion together with greater completeness and controllability.
The invention integrates several fields to provide a database completion method for a novel power supply rail transit operation and maintenance system, and noticeably improves the completion effect compared with traditional methods.
For database completion in specialized domains, the method offers completeness and controllability and improves precision, making it well suited to a rail transit operation and maintenance system.
The invention was tested on a sampled public database data set containing 26,076 relation tuples. Compared with the traditional method, the combined clustering and graph mining screening method did not noticeably increase time overhead and improved the effect by about 20%.
Drawings
FIG. 1 is a flow chart of the steps performed by the present invention.
FIG. 2 is an explanatory diagram of the rail transit database clustering algorithm.
Detailed Description
The technical solution of the present invention will now be further explained with reference to specific embodiments and examples.
Referring to FIG. 1, the specific implementation process and working principle of the present invention are as follows:
(1) First, a clustering method and a graph structure mining method are used to quickly find and identify abnormal and unknown relations in the novel power supply rail transit operation and maintenance system, yielding a preliminary screening result of abnormal/unknown relations in the database. This involves two parts. First, the rail transit database clustering method mines semantic information in the rail transit operation and maintenance database:
a) Database clustering first divides the nodes in the database into sets, each node forming its own initial set.
b) The rail transit database may contain erroneous information, and the initial sets may partly duplicate one another, so some of the sets must be merged, as shown in FIG. 2. The merge strategy is greedy: 1. A current set is selected with equal probability, and its similarity to the other sets is calculated. 2. Given a threshold, the set with the greatest similarity that also exceeds the threshold is merged with the current set. FIG. 2 shows four node sets being merged into three.
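The two-step greedy merge can be sketched as follows. The source does not specify the similarity measure, so this sketch assumes, purely for illustration, a Jaccard index over each set's closed neighborhood (its nodes plus their neighbors) in a hypothetical adjacency map `adj`. It also draws the current set only from sets that still have a partner above the threshold, a choice made here so that the loop is guaranteed to terminate.

```python
import random
from typing import Callable, Dict, FrozenSet, Hashable, List, Set

NodeSet = FrozenSet[Hashable]


def greedy_merge(
    sets: List[NodeSet],
    similarity: Callable[[NodeSet, NodeSet], float],
    threshold: float,
    rng: random.Random,
) -> List[NodeSet]:
    """Greedily merge node sets until no pair is more similar than `threshold`."""
    sets = list(sets)
    while True:
        # Sets that still have at least one merge partner above the threshold.
        candidates = [
            i for i, s in enumerate(sets)
            if any(similarity(s, t) > threshold for j, t in enumerate(sets) if j != i)
        ]
        if not candidates:
            return sets
        # Step 1: choose the current set with equal probability.
        i = rng.choice(candidates)
        # Step 2: merge it with the most similar other set (known to exceed the threshold).
        j = max((j for j in range(len(sets)) if j != i),
                key=lambda m: similarity(sets[i], sets[m]))
        merged = sets[i] | sets[j]
        sets = [s for n, s in enumerate(sets) if n not in (i, j)] + [merged]


def neighborhood_jaccard(adj: Dict[Hashable, Set[Hashable]]):
    """Hypothetical similarity: Jaccard index of the sets' closed neighborhoods."""
    def closed(s: NodeSet) -> Set[Hashable]:
        return set(s).union(*(adj.get(n, set()) for n in s))

    def sim(a: NodeSet, b: NodeSet) -> float:
        na, nb = closed(a), closed(b)
        return len(na & nb) / len(na | nb)

    return sim


adj = {"A": {"X"}, "A2": {"X"}, "B": {"Y"}, "C": {"Z"}}
initial = [frozenset({n}) for n in ["A", "A2", "B", "C"]]
clusters = greedy_merge(initial, neighborhood_jaccard(adj), 0.3, random.Random(0))
```

In this toy case `A` and `A2` share the neighbor `X`, so their singleton sets merge while `B` and `C` stay separate, reducing four sets to three.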
c) For the merged node sets, the probabilities between sets are calculated to obtain a probability network between every two nodes.
Next, a graph structure mining method mines the graph structure information in the rail transit system database. This implementation uses the Node2Vec algorithm. Node2Vec can be regarded as an extension of the DeepWalk algorithm whose random walks combine DFS-like and BFS-like exploration, and its optimization goal is to learn a mapping function from nodes to vectors in a mathematical space. The graph mining algorithm then establishes a confidence network between every two nodes.
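The random-walk stage of Node2Vec can be sketched as below. This is a simplified illustration on a hypothetical unweighted, undirected graph: the return parameter `p` and the in-out parameter `q` bias each step toward BFS-like or DFS-like exploration, as in Node2Vec. Learning the node embeddings from the walks (Node2Vec uses a skip-gram model for this) and deriving the confidence network from them are omitted.

```python
import random
from typing import Dict, Hashable, List, Set

Graph = Dict[Hashable, Set[Hashable]]


def node2vec_walk(
    graph: Graph, start: Hashable, length: int,
    p: float, q: float, rng: random.Random,
) -> List[Hashable]:
    """One second-order biased random walk on an unweighted, undirected graph."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        neighbors = sorted(graph[cur], key=str)  # sorted for reproducibility
        if not neighbors:
            break
        if len(walk) == 1:
            # The first step has no previous node, so choose uniformly.
            walk.append(rng.choice(neighbors))
            continue
        prev = walk[-2]
        weights = []
        for nxt in neighbors:
            if nxt == prev:
                weights.append(1.0 / p)  # return to the previous node
            elif nxt in graph[prev]:
                weights.append(1.0)      # stay near prev (BFS-like)
            else:
                weights.append(1.0 / q)  # move away from prev (DFS-like)
        walk.append(rng.choices(neighbors, weights=weights, k=1)[0])
    return walk


g = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b", "d"}, "d": {"c"}}
walks = [node2vec_walk(g, n, 8, p=1.0, q=0.5, rng=random.Random(42)) for n in g]
```

With `q < 1` the walk favors nodes farther from the previous step, which is the DFS-like behavior; `q > 1` keeps it local.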
The total weight is calculated with the following formula:
where W_ij is the total weight between node i and node j, and the remaining terms of the formula are, respectively, the probability that the relation between node i and node j is abnormal, the probability that it is unknown, and the confidence value between node i and node j.
The relations between the k node pairs with the highest total weight are selected as the preliminary screening result.
(2) The preliminary screening result is corrected by a database self-learning method to obtain possible normal relations, and the confidence of each possible normal relation is calculated. This implementation uses the TransE algorithm for that purpose. Based on the characteristics of the rail transit operation and maintenance database, TransE computes vector representations of the nodes and edges in the database, and the confidence between nodes is calculated from the distances between these vectors, which establishes the confidence of each possible normal relation.
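TransE represents a true triple (head, relation, tail) so that head + relation lies close to tail, which is why a vector distance can serve as a confidence signal. The sketch below shows the TransE distance and its margin-ranking training loss on hand-picked toy vectors; the `exp(-distance)` mapping from distance to confidence is an assumption, since the source does not state how distances are converted.

```python
import math
from typing import Sequence


def transe_distance(h: Sequence[float], r: Sequence[float],
                    t: Sequence[float], norm: int = 2) -> float:
    """TransE energy ||h + r - t|| under the L1 or L2 norm."""
    diffs = [hi + ri - ti for hi, ri, ti in zip(h, r, t)]
    if norm == 1:
        return sum(abs(d) for d in diffs)
    return math.sqrt(sum(d * d for d in diffs))


def confidence(h: Sequence[float], r: Sequence[float],
               t: Sequence[float], norm: int = 2) -> float:
    """Hypothetical mapping from distance to a confidence score in (0, 1]."""
    return math.exp(-transe_distance(h, r, t, norm))


def margin_ranking_loss(pos_dist: float, neg_dist: float, margin: float = 1.0) -> float:
    """TransE training loss for one (true triple, corrupted triple) pair."""
    return max(0.0, margin + pos_dist - neg_dist)


# Toy embeddings (not trained): head + relation lands exactly on the true tail.
head, rel = [1.0, 0.0], [0.0, 1.0]
true_tail, corrupt_tail = [1.0, 1.0], [0.0, 0.0]
```

Training would lower `margin_ranking_loss` across many true and corrupted triples; a possible normal relation whose vectors nearly satisfy head + relation = tail then receives a confidence close to 1.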
(3) Finally, the relations whose calculated confidence is higher than the confidence threshold are returned to the rail transit database as normal relations, and the rail transit database is updated.
(4) The updated power supply rail transit operation and maintenance database is returned to step (1), and steps (1) to (3) are repeated until step (3) no longer judges any relation to be normal, at which point iteration stops.
In summary, testing on the sampled public database data set of 26,076 relation tuples showed that, compared with the traditional method, combining the clustering method with the graph structure mining method did not noticeably increase time overhead and improved the effect by about 20%.
Claims (1)
1. A database completion method for a novel power supply rail transit operation and maintenance system, characterized in that the method comprises the following steps:
(1) using a database clustering method and a graph structure mining method, quickly finding and identifying a preliminary screening result of possible abnormal/unknown relations in the novel power supply rail transit operation and maintenance system;
1.1) constructing a novel power supply rail transit operation and maintenance system database from raw data acquired during operation of the novel power supply rail transit system, the database being stored on a server of the novel power supply rail transit operation and maintenance system and containing nodes; dividing the nodes in the database into different node sets by a database clustering method, calculating the confidence of each node set, and establishing a confidence network between the node sets;
1.2) calculating, by a graph structure mining method, the probability that the relation between every two nodes in the database is an abnormal relation and the probability that it is an unknown relation, obtaining two probability distributions;
1.3) using the confidence network of 1.1) and the two probability distributions of 1.2), calculating the total weight with the following formula:
where W_ij represents the total weight between node i and node j, and the remaining terms of the formula represent, respectively, the probability of an abnormal relation between node i and node j, the probability of an unknown relation between node i and node j, and the confidence value between node i and node j;
finally, selecting the relations between the k node pairs with the highest total weight as the preliminary screening result;
(2) correcting the preliminary screening result by a database self-learning method to obtain possible normal relations, and calculating the confidence of each possible normal relation;
(3) judging, according to the confidence, which of the abnormal and unknown relations in the preliminary screening result are normal relations, returning them to the power supply rail transit operation and maintenance database, and performing an iterative update;
(4) returning the updated power supply rail transit operation and maintenance database to step (1) and repeating steps (1) to (3) iteratively, stopping when step (3) yields no normal relation;
wherein in step 1.1), after the confidences of the node sets are obtained, node sets with higher confidence are merged by a greedy method;
and step (3) is specifically: judging each possible normal relation, treating a possible normal relation whose confidence is not higher than a second confidence threshold as an abnormal relation, and returning a possible normal relation whose confidence is higher than the second confidence threshold to the rail transit operation and maintenance system database as a normal relation for database completion.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910934906.3A CN110647524B (en) | 2019-09-29 | 2019-09-29 | Novel database completion method for power supply rail transit operation and maintenance system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910934906.3A CN110647524B (en) | 2019-09-29 | 2019-09-29 | Novel database completion method for power supply rail transit operation and maintenance system |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110647524A | 2020-01-03 |
CN110647524B | 2021-11-23 |
Family
ID=68993154
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201910934906.3A Active CN110647524B (en) | 2019-09-29 | 2019-09-29 | Novel database completion method for power supply rail transit operation and maintenance system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110647524B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN104636454A (en) * | 2015-02-02 | 2015-05-20 | 哈尔滨工程大学 | Large-scale heterogeneous data oriented co-clustering method |
CN108694469A (en) * | 2018-06-08 | 2018-10-23 | 哈尔滨工程大学 | A kind of Relationship Prediction method of knowledge based collection of illustrative plates |
US10268735B1 (en) * | 2015-12-29 | 2019-04-23 | Palantir Technologies Inc. | Graph based resolution of matching items in data sources |
Also Published As
Publication number | Publication date |
---|---|
CN110647524A (en) | 2020-01-03 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN110955780A (en) | Entity alignment method for knowledge graph | |
CN109033170B (en) | Data repairing method, device and equipment for parking lot and storage medium | |
CN113518007B (en) | Multi-internet-of-things equipment heterogeneous model efficient mutual learning method based on federal learning | |
CN108809697B (en) | Social network key node identification method and system based on influence maximization | |
Liu et al. | Fedgru: Privacy-preserving traffic flow prediction via federated learning | |
CN113140254A (en) | Meta-learning drug-target interaction prediction system and prediction method | |
CN113761221A (en) | Knowledge graph entity alignment method based on graph neural network | |
CN112165401A (en) | Edge community discovery algorithm based on network pruning and local community expansion | |
Kim et al. | Reducing model cost based on the weights of each layer for federated learning clustering | |
WO2019014894A1 (en) | Link prediction method and device | |
CN110647524B (en) | Novel database completion method for power supply rail transit operation and maintenance system | |
CN113206756B (en) | Network flow prediction method based on combined model | |
Qin et al. | A wireless sensor network location algorithm based on insufficient fingerprint information | |
CN113515540A (en) | Query rewriting method for database | |
CN109800231B (en) | Real-time co-movement motion mode detection method of track based on Flink | |
CN104156462A (en) | Complex network community mining method based on cellular automatic learning machine | |
CN110717068A (en) | Video retrieval method based on deep learning | |
CN115659807A (en) | Method for predicting talent performance based on Bayesian optimization model fusion algorithm | |
CN114781545A (en) | Method and system for federated learning | |
CN103823843A (en) | Gauss mixture model tree and incremental clustering method thereof | |
CN114529096A (en) | Social network link prediction method and system based on ternary closure graph embedding | |
CN113468156A (en) | Feature fusion enhancement-based data set missing value filling method | |
CN109120438B (en) | Data cooperative transmission method and system under opportunity network | |
WO2023061303A1 (en) | Large-scale fading modeling and estimation method, system, and device, and storage medium | |
CN116703008B (en) | Traffic volume prediction method, equipment and medium for newly built highway |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | PB01 | Publication | |
 | SE01 | Entry into force of request for substantive examination | |
 | GR01 | Patent grant | |