CN110647524B - Novel database completion method for power supply rail transit operation and maintenance system - Google Patents

Novel database completion method for power supply rail transit operation and maintenance system

Info

Publication number: CN110647524B
Authority: CN (China)
Prior art keywords: database, rail transit, relation, node, power supply
Legal status: Active (the legal status is an assumption and is not a legal conclusion)
Application number: CN201910934906.3A
Other languages: Chinese (zh)
Other versions: CN110647524A (en)
Inventors: 陈刚, 刘晋, 潘硕, 李辉, 陈钦况, 江大伟, 陈珂, 吴晓凡
Current Assignee: Zhejiang University ZJU (the listed assignees may be inaccurate)
Original Assignee: Zhejiang University ZJU
Priority date: 2019-09-29 (the priority date is an assumption and is not a legal conclusion)
Filing date: 2019-09-29
Publication date: 2021-11-23
Application filed by Zhejiang University ZJU
Priority to CN201910934906.3A
Publication of CN110647524A
Application granted
Publication of CN110647524B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2379 Updates performed during online database operations; commit processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24 Querying
    • G06F16/245 Query processing
    • G06F16/2458 Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2465 Query processing support for facilitating data mining operations in structured databases
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/901 Indexing; Data structures therefor; Storage structures
    • G06F16/9024 Graphs; Linked lists
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2216/00 Indexing scheme relating to additional aspects of information retrieval not explicitly covered by G06F16/00 and subgroups
    • G06F2216/03 Data mining
    • G06Q50/40

Abstract

The invention discloses a database completion method for a novel power supply rail transit operation and maintenance system. First, link prediction is applied to the database to identify abnormal relations and unknown relations; next, the confidence of each node-relation tuple is calculated; finally, the tuples are judged and the accepted ones are added to the database, making the rail transit operation and maintenance database more complete. For database completion in a specialized domain, the method offers completeness and controllability and improves precision.

Description

Novel database completion method for power supply rail transit operation and maintenance system
Technical Field
The invention relates to database optimization methods in the computer field, specifically data mining and data graphs, and in particular to a database completion method for a novel power supply rail transit operation and maintenance system.
Background
In recent years, the optimized design and operation control of novel power supply rail transit systems have developed rapidly and been widely adopted. Using a rail transit operation and maintenance system database to support this optimized design and operation control has become a popular and important application. Such a database provides effective decision support for the rail transit operation and maintenance system and helps the personnel involved optimize it further.
However, the database of the rail transit operation and maintenance system is not kept complete in real time: it may contain erroneous relations and incomplete relations, and it must be adjusted continually as rail transit service requirements change. This makes real-time maintenance of the database challenging. To reduce the cost of maintaining it, a database completion method for the novel power supply rail transit operation and maintenance system is an important research subject.
Existing link prediction techniques focus too heavily on the quality of a single completion pass and have high time overhead, so they are not suitable for completing the rail transit operation and maintenance system database. Because the database of the novel power supply rail transit system changes frequently, existing methods are ill-suited to it.
Disclosure of Invention
The invention aims to provide a database completion method for a novel power supply rail transit operation and maintenance system. By incorporating active learning, the method can continuously complete the rail transit operation and maintenance system database in real time. It addresses the need to complete the database after rail transit service adjustments and in the presence of inherent database errors, and its short completion time allows fast completion.
The technical scheme adopted by the invention to solve this technical problem is as follows.
The scheme combines a clustering method and a graph structure mining method with active learning, so that abnormal relations and unknown relations in the rail transit database can be identified quickly and effectively. After each completion iteration, the updated database makes the joint clustering and graph mining screening more accurate.
The method comprises the following steps:
(1) a database clustering method and a graph structure mining method are used to quickly find and identify a preliminary screening result of possible abnormal relations and unknown relations in the novel power supply rail transit operation and maintenance system;
(2) the preliminary screening result is corrected by a database self-learning method to obtain possible normal relations, and the confidence of each possible normal relation is calculated;
(3) based on the confidence, normal relations are determined for the abnormal relations and unknown relations in the preliminary screening result and are returned to the power supply rail transit operation and maintenance database, which is updated iteratively; the updated database is more complete than the original one;
(4) the updated power supply rail transit operation and maintenance database is fed back to step (1), and steps (1) to (3) are repeated until step (3) yields no normal relation; iteration then stops, the update of the database is finished, and the power supply rail transit data are complete in real time.
Step (1) is specifically as follows:
1.1) A novel power supply rail transit operation and maintenance system database is built from raw data collected during operation of the novel power supply rail transit and is stored on a server of the operation and maintenance system. The database contains nodes, which represent entities such as track entities and train entities together with their related physical quantities; edges between nodes represent the relations between them. A database clustering method divides the nodes into different node sets; at the same time, the confidence of each node set is calculated and a confidence network among the node sets is established;
1.2) a graph structure mining method calculates, for every pair of nodes in the database, the probability that the relation between them is an abnormal relation and the probability that it is an unknown relation, yielding two probability distributions;
1.3) using the confidence network from 1.1) and the two probability distributions from 1.2), the total weight is calculated with the following formula:
[Formula image: Figure BDA0002221284380000021]
where $W_{ij}$ represents the total weight between node i and node j, and the three further symbols in the formula (symbol images Figure BDA0002221284380000022, Figure BDA0002221284380000023 and Figure BDA0002221284380000024) represent, respectively, the abnormal relation probability value, the unknown relation probability value, and the confidence value between node i and node j.
Finally, the k node-pair relations with the highest total weight are selected as the preliminary screening result.
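The selection in step 1.3) can be sketched as follows. The weight formula is available only as an image in this text, so the sketch takes the combination rule as a parameter and supplies a placeholder, `assumed_weight`, that is an assumption made purely for illustration. All function names, node names and numbers are hypothetical.

```python
import heapq
from typing import Callable, Dict, List, Tuple

Pair = Tuple[str, str]  # (node i, node j)


def assumed_weight(p_abnormal: float, p_unknown: float, confidence: float) -> float:
    # ASSUMPTION: placeholder combination rule, NOT the patent's formula,
    # which is only available as an image in the source text.
    return (p_abnormal + p_unknown) * confidence


def top_k_candidates(
    p_abnormal: Dict[Pair, float],
    p_unknown: Dict[Pair, float],
    confidence: Dict[Pair, float],
    k: int,
    weight_fn: Callable[[float, float, float], float] = assumed_weight,
) -> List[Tuple[Pair, float]]:
    """Return the k node pairs with the highest total weight W_ij."""
    weights = {
        pair: weight_fn(
            p_abnormal[pair],
            p_unknown.get(pair, 0.0),
            confidence.get(pair, 0.0),
        )
        for pair in p_abnormal
    }
    return heapq.nlargest(k, weights.items(), key=lambda item: item[1])


# Toy inputs (hypothetical values).
p_abn = {("train_1", "track_A"): 0.8, ("train_1", "track_B"): 0.1, ("train_2", "track_A"): 0.4}
p_unk = {("train_1", "track_A"): 0.1, ("train_1", "track_B"): 0.7, ("train_2", "track_A"): 0.2}
conf = {("train_1", "track_A"): 0.5, ("train_1", "track_B"): 0.9, ("train_2", "track_A"): 0.5}

preliminary = top_k_candidates(p_abn, p_unk, conf, k=2)
for pair, weight in preliminary:
    print(pair, round(weight, 3))
```

Passing the combination rule in as `weight_fn` keeps the top-k selection independent of whichever formula is actually used.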
In step 1.1), after the confidences of the node sets are obtained, node sets with higher confidence are merged by a greedy method; specifically, node sets whose confidence is higher than a first confidence threshold are merged.
Step (3) is specifically as follows: each possible normal relation is judged; a possible normal relation whose confidence is not higher than a second confidence threshold is treated as an abnormal relation, and one whose confidence is higher than the second confidence threshold is returned as a normal relation to the rail transit operation and maintenance system database for database completion.
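The decision rule of step (3) amounts to a strict threshold test. A minimal sketch, assuming the candidate relations are stored as (head, relation, tail) triples with a confidence each; the triple layout, the names and the threshold value 0.7 are illustrative assumptions:

```python
from typing import Dict, List, Tuple

Triple = Tuple[str, str, str]  # (head node, relation, tail node); layout is an assumption


def split_by_confidence(
    scored: Dict[Triple, float], second_threshold: float
) -> Tuple[List[Triple], List[Triple]]:
    """Split candidates into (normal, abnormal) using the second confidence threshold."""
    # Strictly higher than the threshold: normal relation, written back to the database.
    normal = [t for t, c in scored.items() if c > second_threshold]
    # Not higher than the threshold: treated as an abnormal relation.
    abnormal = [t for t, c in scored.items() if c <= second_threshold]
    return normal, abnormal


# Hypothetical candidates and confidences.
scored_candidates = {
    ("train_1", "runs_on", "track_A"): 0.92,
    ("train_1", "runs_on", "track_B"): 0.70,
    ("substation_1", "feeds", "track_A"): 0.35,
}
normal, abnormal = split_by_confidence(scored_candidates, second_threshold=0.7)
```

A candidate whose confidence equals the threshold is classed as abnormal, matching the wording "not higher than".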
The invention uses active learning: the updated rail transit operation and maintenance system database is passed through step (1) and step (2) again to obtain a more complete result. The confidence network and the probability distributions jointly screen the abnormal relations and unknown relations in the database, which gives more accurate, higher-precision identification and completion and, in turn, completeness and controllability.
The invention draws on several fields to provide a database completion method for a novel power supply rail transit operation and maintenance system, and it markedly improves the completion effect compared with the traditional method.
For database completion in a specialized domain, the method offers completeness and controllability and improves precision, which makes it well suited to a rail transit operation and maintenance system.
The invention was tested on a sampled public database data set containing 26076 relation tuples. The results show that, compared with the traditional method, the joint clustering and graph mining screening method does not noticeably increase time overhead and improves the effect by about 20%.
Drawings
Fig. 1 is a flow chart of the steps of the method of the invention.
Fig. 2 is an explanatory diagram of the rail transit database clustering algorithm.
Detailed Description
The technical solution of the present invention will now be further explained with reference to specific embodiments and examples.
Referring to fig. 1, the specific implementation process and the working principle of the present invention are as follows:
(1) First, a clustering method and a graph structure mining method are used to quickly find and identify abnormal relations and unknown relations in the novel power supply rail transit operation and maintenance system, giving the preliminary screening result of abnormal/unknown relations in the database. This comprises two parts. First, the rail transit database clustering method mines the semantic information in the rail transit operation and maintenance database:
a) Database clustering first divides the nodes in the database into different sets, with each node forming its own initial set.
b) The rail transit database may contain erroneous information, and the initial sets may partly duplicate one another, so some of the sets need to be merged, as shown in Fig. 2. The merge strategy is greedy: 1. a set is selected with equal probability as the current set, and the similarity between the current set and each other set is calculated; 2. a threshold is set, and the set whose similarity to the current set is highest and exceeds the threshold is merged with the current set. Fig. 2 shows four node sets being merged into three.
c) For the merged node sets, the probabilities among the sets are calculated to obtain a probability network between every two nodes.
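Steps a) and b) can be sketched as below. The text does not say how similarity between sets is measured or when merging stops, so this sketch assumes Jaccard similarity over the neighbours of each set's members and stops once no set has a partner above the threshold. The function names and toy data are hypothetical.

```python
import random
from typing import Dict, FrozenSet, List, Set


def jaccard(a: Set[str], b: Set[str]) -> float:
    # ASSUMPTION: the patent does not name a similarity measure; Jaccard is a stand-in.
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def greedy_merge(
    neighbors: Dict[str, Set[str]], threshold: float, seed: int = 0
) -> List[FrozenSet[str]]:
    """Greedily merge single-node sets whose similarity exceeds the threshold."""
    rng = random.Random(seed)
    # a) every node starts in its own initial set
    clusters: List[Set[str]] = [{node} for node in neighbors]

    def profile(cluster: Set[str]) -> Set[str]:
        # Union of the neighbours of all members; used to compare two sets.
        out: Set[str] = set()
        for node in cluster:
            out |= neighbors[node]
        return out

    merged = True
    while merged and len(clusters) > 1:
        merged = False
        order = list(range(len(clusters)))
        rng.shuffle(order)  # b.1) pick the current set with equal probability
        for idx in order:
            current = clusters[idx]
            best_j, best_sim = None, threshold
            for j, other in enumerate(clusters):
                if j == idx:
                    continue
                sim = jaccard(profile(current), profile(other))
                if sim > best_sim:  # b.2) highest similarity that exceeds the threshold
                    best_j, best_sim = j, sim
            if best_j is not None:
                current |= clusters[best_j]
                del clusters[best_j]
                merged = True
                break  # restart with the updated list of sets
    return [frozenset(c) for c in clusters]


# Four initial sets, two of which describe the same sensor (hypothetical data).
toy_neighbors = {
    "sensor_1": {"track_A", "substation_1"},
    "sensor_1_dup": {"track_A", "substation_1"},
    "train_1": {"track_B"},
    "substation_2": {"track_C"},
}
merged_sets = greedy_merge(toy_neighbors, threshold=0.5)
```

On this toy input the two duplicated sensor nodes end up in one set, so four initial sets become three, which mirrors the situation described for Fig. 2.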
Next, a graph structure mining method mines the graph structure information in the rail transit system database. In this implementation the Node2Vec algorithm is used. Node2Vec can be regarded as a DeepWalk-style algorithm whose random walks combine DFS-like and BFS-like exploration; its optimization objective is to learn a mapping function from nodes to vectors in an embedding space. The graph mining algorithm thereby establishes a confidence network between every two nodes.
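The random walk at the core of Node2Vec can be sketched as follows. The return parameter `p` and in-out parameter `q` are the algorithm's usual parameters. The toy graph is hypothetical, and the embedding step that normally follows (training a skip-gram model on the walks) is omitted.

```python
import random
from typing import Dict, List, Set


def node2vec_walk(
    graph: Dict[str, Set[str]],
    start: str,
    length: int,
    p: float,
    q: float,
    rng: random.Random,
) -> List[str]:
    """Generate one second-order biased random walk of at most `length` nodes."""
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(graph[cur])  # sorted for reproducibility
        if not nbrs:
            break
        if len(walk) == 1:
            # First step has no previous node, so choose uniformly.
            walk.append(rng.choice(nbrs))
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:
                weights.append(1.0 / p)  # step back to the previous node
            elif nxt in graph[prev]:
                weights.append(1.0)      # stay near the previous node (BFS-like)
            else:
                weights.append(1.0 / q)  # move further away (DFS-like)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk


# Small undirected toy graph (hypothetical entities).
toy_graph = {
    "substation_1": {"feeder_1", "feeder_2"},
    "feeder_1": {"substation_1", "track_A"},
    "feeder_2": {"substation_1", "track_A"},
    "track_A": {"feeder_1", "feeder_2", "train_1"},
    "train_1": {"track_A"},
}
walk = node2vec_walk(toy_graph, "substation_1", length=6, p=0.5, q=2.0, rng=random.Random(42))
```

A small `q` pushes the walk outward like depth-first search, while a large `q` keeps it close to the previous node like breadth-first search.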
The total weight is calculated using the following formula:
[Formula image: Figure BDA0002221284380000031]
where $W_{ij}$ represents the total weight between node i and node j, and the three further symbols in the formula (symbol images Figure BDA0002221284380000041, Figure BDA0002221284380000042 and Figure BDA0002221284380000043) represent, respectively, the abnormal relation probability value, the unknown relation probability value, and the confidence value between node i and node j.
The k node-pair relations with the highest total weight are selected as the preliminary screening result.
(2) The preliminary screening result is corrected by a database self-learning method to obtain possible normal relations, and the confidence of each possible normal relation is calculated. In this implementation the TransE algorithm calculates the confidence. Based on the characteristics of the rail transit operation and maintenance database, TransE computes vector representations of the nodes and edges in the database and derives the confidence between nodes from the distances between these vectors, thereby establishing the confidence of the possible normal relations.
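The distance-based scoring behind TransE can be sketched as below. TransE models a valid relation as head + relation ≈ tail, so a small distance suggests a plausible triple. The mapping from distance to a confidence in (0, 1] is an assumption, since the text does not specify one, and the two-dimensional embeddings are hand-picked toy values rather than trained vectors.

```python
import math
from typing import Sequence


def transe_distance(head: Sequence[float], relation: Sequence[float], tail: Sequence[float]) -> float:
    """L2 distance ||head + relation - tail||, the TransE dissimilarity of a triple."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def confidence_from_distance(distance: float) -> float:
    # ASSUMPTION: the patent does not state how distance becomes confidence;
    # this monotone mapping into (0, 1] is only an illustration.
    return 1.0 / (1.0 + distance)


# Hand-picked toy embeddings (hypothetical, not trained).
train_1 = [0.1, 0.2]
runs_on = [0.3, 0.1]
track_a = [0.4, 0.3]   # close to train_1 + runs_on
track_b = [0.9, -0.5]  # far from train_1 + runs_on

conf_plausible = confidence_from_distance(transe_distance(train_1, runs_on, track_a))
conf_implausible = confidence_from_distance(transe_distance(train_1, runs_on, track_b))
```

With trained embeddings, the same two functions would score each candidate relation from the preliminary screening result.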
(3) Finally, normal relations whose calculated confidence is higher than the confidence threshold are returned to the rail transit database, and the database is updated.
(4) The updated power supply rail transit operation and maintenance database is fed back to step (1), and steps (1) to (3) are repeated until step (3) yields no normal relation, at which point iteration stops.
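The iteration over steps (1) to (4) can be sketched as a fixed-point loop. The `screen` and `score` callables stand in for the clustering and graph mining screening and for the confidence calculation; the toy versions below are hypothetical placeholders, not the patent's methods. The `max_rounds` cap is an added safety assumption.

```python
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]  # (head node, relation, tail node); layout is an assumption


def complete_database(
    database: Set[Triple],
    screen: Callable[[Set[Triple]], Iterable[Triple]],
    score: Callable[[Set[Triple], Triple], float],
    threshold: float,
    max_rounds: int = 100,  # ASSUMPTION: safety cap, not part of the described method
) -> Set[Triple]:
    """Repeat screening, scoring and write-back until no new normal relation is found."""
    db = set(database)
    for _ in range(max_rounds):
        candidates = list(screen(db))                        # step (1): preliminary screening
        scored = [(c, score(db, c)) for c in candidates]     # step (2): confidence
        accepted = {c for c, s in scored if s > threshold and c not in db}  # step (3)
        if not accepted:
            break                                            # step (4): stop condition
        db |= accepted                                       # write back, then iterate
    return db


def toy_screen(db: Set[Triple]) -> Iterable[Triple]:
    # Placeholder screening: propose (a, r, c) whenever (a, r, b) and (b, r, c) exist.
    return {
        (a, r, c)
        for (a, r, b) in db
        for (b2, r2, c) in db
        if b == b2 and r == r2 and (a, r, c) not in db
    }


def toy_score(db: Set[Triple], candidate: Triple) -> float:
    # Placeholder confidence: constant value above the threshold.
    return 0.9


initial = {
    ("A", "connected_to", "B"),
    ("B", "connected_to", "C"),
    ("C", "connected_to", "D"),
}
completed = complete_database(initial, toy_screen, toy_score, threshold=0.5)
```

Because each round screens the database produced by the previous round, relations accepted earlier can expose further candidates, which is the iterative behaviour the steps describe.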
Finally, the method was tested on a sampled public database data set with 26076 relation tuples in total. The results show that, compared with the traditional method, the combination of the clustering method and the graph structure mining method does not noticeably increase time overhead and improves the effect by about 20%.

Claims (1)

1. A database completion method for a novel power supply rail transit operation and maintenance system, characterized by comprising the following steps:
(1) by adopting a database clustering method and a graph structure mining method, a preliminary screening result of possible abnormal relation/unknown relation in the novel power supply rail transit operation and maintenance system is quickly found and identified;
1.1) constructing and obtaining a novel power supply rail transit operation and maintenance system database by using original data acquired during the operation of novel power supply rail transit, wherein the database is stored in a server of the novel power supply rail transit operation and maintenance system, nodes exist in the novel power supply rail transit operation and maintenance system database, the nodes in the novel power supply rail transit operation and maintenance system database are divided into different node sets by using a database clustering method, and meanwhile, the confidence coefficient of each node set is calculated and a confidence coefficient network between the node sets is established;
1.2) calculating the probability that the relation between every two nodes in the database of the novel power supply rail transit operation and maintenance system is an abnormal relation and an unknown relation by a graph structure mining method, and respectively obtaining two probability distributions;
1.3) using the confidence network of 1.1 and the two probability distributions in 1.2, the total weight is calculated using the following formula:
[Formula image: Figure FDA0003210187850000011]
wherein $W_{ij}$ represents the total weight between node i and node j, and the three further symbols in the formula (symbol images Figure FDA0003210187850000012, Figure FDA0003210187850000013 and Figure FDA0003210187850000014) represent, respectively, the abnormal relation probability value, the unknown relation probability value, and the confidence value between node i and node j;
finally, the k node-pair relations with the highest total weight are selected as the preliminary screening result;
(2) correcting the preliminary screening result by a database self-learning method to obtain a possible normal relation, and calculating the confidence of the possible normal relation;
(3) according to the confidence, determining normal relations for the abnormal relations and unknown relations of the preliminary screening result, returning them to the power supply rail transit operation and maintenance database, and performing iterative updating;
(4) returning the updated power supply rail transit operation and maintenance database to step (1) and repeating steps (1) to (3) until step (3) yields no normal relation, whereupon iteration stops;
in step 1.1), after the confidences of the node sets are obtained, the node sets with higher confidence are merged by a greedy method;
step (3) is specifically as follows: each possible normal relation is judged; a possible normal relation whose confidence is not higher than a second confidence threshold is taken as an abnormal relation, and a possible normal relation whose confidence is higher than the second confidence threshold is returned as a normal relation to the rail transit operation and maintenance system database for database completion.
CN201910934906.3A 2019-09-29 2019-09-29 Novel database completion method for power supply rail transit operation and maintenance system Active CN110647524B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910934906.3A CN110647524B (en) 2019-09-29 2019-09-29 Novel database completion method for power supply rail transit operation and maintenance system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910934906.3A CN110647524B (en) 2019-09-29 2019-09-29 Novel database completion method for power supply rail transit operation and maintenance system

Publications (2)

Publication Number Publication Date
CN110647524A (en) 2020-01-03
CN110647524B (en) 2021-11-23

Family

ID=68993154

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910934906.3A Active CN110647524B (en) 2019-09-29 2019-09-29 Novel database completion method for power supply rail transit operation and maintenance system

Country Status (1)

Country Link
CN (1) CN110647524B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104636454A (en) * 2015-02-02 2015-05-20 哈尔滨工程大学 Large-scale heterogeneous data oriented co-clustering method
CN108694469A (en) * 2018-06-08 2018-10-23 哈尔滨工程大学 A kind of Relationship Prediction method of knowledge based collection of illustrative plates
US10268735B1 (en) * 2015-12-29 2019-04-23 Palantir Technologies Inc. Graph based resolution of matching items in data sources

Also Published As

Publication number Publication date
CN110647524A (en) 2020-01-03

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant