CN110647524B - Database completion method for a novel power supply rail transit operation and maintenance system

Info

Publication number: CN110647524B
Application number: CN201910934906.3A
Authority: CN
Inventors: 陈刚; 刘晋; 潘硕; 李辉; 陈钦况; 江大伟; 陈珂; 吴晓凡
Original assignee: Zhejiang University ZJU
Current assignee: Zhejiang University ZJU
Priority date: 2019-09-29
Filing date: 2019-09-29
Publication date: 2021-11-23
Anticipated expiration: 2039-09-29
Also published as: CN110647524A

Abstract

The invention discloses a database completion method for a novel power supply rail transit operation and maintenance system. First, link prediction is applied to the database to identify abnormal relations and unknown relations; next, the confidence of each node-relation tuple is calculated; finally, each tuple is judged by its confidence and, where accepted, added to the database, making the rail transit operation and maintenance database more complete. For database completion in a specialized domain, the method offers completeness and controllability and improves precision.

Description

Database completion method for a novel power supply rail transit operation and maintenance system

Technical Field

The invention relates to a database optimization method in the computer fields of data mining and data graphs, and in particular to a database completion method for a novel power supply rail transit operation and maintenance system.

Background

In recent years, the optimized design and operation control of novel power supply rail transit systems have developed rapidly and been widely adopted. Using a rail transit operation and maintenance system database to support that optimized design and operation control has become a popular and important application. Establishing such a database provides effective decision support for the rail transit operation and maintenance system and helps the personnel involved to optimize it.

However, the rail transit operation and maintenance system database is not complete at every moment: it may contain erroneous relations and incomplete relations, and it must be adjusted continually as the service requirements of rail transit change. This makes maintaining the database in real time a challenge. To reduce the cost of maintaining it, a database completion method for the novel power supply rail transit operation and maintenance system is an important research subject.

Existing link prediction techniques focus too heavily on the effect of a single database completion pass and incur high time overhead, so they are not suitable for completing the rail transit operation and maintenance system database. Because the novel power supply rail transit system database changes frequently, existing research methods are not suitable for it.

Disclosure of Invention

The invention aims to provide a database completion method for a novel power supply rail transit operation and maintenance system. By incorporating an active learning mode, the method can continuously complete the rail transit operation and maintenance system database in real time. It addresses the need to complete the database that arises from rail transit service adjustments and from errors inherent in the database, and its short completion time allows fast completion.

The technical scheme adopted by the invention for solving the technical problems is as follows:

According to the technical scheme, a clustering method and a graph structure mining method are designed and combined with an active learning mode, so that abnormal relations and unknown relations in the rail transit database can be identified rapidly and effectively. After each iteration completes the database, the new database makes the combined clustering and graph mining screening more accurate.

The method comprises the following steps:

(1) a database clustering method and a graph structure mining method are used to quickly find and identify a preliminary screening result of possible abnormal relations and unknown relations in the novel power supply rail transit operation and maintenance system;

(2) the preliminary screening result is corrected by a database self-learning method to obtain possible normal relations, and the confidence of each possible normal relation is calculated;

(3) according to the confidence, it is judged which of the abnormal relations and unknown relations in the preliminary screening result are normal relations; these normal relations are returned to the power supply rail transit operation and maintenance database, which is iteratively updated, so that the updated database is more complete than the original;

(4) the updated power supply rail transit operation and maintenance database is returned to step (1), and steps (1) to (3) are repeated until step (3) judges no relation to be normal; iteration then stops, the updating and completion of the database is finished, and the power supply rail transit data is complete in real time.

Step (1) is specifically as follows:

1.1) a novel power supply rail transit operation and maintenance system database is built from the raw data collected during operation of the novel power supply rail transit and is stored on the server of that database. The database contains nodes, which represent entities such as rail entities and train entities together with their related physical quantities; the edges between nodes represent the relations between them. A database clustering method divides the nodes into different node sets; at the same time, the confidence of each node set is calculated and a confidence network among the node sets is established;

1.2) a graph structure mining method calculates, for every pair of nodes in the database, the probability that the relation between them is an abnormal relation and the probability that it is an unknown relation, yielding two probability distributions;

1.3) using the confidence network from 1.1) and the two probability distributions from 1.2), the total weight is calculated using the following formula:

[formula not reproduced]

where W_ij represents the total weight between node i and node j, and the remaining three terms of the formula represent, respectively, the abnormal-relation probability value between node i and node j, the unknown-relation probability value between node i and node j, and the confidence value between node i and node j;

finally, the k node-pair relations with the highest total weight are selected as the preliminary screening result.
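As a rough illustration of step 1.3), the sketch below ranks node pairs by a total weight and keeps the k highest. The combining rule in `total_weight` (the confidence multiplied by the sum of the two probabilities) is an assumption made purely for illustration, since the actual formula is not reproduced in this text; the function names and the dictionary layout are likewise hypothetical.

```python
import heapq
from typing import Dict, List, Tuple

Pair = Tuple[str, str]


def total_weight(p_abnormal: float, p_unknown: float, confidence: float) -> float:
    # ASSUMPTION: illustrative combination only, not the formula of the patent.
    return confidence * (p_abnormal + p_unknown)


def top_k_relations(
    p_abnormal: Dict[Pair, float],
    p_unknown: Dict[Pair, float],
    confidence: Dict[Pair, float],
    k: int,
) -> List[Tuple[Pair, float]]:
    """Return the k node pairs with the highest total weight, best first."""
    weights = {
        pair: total_weight(
            p_abnormal[pair],
            p_unknown.get(pair, 0.0),   # missing entries count as probability 0
            confidence.get(pair, 0.0),  # missing entries count as confidence 0
        )
        for pair in p_abnormal
    }
    return heapq.nlargest(k, weights.items(), key=lambda item: item[1])
```

Whatever the real formula is, only `total_weight` would need to change; the top-k selection itself is independent of how the three quantities are combined.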

In step 1.1), after the confidences of the node sets are obtained, node sets with higher confidence are merged by a greedy method; specifically, node sets whose confidence is higher than a first confidence threshold are merged.

Step (3) is specifically as follows: each possible normal relation is judged by its confidence. A possible normal relation whose confidence is not higher than a second confidence threshold is treated as an abnormal relation; one whose confidence is higher than the second confidence threshold is treated as a normal relation and returned to the rail transit operation and maintenance system database to complete it.

The invention adopts an active learning mode: the updated rail transit operation and maintenance system database is passed through step (1) and step (2) again to obtain a more complete result. The confidence network and the probability distributions are used jointly to screen abnormal relations and unknown relations in the database, which yields more accurate, higher-precision identification and completion and brings completeness and controllability.

The invention integrates several fields to provide a database completion method for a novel power supply rail transit operation and maintenance system, and markedly improves the completion effect compared with traditional methods.

For database completion in a specialized domain, the method offers completeness and controllability and improves precision, and it is well suited to a rail transit operation and maintenance system.

The invention was tested on a sampled public database data set containing 26076 relation tuples. The results show that, compared with the traditional method, the combined clustering and graph mining screening method does not significantly increase time overhead and improves the effect by about 20%.

Drawings

FIG. 1 is a flow chart of the steps performed by the present invention.

FIG. 2 is an explanatory diagram of the rail transit database clustering algorithm.

Detailed Description

The technical solution of the present invention will now be further explained with reference to specific embodiments and examples.

Referring to FIG. 1, the specific implementation process and the working principle of the present invention are as follows:

(1) First, a clustering method and a graph structure mining method are used to quickly find and identify abnormal relations and unknown relations in the novel power supply rail transit operation and maintenance system, giving a preliminary screening result of the abnormal and unknown relations in the database. This comprises two parts. First, the rail transit database clustering method mines the semantic information in the rail transit operation and maintenance database:

a) Database clustering first divides the nodes in the database into different sets, with each node forming its own initial set.

b) The rail transit database may contain some erroneous information, and the initial sets may partly duplicate one another, so some of the sets need to be merged, as shown in FIG. 2. The merging strategy is greedy: 1. a set is selected with equal probability as the current set, and the similarity between the current set and each other set is calculated; 2. a threshold is set, and the set whose similarity to the current set is the highest and exceeds the threshold is merged with the current set. FIG. 2 shows four node sets being merged into three.

c) For the merged node sets, the probabilities between the sets are calculated to obtain a probability network between every two nodes.
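The greedy merging in a) and b) can be sketched as follows. The text does not say how the similarity between two sets is measured or when merging stops, so this sketch assumes a Jaccard similarity over the combined neighbor sets of each node set and continues until no set has a partner above the threshold. The names `neighbor_jaccard` and `greedy_merge` and the `neighbors` dictionary are hypothetical.

```python
import random
from typing import Dict, List, Set


def neighbor_jaccard(a: Set[str], b: Set[str], neighbors: Dict[str, Set[str]]) -> float:
    """ASSUMED measure: Jaccard similarity of the combined neighbor sets of a and b."""
    na = set().union(*(neighbors[n] for n in a))
    nb = set().union(*(neighbors[n] for n in b))
    union = na | nb
    return len(na & nb) / len(union) if union else 0.0


def greedy_merge(nodes: List[str], neighbors: Dict[str, Set[str]],
                 threshold: float, seed: int = 0) -> List[Set[str]]:
    rng = random.Random(seed)
    clusters: List[Set[str]] = [{n} for n in nodes]  # a) one initial set per node
    untried = list(range(len(clusters)))
    while untried:
        # b.1) pick the current set with equal probability among those not yet tried
        idx = untried.pop(rng.randrange(len(untried)))
        current = clusters[idx]
        best_j, best_sim = None, threshold
        for j, other in enumerate(clusters):
            if j == idx:
                continue
            sim = neighbor_jaccard(current, other, neighbors)
            if sim > best_sim:  # must exceed the threshold and beat earlier candidates
                best_j, best_sim = j, sim
        if best_j is not None:
            # b.2) merge the most similar set with the current set
            merged = current | clusters[best_j]
            clusters = [c for k, c in enumerate(clusters) if k not in (idx, best_j)]
            clusters.append(merged)
            untried = list(range(len(clusters)))  # re-examine every set after a merge
    return clusters
```

For instance, with four nodes of which two trains share exactly the same neighbors, a threshold of 0.8 would merge only the two trains and leave three sets.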

Next, a graph structure mining method mines the graph structure information in the rail transit system database. In this implementation the Node2Vec algorithm is used. Node2Vec can be regarded as a DeepWalk-style algorithm whose random walks combine DFS-like and BFS-like exploration; its optimization objective is to learn a mapping function that maps each node to a vector in a mathematical space. A confidence network between every two nodes is established by the graph mining algorithm.
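To make the DFS/BFS trade-off concrete, here is a minimal sketch of the biased random walk that Node2Vec uses on an unweighted graph. It covers only walk generation; the subsequent step of training a skip-gram model on the walks to obtain node vectors is omitted. The adjacency-dictionary layout and the function name `node2vec_walk` are assumptions for illustration, while `p` (return parameter) and `q` (in-out parameter) follow the usual Node2Vec naming.

```python
import random
from typing import Dict, List, Optional, Set


def node2vec_walk(graph: Dict[str, Set[str]], start: str, length: int,
                  p: float = 1.0, q: float = 1.0,
                  rng: Optional[random.Random] = None) -> List[str]:
    """Generate one second-order biased random walk of at most `length` nodes."""
    rng = rng or random.Random()
    walk = [start]
    while len(walk) < length:
        cur = walk[-1]
        nbrs = sorted(graph[cur])  # sorted so a seeded rng gives reproducible walks
        if not nbrs:
            break  # dead end: stop early
        if len(walk) == 1:
            walk.append(rng.choice(nbrs))  # first step has no previous node
            continue
        prev = walk[-2]
        weights = []
        for nxt in nbrs:
            if nxt == prev:
                weights.append(1.0 / p)   # step back to the previous node
            elif nxt in graph[prev]:
                weights.append(1.0)       # stay close to the previous node (BFS-like)
            else:
                weights.append(1.0 / q)   # move further away (DFS-like)
        walk.append(rng.choices(nbrs, weights=weights, k=1)[0])
    return walk
```

A small `q` pushes walks outward in a depth-first manner, while a large `q` keeps them near the starting neighborhood; a large `p` discourages immediately revisiting the previous node.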

The total weight is calculated using the following formula:

[formula not reproduced]

where W_ij represents the total weight between node i and node j, and the formula includes a term representing the confidence value between node i and node j.

The k node-pair relations with the highest total weight are selected as the preliminary screening result.

(2) The preliminary screening result is corrected by a database self-learning method to obtain possible normal relations, and the confidence of each possible normal relation is calculated. This implementation uses the TransE algorithm for that calculation. Based on the characteristics of the rail transit operation and maintenance database, TransE computes vector representations of the nodes and edges in the database in a mathematical space, and the confidence between nodes is calculated from the distances between those vectors. The confidences of the possible normal relations are thereby established.
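The distance-based confidence of step (2) might look like the sketch below. TransE models a relation as a translation, so that head + relation is close to tail for a plausible triple; the distance used here is the Euclidean norm of head + relation - tail. How the text turns a distance into a confidence is not stated, so the mapping 1 / (1 + distance) is an assumption, as are the function names. The embeddings are taken as already trained.

```python
import math
from typing import Sequence


def transe_distance(head: Sequence[float], relation: Sequence[float],
                    tail: Sequence[float]) -> float:
    """Euclidean distance between (head + relation) and tail."""
    return math.sqrt(sum((h + r - t) ** 2 for h, r, t in zip(head, relation, tail)))


def confidence_from_distance(distance: float) -> float:
    # ASSUMPTION: any decreasing map from distance to (0, 1] would serve;
    # this particular form is chosen only for illustration.
    return 1.0 / (1.0 + distance)


def relation_confidence(head: Sequence[float], relation: Sequence[float],
                        tail: Sequence[float]) -> float:
    """Confidence that the triple (head, relation, tail) holds."""
    return confidence_from_distance(transe_distance(head, relation, tail))
```

A triple whose tail vector coincides with head + relation has distance 0 and therefore the maximum confidence of 1 under this mapping; confidence falls as the vectors drift apart.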

(3) Finally, the normal relations whose calculated confidence is higher than the confidence threshold are returned to the rail transit database, and the rail transit database is updated.

(4) The updated power supply rail transit operation and maintenance database is returned to step (1), and steps (1) to (3) are repeated until step (3) judges no relation to be normal, at which point iteration stops.
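The iteration over steps (1) to (4) can be summarized by a loop of the following shape. This is only a sketch: `screen` and `score` are hypothetical callbacks standing in for the preliminary screening of step (1) and the confidence calculation of step (2), relations are represented as (head, relation, tail) triples by assumption, and the `max_rounds` cap and the check that an accepted relation is not already stored are safety measures added here rather than part of the described method.

```python
from typing import Callable, Iterable, Set, Tuple

Triple = Tuple[str, str, str]


def complete_database(
    database: Set[Triple],
    screen: Callable[[Set[Triple]], Iterable[Triple]],
    score: Callable[[Set[Triple], Triple], float],
    threshold: float,
    max_rounds: int = 100,  # ASSUMPTION: safety cap, not part of the described method
) -> Set[Triple]:
    """Repeat screening, scoring and updating until no relation is accepted."""
    db = set(database)  # work on a copy so the input is left untouched
    for _ in range(max_rounds):
        candidates = screen(db)                          # step (1): preliminary screening
        scored = {c: score(db, c) for c in candidates}   # step (2): confidence per candidate
        accepted = {                                     # step (3): keep confident relations
            c for c, conf in scored.items() if conf > threshold and c not in db
        }
        if not accepted:                                 # step (4): stop when nothing is added
            break
        db |= accepted
    return db
```

Because each round runs on the database produced by the previous round, relations accepted early can influence which candidates are screened and how they are scored later.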

In summary, the method was tested on sampled public database data sets with a total of 26076 relation tuples. The results show that, compared with the traditional method, combining the clustering method and the graph structure mining method does not significantly increase time overhead and improves the effect by about 20%.

Claims

1. A database completion method for a novel power supply rail transit operation and maintenance system, characterized in that the method comprises the following steps:

1.1) constructing and obtaining a novel power supply rail transit operation and maintenance system database by using original data acquired during the operation of novel power supply rail transit, wherein the database is stored in a server of the novel power supply rail transit operation and maintenance system, nodes exist in the novel power supply rail transit operation and maintenance system database, the nodes in the novel power supply rail transit operation and maintenance system database are divided into different node sets by using a database clustering method, and meanwhile, the confidence coefficient of each node set is calculated and a confidence coefficient network between the node sets is established;

wherein W_ij represents the total weight between node i and node j, and a further term represents the confidence value between node i and node j;

finally, the k node-pair relations with the highest total weight are selected as the preliminary screening result;

(3) judging the normal relation of the abnormal relation and the unknown relation of the primary screening result according to the confidence coefficient, returning the abnormal relation and the unknown relation to the power supply rail transit operation and maintenance database, and performing iterative updating;

(4) returning the updated power supply rail transit operation and maintenance database to the step (1), continuously repeating the iteration steps (1) to (3), and stopping iteration until no normal relation is obtained in the step (3);

in the step 1.1), after the confidence degrees of the node sets are obtained, combining the node sets with higher confidence degrees by adopting a greedy method;