CN114785736B - Bullet train distributed big data transmission optimization method, system, equipment and medium - Google Patents


Info

Publication number
CN114785736B
CN114785736B (application CN202210702301.3A)
Authority
CN
China
Prior art keywords
data
load
transmission
file
optimization
Prior art date
Legal status
Active
Application number
CN202210702301.3A
Other languages
Chinese (zh)
Other versions
CN114785736A (en)
Inventor
易明中
贾志凯
王辉
李燕
孙鹏
陈彦
岳云峰
李静雪
张莉艳
Current Assignee
China Academy of Railway Sciences Corp Ltd CARS
Institute of Computing Technologies of CARS
Beijing Jingwei Information Technology Co Ltd
Original Assignee
China Academy of Railway Sciences Corp Ltd CARS
Institute of Computing Technologies of CARS
Beijing Jingwei Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by China Academy of Railway Sciences Corp Ltd CARS, Institute of Computing Technologies of CARS, Beijing Jingwei Information Technology Co Ltd
Priority to CN202210702301.3A
Publication of CN114785736A
Application granted
Publication of CN114785736B
Legal status: Active
Anticipated expiration


Classifications

    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/12 Avoiding congestion; Recovering from congestion
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/27 Replication, distribution or synchronisation of data between databases or within a distributed database system; Distributed database system architectures therefor
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/10 Flow control; Congestion control
    • H04L47/24 Traffic characterised by specific attributes, e.g. priority or QoS
    • H04L47/2425 Traffic characterised by specific attributes, e.g. priority or QoS for supporting services specification, e.g. SLA
    • H04L47/2433 Allocation of priorities to traffic types
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/50 Queue scheduling
    • H04L47/62 Queue scheduling characterised by scheduling criteria
    • H04L47/625 Queue scheduling characterised by scheduling criteria for service slots or service orders
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L47/00 Traffic control in data switching networks
    • H04L47/50 Queue scheduling
    • H04L47/62 Queue scheduling characterised by scheduling criteria
    • H04L47/625 Queue scheduling characterised by scheduling criteria for service slots or service orders
    • H04L47/6275 Queue scheduling characterised by scheduling criteria for service slots or service orders based on priority

Abstract

The application discloses a bullet train distributed big data transmission optimization method applied to a motor train unit management information system comprising a railway headquarters central node, a plurality of segment-level intermediate nodes connected with the headquarters central node, and a plurality of institute-level nodes. The method comprises the following steps: a data source collection generation step, a database load monitoring step, a weight optimization iteration step, and a data transmission optimization step. The transmission order of the selected data sources is dynamically adjusted based on their weight values, the data load of each sub-period is optimized, and the weight optimization iteration is repeated until the optimized data load satisfies the optimal load model, so that the data load of each sub-period is peak-clipped and valley-filled and optimal transmission of the distributed big data is achieved. The method effectively solves the problem of data transmission congestion caused by unbalanced data transmission in a one-to-many centralized transmission topology. The application also discloses a bullet train distributed big data transmission optimization system, device, and medium.

Description

Bullet train distributed big data transmission optimization method, system, equipment and medium
Technical Field
The application relates to distributed data transmission management, and in particular to a bullet train distributed big data transmission optimization method, system, device, and medium.
Background
At present, big data transmission plays an increasingly important role in all kinds of distributed systems. Whether a distributed system realizes its functions, meets its performance targets, and keeps its failure rate below an allowable value depends to a great extent on efficient, accurate, and reliable data transmission.
However, the data sources of a distributed system tend to be cross-regional and cross-platform, complex in composition, and often highly unbalanced in their distribution over time. In a certain period or at a certain node there may be a large amount of data waiting to be transmitted, while in another period or at another node the links sit idle, resulting in severe transmission imbalance. Yet the hardware investment and software design of the distributed system must be planned for peak transmission demand, which wastes capacity during idle periods while bottlenecks still occur at peak times, potentially triggering a vicious circle in which faults occur frequently.
Regarding optimization of big data transmission in distributed systems, the prior art includes the following schemes:
Patent publication No. CN113497761A, entitled "On-board device, communication system and data transmission method", discloses a data transmission method for a communication system of multiple on-board devices on a motor vehicle.
Patent publication No. CN110519338B, entitled "A data transmission method based on cooperative communication", discloses a data transmission mechanism based on cooperative communication.
However, most of the prior art does not solve the technical problem of optimizing big data transmission at the source. By analogy, urban traffic can only be managed so well downstream: if traffic flow and pedestrian flow are not governed at their sources, the effort achieves only half the effect. As long as the bulk of the traffic and pedestrian flow remains uneven, traffic governance and optimization treat the symptom rather than the cause, and congestion becomes the norm in central urban areas at peak hours. Theory and practice have shown that governance and optimization at the source is the higher-level strategy.
Therefore, to solve the problems in the prior art and meet the management requirements of railway motor-car data transmission, it is necessary to prevent data transmission from lacking efficient planning, which leads to uneven busy and idle periods and great waste of system resources; at the same time, congestion during busy periods can prevent core motor-car services from acquiring key data in real time, posing potentially huge risks to the whole railway system. An advanced data transmission optimization technology therefore urgently needs to be introduced, and a relatively complete bullet train distributed big data transmission optimization method and system need to be established to optimize data transmission and provide technical support for data communication.
Disclosure of Invention
The invention provides a bullet train distributed big data transmission optimization method, system, device, and medium, which can effectively prevent unplanned data transmission from causing uneven busy and idle periods and great waste of system resources, and can avoid data congestion during busy periods.
In a first aspect, an embodiment of the present application provides a motor train unit distributed big data transmission optimization method, applied to a motor train unit management information system comprising a railway headquarters central node, a plurality of segment-level intermediate nodes connected to the headquarters central node, and a plurality of institute-level nodes, the method comprising:
a data source collection generation step: within a time period, sorting all operation and maintenance data of the plurality of segment-level intermediate nodes and the plurality of institute-level nodes by data file length, selecting a plurality of data sources whose data file length is greater than a preset length, and generating a data source collection;
a database load monitoring step: dividing the time period into a plurality of sub-periods, and calculating and monitoring the data load of the railway headquarters central node in each sub-period;
a weight optimization iteration step: based on the value range of the time sensitivity parameter of the selected data sources, dynamically adjusting the importance parameter and the file length coefficient of each selected data source through auxiliary coefficients, and calculating the weight value of each selected data source;
a data transmission optimization step: dynamically adjusting the transmission order of the selected data sources based on their weight values, optimizing the data load of each sub-period, and repeating the weight optimization iteration step until the optimized data load satisfies the optimal load model, so that the data load of each sub-period is peak-clipped and valley-filled and optimal transmission of the distributed big data is achieved.
Preferably, the weight optimization iteration step includes a weight optimization iteration model:
when the time sensitivity parameter TS(j) of a selected data source is smaller than a first time-sensitivity preset parameter, the weight value W(j) of the selected data source is: W(j) = Ks(j)·S(j) + Kd(j)·D(j), where j = 0, 1, ..., N-1, N being the number of selected data sources; S(j) is the importance parameter of the j-th selected data source and Ks(j) is its auxiliary coefficient; D(j) is the file length coefficient of the j-th selected data source and Kd(j) is its auxiliary coefficient.
Preferably, the optimal load model is:
the optimized data load L'(i) of each sub-period satisfies: |L'(0) - μ| + |L'(1) - μ| + ... + |L'(i) - μ| + ... + |L'(M-1) - μ| ≤ the preset load imbalance metric, and L'(0) + L'(1) + ... + L'(i) + ... + L'(M-1) ≤ the preset total load, where μ is the average data load over the time period, μ = [L'(0) + L'(1) + ... + L'(M-1)]/M, L'(i) is the optimized data load of the i-th sub-period, and i = 0, 1, ..., M-1, M being the number of sub-periods.
Preferably, the importance parameter S(j) is set based on the headquarters business importance and urgency of the j-th selected data source;
the time sensitivity parameter TS(j) is set based on the time urgency of the headquarters business of the j-th selected data source;
the file length coefficient D(j) is set based on the file length of the j-th selected data source, where j = 0, 1, ..., N-1, N being the number of selected data sources.
Preferably, the data transmission optimization step further comprises:
a download optimization step: the railway headquarters central node generates a scheduling file for the data to be downloaded to the plurality of segment-level intermediate nodes and the plurality of institute-level nodes; the scheduling file comprises a plurality of hierarchically arranged destination IP addresses, and the data to be downloaded is matched step by step based on those hierarchically arranged destination IP addresses and forwarded to the segment-level intermediate nodes and institute-level nodes.
The download optimization step converts a single transmission from the railway headquarters into multiple lower-level transmissions, i.e., converts the one-to-many transmission mode into a many-to-many transmission mode; each segment-level intermediate node acts like a three-way valve, splitting the top-down transmission into parallel transmissions at the same level.
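As an illustrative sketch (not part of the patent disclosure), the step-by-step matching against hierarchically arranged destination IP addresses might look as follows; the two-tier schedule layout, the field names, and all addresses are assumptions introduced here for clarity:

```python
# Hypothetical sketch of scheduling-file dispatch: tier 1 is the segment-level
# intermediate node, tier 2 the institute-level nodes reached through it.
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

@dataclass
class ScheduleEntry:
    segment_ip: str                       # segment-level intermediate node
    institute_ips: List[str] = field(default_factory=list)  # nodes behind it

def route(schedule: List[ScheduleEntry], dest_ip: str) -> Optional[Tuple[str, Optional[str]]]:
    """Match a destination IP tier by tier: first against the segment node,
    then against the institute-level nodes behind that segment.
    Returns (next_hop_segment, final_institute_or_None)."""
    for entry in schedule:
        if dest_ip == entry.segment_ip:
            return (entry.segment_ip, None)       # deliver at segment level
        if dest_ip in entry.institute_ips:
            return (entry.segment_ip, dest_ip)    # forward via segment node
    return None                                   # not in the scheduling file

# Illustrative schedule: two segments, each fronting institute-level nodes.
schedule = [ScheduleEntry("10.1.0.1", ["10.1.1.2", "10.1.1.3"]),
            ScheduleEntry("10.2.0.1", ["10.2.1.2"])]
```

A segment-level node applying the same matching logic to its own sub-schedule yields the many-to-many forwarding described above.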
Preferably, the data transmission optimizing step further includes:
a similar-data transmission optimization step: converting the selected data source file, row by row, into a binary file of a predetermined format; performing an XOR operation on each pair of adjacent rows of binary data, locating the differing bits between the two rows, marking those differing bits, and transmitting them, thereby optimizing the transmission of similar data;
a peer data abstraction optimization step: sending the data files of the same type from multiple segment-level intermediate nodes or multiple institute-level nodes at the same level at the same time, so that the data files of peer nodes are logically abstracted into one big data file.
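A minimal sketch of the similar-data optimization above, assuming rows are padded to a fixed byte width before the XOR; representing the "marked" differential bits as a list of bit positions is an illustrative encoding choice, not specified by the patent:

```python
# Sketch: XOR adjacent rows of a fixed-width binary file and transmit only
# the positions of differing bits instead of the full row.
from typing import List, Tuple, Union

def row_to_bits(row: bytes, width: int) -> int:
    """Pad a row to `width` bytes and interpret it as one big integer."""
    return int.from_bytes(row.ljust(width, b"\x00"), "big")

def diff_bits(prev: int, curr: int) -> List[int]:
    """Bit positions (0 = least significant) where prev and curr differ."""
    x = prev ^ curr
    positions, pos = [], 0
    while x:
        if x & 1:
            positions.append(pos)
        x >>= 1
        pos += 1
    return positions

def encode(rows: List[bytes], width: int) -> List[Tuple[str, Union[bytes, List[int]]]]:
    """First row is sent whole; each later row is sent as its diff bits
    relative to the previous row."""
    out: List[Tuple[str, Union[bytes, List[int]]]] = [("full", rows[0])]
    for a, b in zip(rows, rows[1:]):
        out.append(("diff", diff_bits(row_to_bits(a, width), row_to_bits(b, width))))
    return out
```

For highly similar rows the diff list is tiny relative to the row itself, which is the bandwidth saving claimed for this step.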
Preferably, the weight optimization iteration step comprises: when the time sensitivity parameter TS(j) = p for a selected data source, then for the j-th data source D(j) the weight value W(j) takes a predetermined maximum value in the p-th sub-period and W(j) = 0 in all other sub-periods, where j = 0, 1, ..., N-1, N being the number of selected data sources, and p = 0, 1, ..., M-1, M being the number of sub-periods.
In a second aspect, an embodiment of the present application provides a bullet train distributed big data transmission optimization system, applied to a motor train unit management information system comprising a railway headquarters central node, a plurality of segment-level intermediate nodes connected to the headquarters central node, and a plurality of institute-level nodes, adopting any one of the above bullet train distributed big data transmission optimization methods and comprising:
a data source collection generation module: within a time period, sorting all operation and maintenance data of the plurality of segment-level intermediate nodes and the plurality of institute-level nodes by data file length, selecting a plurality of data sources whose data file length is greater than a preset length, and generating a data source collection;
a database load monitoring module: dividing the time period into a plurality of sub-periods, and calculating and monitoring the data load of the railway headquarters central node in each sub-period;
a weight optimization iteration module: based on the value range of the time sensitivity parameter of the selected data sources, dynamically adjusting the importance parameter and the file length coefficient of each selected data source through auxiliary coefficients, and calculating the weight value of each selected data source;
a data transmission optimization module: dynamically adjusting the upload or download transmission order of the selected data sources based on their weight values, optimizing the data load of each sub-period, and repeating the weight optimization iteration until the optimized data load satisfies the optimal load model, so that the data load of each sub-period is peak-clipped and valley-filled and optimal transmission of the distributed big data is achieved.
In a third aspect, an embodiment of the present application provides a server device comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the bullet train distributed big data transmission optimization method described in any one of the above.
In a fourth aspect, the present application provides a computer-readable storage medium on which a computer program is stored, wherein the program, when executed by a processor, implements the bullet train distributed big data transmission optimization method described in any one of the above.
Compared with the related prior art, the present invention has the following outstanding advantages:
1. The bullet train distributed big data transmission optimization method and system provided by the invention prevent poor transmission efficiency from increasing the transmission error probability and hence the retransmission probability, which would otherwise further increase the transmission volume, aggravate congestion, reduce efficiency, and raise the retransmission probability again in a vicious circle;
2. The data transmission optimization method provided by the invention abstracts the characteristics of the data into weights and optimizes transmission on that basis; it peak-clips and valley-fills the data load in each period, avoiding the data transmission problems caused by extremely uneven idle and busy periods;
3. The invention supports rapid data dispatch through the scheduling file, which matches and forwards data step by step using hierarchically arranged destination IP addresses; data forwarding efficiency is greatly improved, transmission time is shortened, and one-to-many transmission is converted into a many-to-many transmission mode;
4. The method supports converting the selected data source file, row by row, into a binary file of a predetermined format, locating the differing bits between adjacent rows of binary data, and marking and transmitting only those bits, which greatly reduces the transmission volume of similar data, saves bandwidth, and improves the transmission optimization efficiency for large amounts of similar data;
5. The method supports dynamic adjustment of the auxiliary coefficients of the key parameters through continuous iteration, dynamically adjusting the transmission order of the selected data sources, optimizing the data load of each sub-period, peak-clipping and valley-filling the data load of each sub-period, and achieving optimized transmission of the distributed big data;
6. The invention provides technical support for data communication on the basis of optimized big data transmission.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
FIG. 1 is a flow chart of a distributed big data transmission optimization method of a bullet train according to the present invention;
FIG. 2 is a schematic diagram of a management information system of a motor train unit according to an embodiment of the invention;
FIG. 3 is a diagram of the distributed big data transmission optimization architecture of the bullet train of the present invention;
fig. 4 is a schematic diagram of a hardware structure of an apparatus according to an embodiment of the present application.
In the above figures:
100. Bullet train distributed big data transmission optimization system
10. Data source collection generation module; 20. Database load monitoring module
30. Weight optimization iteration module; 40. Data transmission optimization module
Detailed Description
In order to make the objects, technical solutions, and advantages of the present application more apparent, the present application will be described and illustrated below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the application and are not intended to restrict it. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments provided in the present application without inventive effort fall within the scope of protection of the present application.
It is obvious that the drawings in the following description are only examples or embodiments of the present application, and that it is also possible for a person skilled in the art to apply the present application to other similar contexts on the basis of these drawings without inventive effort. Moreover, it should be appreciated that in the development of any such actual implementation, as in any engineering or design project, numerous implementation-specific decisions must be made to achieve the developers' specific goals, such as compliance with system-related and business-related constraints, which may vary from one implementation to another.
The details of one or more embodiments of the application are set forth in the accompanying drawings and the description below to provide a more thorough understanding of the application.
Reference in the specification to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the specification. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of ordinary skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments without conflict.
Unless defined otherwise, technical or scientific terms referred to herein shall have the ordinary meaning as understood by those of ordinary skill in the art to which this application belongs. Reference to "a," "an," "the," and similar words throughout this application are not to be construed as limiting in number, and may refer to the singular or the plural. The use of the terms "including," "comprising," "having," and any variations thereof herein, is meant to cover a non-exclusive inclusion; for example, a process, method, system, article, or apparatus that comprises a list of steps or modules (elements) is not limited to the listed steps or elements, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus. Reference to "connected," "coupled," and the like in this application is not intended to be limited to physical or mechanical connections, but may include electrical connections, whether direct or indirect. The term "plurality" as referred to herein means two or more. "and/or" describes an association relationship of associated objects, meaning that three relationships may exist, for example, "A and/or B" may mean: a exists alone, A and B exist simultaneously, and B exists alone. The character "/" generally indicates that the former and latter associated objects are in an "or" relationship. Reference herein to the terms "first," "second," "third," and the like, are merely to distinguish similar objects and do not denote a particular ordering for the objects.
To solve the problems in the prior art, the invention aims to address the disconnect between data transmission and business needs by setting a weight for each data source of the motor train unit management information system (EMIS) according to its business, so as to optimize the current extensive, coarse-grained transmission mode. The railway headquarters and the application institutes, advanced-maintenance depots, and segment-level systems form a one-to-many centralized transmission topology. Consequently, when one application institute, advanced-maintenance depot, or segment-level system suddenly transmits a large amount of data to the headquarters, transmission from the others to the headquarters can be blocked. At peak times, data transmitted to the headquarters cannot be loaded into the headquarters database promptly, so key business data such as operations, train-set condition, and overhaul performance cannot be reported in time, and in the worst case network-wide transmission is interrupted. Analysis shows that the one-to-many centralized topology cannot currently be changed, but the most important, most urgent, and most critical data and information make up only a small fraction of the total. A practical and feasible approach is therefore to grade the data and information to be transmitted by each application institute, advanced-maintenance depot, and segment-level system according to importance, urgency, business type, and so on, and to assign weights accordingly. High weight means high priority and low weight means low priority: high-priority data is transmitted first, while low-priority data is deferred or even suspended.
This reliably and effectively avoids, at the source, the situation in which one application institute, advanced-maintenance depot, or segment-level system suddenly transmitting a large amount of data and information blocks transmission from the others to the headquarters. When the high-priority data and information have been successfully transmitted to the headquarters and loaded into the database, feedback is sent to each application institute, advanced-maintenance depot, and segment-level system, and transmission of low-priority data and information to the headquarters resumes. Of course, the weights should not be static; they should be dynamically adjusted and optimized as importance, urgency, business, and network status change, and the headquarters' feedback to each application institute, advanced-maintenance depot, and segment-level system should likewise be dynamically optimized according to the transmission and network status. In summary, the invention aims to upgrade the current extensive transmission mode of the motor train unit management information system (EMIS) into a fine-grained mode, forming a closed loop with the business and network status, avoiding transmission blockage and interruption, providing technical support for data communication, and providing data support for life-cycle management and digitally precise overhaul.
Fig. 1 is a flowchart of the bullet train distributed big data transmission optimization method of the present invention. As shown in Fig. 1, an embodiment of the present invention provides a bullet train distributed big data transmission optimization method applied to a motor train unit management information system (EMIS) comprising a railway headquarters central node, a plurality of segment-level intermediate nodes connected to the headquarters central node, and a plurality of institute-level nodes, the method comprising:
Data source collection generation step S10: within a time period, sorting all operation and maintenance data of the plurality of segment-level intermediate nodes and the plurality of institute-level nodes by data file length, selecting a plurality of data sources whose data file length is greater than a preset length, and generating a data source collection. In a specific embodiment of the present invention, the preset data file length may be set to 100 KB or more, but the invention is not limited thereto, and other preset data file lengths may also be set.
Database load monitoring step S20: dividing the time period into a plurality of sub-periods, and calculating and monitoring the data load of the railway headquarters central node in each sub-period. In the embodiment of the present invention, the time period is set to one day and the sub-period to one hour, giving 24 sub-periods per time period.
Weight optimization iteration step S30: based on the value range of the time sensitivity parameter of the selected data sources, dynamically adjusting the importance parameter and the file length coefficient of each selected data source through auxiliary coefficients, and calculating the weight value of each selected data source;
data transmission optimization step S40: and dynamically adjusting the transmission sequence of the selected data sources based on the weight values of the selected data sources, optimizing the data loads of the sub-periods, and repeatedly executing the weight optimization iteration step until the optimized data loads meet the optimal load model, so that the data loads of the sub-periods are subjected to peak clipping and valley filling, and the optimal transmission of the distributed big data is met.
Preferably, the weight optimization iteration step S30 includes a weight optimization iteration model:
when the time sensitivity parameter TS(j) of a selected data source is smaller than the first time-sensitivity preset parameter, the weight value W(j) of the selected data source is: W(j) = Ks(j)·S(j) + Kd(j)·D(j), where j = 0, 1, ..., N-1, N being the number of selected data sources; S(j) is the importance parameter of the j-th selected data source and Ks(j) is its auxiliary coefficient; D(j) is the file length coefficient of the j-th selected data source and Kd(j) is its auxiliary coefficient. In the embodiment of the present invention, the first time-sensitivity preset parameter may be set to 0, but the invention is not limited thereto, and other values may also be set. The initial values of Ks(j) and Kd(j) are both set to 1; as the iteration progresses, Ks(j) and Kd(j) converge toward their optimal values. The invention is not limited thereto, and other initial values may also be set.
Preferably, the weight optimization iteration step S30 includes:
when the time sensitivity parameter TS(j) = p for a selected data source, then for the jth data source D(j) the weight value W(j) is a predetermined maximum value in the pth sub-period, and W(j) = 0 in all other sub-periods, where j = 0, 1, … …, N, N being the maximum number of selected data sources, and p = 0, 1, … …, M, M being the number of sub-periods.
In the embodiment of the present invention, the predetermined maximum value of the weight value W(j) may be set to 100, but the present invention is not limited thereto, and other predetermined maximum values may also be set.
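As a concrete illustration, the two-branch weight model above can be sketched as follows. This is a minimal sketch: the dictionary field names and the default maximum of 100 (taken from the embodiment) are assumptions for illustration only.

```python
def weight(src, sub_period, w_max=100.0):
    """Weight W(j) of one selected data source in a given sub-period.

    Two branches of the model:
      - TS(j) = p >= 0 (time-sensitive): W(j) is the predetermined maximum
        in sub-period p, and 0 in every other sub-period;
      - TS(j) < 0 (not time-sensitive): W(j) = Ks(j)*S(j) + Kd(j)*D(j).
    """
    if src["TS"] >= 0:                       # time-sensitive source
        return w_max if sub_period == src["TS"] else 0.0
    # ordinary source: importance and file-length terms, each weighted by
    # its auxiliary coefficient (both initialized to 1 before iteration)
    return src["Ks"] * src["S"] + src["Kd"] * src["D"]

ordinary = {"TS": -1, "S": 3, "D": 5, "Ks": 1.0, "Kd": 1.0}
urgent = {"TS": 2, "S": 1, "D": 1, "Ks": 1.0, "Kd": 1.0}
print(weight(ordinary, 0))   # 8.0
print(weight(urgent, 2))     # 100.0 (pinned to sub-period 2)
print(weight(urgent, 7))     # 0.0
```

In later iterations only `Ks` and `Kd` would be adjusted; the branch structure stays fixed.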
Preferably, the optimal load model is:
optimized data load L'(i) of each sub-period: [|L'(0) − μ| + |L'(1) − μ| + … … + |L'(i) − μ| + … … + |L'(M−1) − μ|] is less than or equal to the preset load imbalance metric value, and [L'(0) + L'(1) + … … + L'(i) + … … + L'(M−1)] is less than or equal to the preset total load amount, where μ is the average data load over the time period, μ = [L'(0) + L'(1) + … … + L'(i) + … … + L'(M−1)]/M, L'(i) is the optimized data load of the ith sub-period, i = 0, 1, … …, M−1, and M is the number of sub-periods.
In the embodiment of the present invention, the preset load imbalance metric value may be set to 10, but the present invention is not limited thereto, and other preset load imbalance metric values may also be set;
in the specific embodiment of the present invention, the preset total load amount may be chosen as an optimal value according to the sum of the actual data loads of the sub-periods, but the present invention is not limited thereto, and other preset total load amounts may also be set.
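The optimal load model then reduces to two inequality checks over the optimized per-sub-period loads. A minimal sketch (the function name and the example numbers are illustrative; the default imbalance bound of 10 follows the embodiment above):

```python
def satisfies_optimal_load_model(loads, max_imbalance=10.0, max_total=None):
    """True when the optimized loads L'(i) satisfy the optimal load model:
      sum_i |L'(i) - mu| <= preset load imbalance metric value, and
      sum_i L'(i)       <= preset total load amount (when one is given),
    where mu is the average load over the whole period."""
    total = sum(loads)
    mu = total / len(loads)
    imbalance = sum(abs(x - mu) for x in loads)
    within_total = (max_total is None) or (total <= max_total)
    return imbalance <= max_imbalance and within_total

flat = [40.0] * 24            # perfectly level day: imbalance is 0
spiky = [10.0] * 23 + [700.0] # one extreme peak sub-period
print(satisfies_optimal_load_model(flat))    # True
print(satisfies_optimal_load_model(spiky))   # False
```

The iteration of step S30 repeats until this check passes.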
Preferably, the importance parameter S(j) is set based on the overall business importance and urgency of the jth selected data source;
the time sensitivity parameter TS(j) is set based on the time urgency of the overall business of the jth selected data source;
the file length coefficient D(j) is set based on the file length of the jth selected data source, where j = 0, 1, … …, N, N being the maximum number of selected data sources.
Preferably, in the present invention, the data transmission optimizing step S40 further includes:
Download optimization step: the iron master central node generates a scheduling file for the data to be downloaded to the plurality of segment-level intermediate nodes and the plurality of same-level nodes; the scheduling file contains a plurality of hierarchically arranged target IP addresses; the data to be downloaded is matched step by step based on these hierarchically arranged target IP addresses and forwarded to the segment-level intermediate nodes and same-level nodes, converting one-to-many transmission into many-to-many transmission.
Preferably, in the present invention, the data transmission optimizing step S40 further includes:
Similar data transmission optimization step: converting the selected data source file row by row into a binary file of a predetermined format, performing an XOR operation on each pair of adjacent rows of binary data, locating the differing bits of the two adjacent rows, and transmitting only the marked differing bits, thereby optimizing the transmission of similar data;
Peer data abstraction optimization step: uniformly sending the same type of data files of a plurality of segment-level intermediate nodes, or of a plurality of same-level nodes, at the same time, so that the data files of the same-level nodes are logically abstracted into one big data file.
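The similar-data step can be illustrated with a small XOR delta codec, under the assumption of equal-length binary rows; the function names are illustrative, not from the patent:

```python
def xor_encode(rows):
    """Transmit the first row whole; every later row is replaced by its XOR
    with the previous row, which is almost entirely zero bits when adjacent
    rows differ in only one or a few fields."""
    out = [rows[0]]
    for prev, cur in zip(rows, rows[1:]):
        out.append(bytes(a ^ b for a, b in zip(prev, cur)))
    return out

def xor_decode(deltas):
    """Receiver restores each row by XORing the previous restored row
    with the received delta."""
    rows = [deltas[0]]
    for d in deltas[1:]:
        rows.append(bytes(a ^ b for a, b in zip(rows[-1], d)))
    return rows

rows = [b"SERIAL-000184-A", b"SERIAL-000185-A", b"SERIAL-000186-A"]
deltas = xor_encode(rows)
assert xor_decode(deltas) == rows
# each delta is nonzero only where adjacent rows differ (here: one byte)
assert sum(b != 0 for b in deltas[1]) <= 2
```

Only the nonzero positions of each delta need to be marked and transmitted, which is what makes the scheme pay off for highly similar rows.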
The following detailed description of specific embodiments of the invention refers to the accompanying drawings in which:
1. EMUs Management Information System (EMIS)
In the embodiment of the invention, the invention is based on, but not limited to, big data transmission of the distributed "EMU Management Information System" (EMIS). Fig. 2 shows the topology of big data transmission of the distributed "EMU Management Information System" (EMIS).
EMIS, the "EMU (Electric Multiple Unit) Management Information System", is a large-scale distributed operation and maintenance management information system covering 18 railway bureaus. It has 1 iron master central node (the railway headquarters) and a number of distributed nodes: 67 application-level nodes, 7 advanced-repair nodes, and 25 segment-level systems. EMIS therefore corresponds to a distributed database with 1 central node database, i.e. the iron master database (containing 3 database instances), and several node databases: 67 EMIS application-level databases (application-level database 1, application-level database 2, … …, application-level database j, … …, application-level database 67), 7 advanced-repair databases, and 25 EMIS segment-level databases (EMIS segment-level database 1, EMIS segment-level database 2, … …, EMIS segment-level database i, … …, EMIS segment-level database 25).
For reasons of security, load sharing and the like, the iron master database server does not perform bidirectional data transmission directly with each node database; instead, the iron master interface server performs bidirectional data transmission with each node database and then, in a unified manner, with the iron master database. First, the iron master interface server isolates the iron master database from the outside; second, it shares the data transmission load of the iron master database, which is heavy because the data volume of big data transmission, and in particular the peak data volume, is large. The EMIS iron master database can be said to be the most critical core of the entire "EMU Management Information System" (EMIS): once it fails, the normal operation of the whole system is affected. In this patent, the transmission of data from each application-level, advanced-repair and segment-level system database to the EMIS iron master database is called uploading; the transmission of data from the EMIS iron master database to the application-level, advanced-repair and segment-level system databases is called downloading.
At present, the "EMU Management Information System" (EMIS) can be roughly divided into three layers (four levels). The first, uppermost layer is the iron master central node, consisting of the iron master interface server and the iron master database server. The second layer comprises 25 intermediate nodes, i.e. the 25 segment-level nodes, consisting of 25 segment-level database servers; each intermediate node has direct bidirectional data transmission with the uppermost central node. The third layer is more complex (it contains two levels): it includes 67 application-level nodes but only 7 advanced-repair nodes, because not every segment-level node contains an advanced-repair node — only 7 of them do. In fig. 2, for example, the ith segment-level node contains no advanced-repair node. Each application-level node has direct bidirectional data transmission with the uppermost central node, as does each advanced-repair node. Meanwhile, some application-level nodes also have direct bidirectional data transmission with the segment-level node to which they belong, and the segment-level nodes that contain an advanced-repair node also transmit directly and bidirectionally with that advanced-repair node. In addition, the lowest-layer application-level and advanced-repair nodes can also have indirect bidirectional data transmission with the uppermost central node via the intermediate-layer nodes.
The EMIS second- and third-layer node servers typically contain the same functional modules: scheduling management, job management, technology management, equipment management, logistics management, cost management, security management, quality management and comprehensive management. Of course, because the services of the nodes differ, function modules with the same name generate different upload data sources and receive the various download data sources generated by the uppermost central node. The uppermost central node must receive all the upload data sources generated by the second- and third-layer nodes, generate the various download data sources, and download them to all second- and third-layer nodes. The biggest bottleneck of the entire "EMU Management Information System" (EMIS) is therefore the uppermost central node, most of whose load is spent receiving the various upload data sources and generating and downloading the various download data to all second- and third-layer nodes. Optimizing the big data transmission of EMIS largely eliminates or alleviates this biggest bottleneck. Long-term engineering practice has shown that the factor with the greatest influence on the load of the EMIS iron master database is data transmission; the goal of this patent can thus be reduced to optimizing data transmission. The load of the iron master database is optimized by setting the weight of each data source so as to optimize data transmission; the weights and rules in the present invention can be viewed as adjustment factors for optimizing the iron master database load.
2. Mathematical modeling:
2.1, database loads L(0), L(1), … …, L(i), … …, L(23), where 0 denotes the 0th sub-period of a one-day period, i.e. the sub-period from 0 to 1 o'clock in the morning, and 23 denotes the 23rd sub-period of the day, i.e. the sub-period from 23 o'clock to 24 o'clock. L(i) represents the load of the EMIS iron master database in the ith sub-period. The "EMU Management Information System" (EMIS) described in this patent is equipped with a system that monitors the load of the iron master database over the 24 sub-periods. In the embodiment of the present invention the day is divided into 24 sub-periods, but the present invention is not limited thereto, and another number of sub-periods may be adopted.
2.2, data sources D(1), D(2), … …, D(j), … … and D(n). In the "EMU Management Information System" (EMIS) described in this patent, to facilitate transmission optimization, the several nodes of the same kind in the second or third layer are merged and abstracted into one logically large node; this is the first layer of abstraction. For example, the 25 segment-level nodes of the second layer are merged and abstracted into one logically large segment-level node, because our program has been optimized so that these 25 segment-level nodes produce a given kind of data simultaneously and upload it simultaneously. For the uppermost central node, the transmission optimization logic is therefore equivalent to a single segment-level node generating and uploading this data: where each segment-level node generates 1 copy of the data, it is as if one logically large segment-level node generated 25 copies. Similarly, the 67 application-level nodes of the third layer are merged and abstracted into one logically large application-level node, and the 7 advanced-repair nodes of the third layer into one logically large advanced-repair node. Only the first n data sources with the largest transmission data volume, abstracted over the logically large application-level, segment-level and advanced-repair nodes, need be considered; this is the second layer of abstraction. For optimization purposes, the amount of data to be transmitted by the abstracted jth data source D(j) can be further abstracted into its rank among the n data sources; this is the third layer of abstraction.
If D(j) = 1, it can be trusted with a greater probability that the amount of data to be transmitted by the jth data source D(j) is the largest among the n data sources, i.e. its rank is 1; if D(j) = n, it can be trusted with a greater probability that this amount is the smallest, i.e. its rank is n. Since the amount of data to be transmitted by the jth data source generally varies, a coefficient, Kd(j), corrects the weight W(j) when it is set according to this amount. However, the "EMU Management Information System" (EMIS) is equipped with a system that monitors the transmission volume of each node, and the variation in the amount of data to be transmitted by the jth data source D(j) is neither too frequent nor too large, so the coefficient Kd(j) generally does not need frequent or large adjustment.
2.3, importance of data sources S(1), S(2), … …, S(j), … …, S(n). In the "EMU Management Information System" (EMIS) described in this patent, nodes of the same kind run the same services, so the importance of a given service is the same across nodes of the same kind. For example, if data source 1 (corresponding to service 1) of the 1st segment-level node is the most important, then data source 1 (corresponding to service 1) of the 2nd, … …, 25th segment-level nodes is also the most important; if data source 3 (corresponding to service 3) of the 1st segment-level node is the least important, then data source 3 of the 2nd, … …, 25th segment-level nodes is also the least important. The importance S(j) of the abstracted jth data source is closely related to service j: the more important or more urgent service j is, the larger S(j) is; the less important or less urgent, the smaller S(j) is. For optimization, the abstracted importance S(j) of the jth data source D(j) can be further abstracted into a rank among the n importance values; this is the third layer of abstraction. If S(j) = 1, D(j) is the most important; if S(j) = n, D(j) is the least important. Like D(j), S(j) is one of the factors in setting the weight W(j) of the jth data source, and a coefficient Ks(j) corrects S(j), since S(j) is not fixed. However, according to long-term observation, the importance S(j) of the jth data source D(j) does not change too frequently, so the coefficient Ks(j) generally does not need frequent or large adjustment.
2.4, time sensitivity of data sources TS(1), TS(2), … …, TS(j), … …, TS(n). If TS(j) = 2, the data source D(j) must be transmitted in sub-period 2; if TS(j) = 23, it must be transmitted in sub-period 23; if the data source D(j) can be transmitted in any sub-period, it is defined in the mathematical model as TS(j) < 0, i.e. it has no time sensitivity.
2.5, weights of data sources: W(1), W(2), … …, W(j), … …, W(n). W(j) represents the weight of the jth data source. In general, the larger the importance S(j) of the jth data source D(j), the larger its weight W(j) can be set. However, a data source whose importance S(j) is not particularly high may still need a large weight W(j) because its amount of data to be transmitted is very large. Like D(j), the weight W(j) can also be abstracted as a rank: if W(j) = 1, it can be trusted with a greater probability that, empirically, the weight of the jth data source ranks first among the n data sources, i.e. its rank is 1; if W(j) = n, that it ranks last, i.e. its rank is n.
2.6, the interface server generates a scheduling file for each file to be downloaded. The scheduling file is essentially a text file comprising two parts: Tracker information and file information. The Tracker information mainly consists of the addresses of the segment-level servers required for forwarding and the necessary settings for those servers; the file information is generated from the downloaded file according to a verification (checksum) algorithm. The interface server then only needs to download a file to one of the segment-level servers, not to all of them. The receiving segment-level server forwards the file to the application-level nodes and advanced-repair servers below it and to other segment-level servers, and those segment-level servers forward it in turn to their application-level nodes, advanced-repair servers and yet more segment-level servers. This resembles a fission reaction: 1 passes to 2, 2 to 4, 4 to 8; or 1 to 3, 3 to 9, 9 to 27. The transmission volume is thus greatly reduced and the transmission time greatly shortened; transmission efficiency rises sharply, and the loads of the iron master database server and the iron master interface server drop substantially. In short, one-to-many transmission is converted into many-to-many transmission: a segment-level intermediate node acts like a three-way valve, splitting top-down transmission into parallel same-level transmission.
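A scheduling file of this kind can be sketched as a small text structure holding the Tracker tiers and a file checksum. This is a sketch only: the JSON encoding, the field names, and the fan-out of 3 are assumptions for illustration, not the patent's concrete format.

```python
import hashlib
import json

def make_scheduling_file(payload: bytes, target_ips, fanout=3):
    """Tracker info: target IPs grouped into forwarding tiers whose size
    grows by `fanout` each hop (1, 3, 9, ...); file info: a checksum and
    length computed from the file to be downloaded."""
    tiers, start, width = [], 0, 1
    while start < len(target_ips):
        tiers.append(target_ips[start:start + width])
        start += width
        width *= fanout
    return json.dumps({
        "tracker": {"fanout": fanout, "tiers": tiers},
        "file": {"sha256": hashlib.sha256(payload).hexdigest(),
                 "length": len(payload)},
    })

ips = ["10.1.0.%d" % k for k in range(1, 26)]     # 25 segment-level servers
sched = json.loads(make_scheduling_file(b"daily plan", ips))
print([len(t) for t in sched["tracker"]["tiers"]])   # [1, 3, 9, 12]
```

Each node reads its tier, forwards to the nodes of the next tier it is responsible for, and verifies the received file against the checksum.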
2.7, transmission of similar information. Our business makes the rows of certain database tables very similar to one another: each row has many fields (corresponding to columns), e.g. several tens, the number of fields is equal across rows, and only one or a few fields differ from the next row. If text file transmission is adopted, then after conversion each line of the text file has the same length as the next, and only a few characters differ between adjacent lines. Row i can therefore be transmitted in full, while row i+1 only needs the field or fields that differ from row i; likewise row i+2 only needs the fields that differ from row i+1, and so on. The transmission volume drops greatly, which not only improves transmission efficiency but also lightens the loads of the iron master database server and the iron master interface server. The receiving end restores all rows from the received row i, the differing fields of row i+1 relative to row i, the differing fields of row i+2 relative to row i+1, and so on. Going further, binary file transmission may be employed, because even the fields that differ do so only slightly. For example, a field may store a serial number several tens of bits long in which adjacent rows differ in only one bit, all other bits being identical. After conversion to a binary file, XORing row i+1 with row i yields a binary string hundreds of bits long in which only one bit or a few bits are nonzero.
3. The invention can be abstracted as the following optimization problem:
3.1 Given, for one EMIS day, each L(i), D(j), S(j), TS(j), where i = 0, 1, … …, 23 indexes the 24 sub-periods of the day and j = 1, 2, … …, n indexes the first n data sources with the largest transmission volume abstracted above, D(j) representing the data source ranked jth by transmission volume: how should the weight W(j) of each jth data source be set so that the EMIS iron master database loads L'(0), L'(1), … …, L'(i), … …, L'(23) optimized by this patent satisfy: [|L'(0) − μ| + |L'(1) − μ| + … … + |L'(i) − μ| + … … + |L'(23) − μ|] is less than or equal to the preset load imbalance metric value, and as small as possible, and [L'(0) + L'(1) + … … + L'(i) + … … + L'(23)] is less than or equal to the preset total load, and also as small as possible. Here L'(0) is the load of sub-period 0 after the optimization of this patent, L'(i) the load of sub-period i, and L'(23) the load of sub-period 23.
Obviously, if L'(0) = L'(1) = … … = L'(23) = μ, where μ is the average load over the day, i.e. μ = [L(0) + L(1) + … … + L(i) + … … + L(23)]/24, then [|L'(0) − μ| + |L'(1) − μ| + … … + |L'(i) − μ| + … … + |L'(23) − μ|] is certainly minimal, namely zero. Of course, this is only the ideal situation; in practice the loads of the sub-periods cannot be made completely identical after the optimization of this patent. The practical aim of the invention is therefore load peak clipping and valley filling: to even out the load of the EMIS iron master database across the sub-periods, so that loads that originally differed greatly become as consistent as possible after optimization — the originally uneven distribution is optimized toward uniformity. The realistic goal is that [|L'(0) − μ| + |L'(1) − μ| + … … + |L'(i) − μ| + … … + |L'(23) − μ|] be as small as possible, where |L'(i) − μ| is the absolute value of the difference between L'(i) and μ, i.e. the degree to which L'(i) deviates from μ. To accelerate the optimization, the absolute value may be replaced by its square, though this may increase oscillation during the optimization process. It must be pointed out that whether one minimizes [|L'(0) − μ| + |L'(1) − μ| + … … + |L'(i) − μ| + … … + |L'(23) − μ|] or {[L'(0) − μ]² + [L'(1) − μ]² + … … + [L'(i) − μ]² + … … + [L'(23) − μ]²}, the total [L'(0) + L'(1) + … … + L'(i) + … … + L'(23)] is not necessarily minimal, only nearly so. Nevertheless, because the load of every sub-period is peak-clipped and valley-filled, the impact on the load of the EMIS iron master database is greatly reduced, and with a large confidence probability the database runs stably in a state close to the optimal load.
The imbalance between transmission and load is thus minimized. This reduces the probability of transmission interruption and of other faults of the EMIS iron master database, saves hardware investment, and improves the cost-effectiveness of the investment.
3.2 several basic steps to determine the weights W (j) are as follows:
1. When TS(j) = p > 0, then for the jth data source D(j), W(j) should be as large as possible in the pth sub-period and W(j) = 0 in the other sub-periods: the jth data source D(j) must be transmitted in the pth sub-period or in a sub-period adjacent to it. According to long-term observation, however, the proportion of data sources with TS(j) = p > 0 is small. Even when a data source generated by some service is time-sensitive, often only a small part of it is; the bulk of the data, which is not time-sensitive, can be separated out to form a new data source without time sensitivity.
2. When TS(j) < 0 (the first time-sensitive predetermined parameter), then for the jth data source D(j): if S(j) is large, W(j) should be as large as possible; if D(j) is large, W(j) should be as large as possible. Therefore W(j) = Ks(j) × S(j) + Kd(j) × D(j), where Ks(j) is the coefficient of S(j), Kd(j) is the coefficient of D(j), and D(j) is the file length coefficient of the jth data source D(j). The initial values of Ks(j) and Kd(j) may be constants, such as 1; as the iteration progresses, Ks(j) and Kd(j) tend to their optimal values.
3. When TS(j1) < 0, if W(j1) = 1, then the j1th data source should be scheduled to transmit 1st in sub-period i1, the sub-period with the 1st smallest EMIS iron master database load. The optimized L(i1) thus becomes L'(i1), which is no longer necessarily the smallest, and it can be trusted with a greater probability that L'(i1) > L(i1), realizing valley filling. If the j1th data source was originally scheduled to transmit in sub-period j1, it can be trusted with a greater probability that L'(j1) < L(j1), realizing peak clipping;
4. When TS(j2) < 0, if W(j2) = 2, then the j2th data source should attempt to be scheduled 2nd in sub-period i1, the sub-period with the 1st smallest load; it can then further be trusted with a greater probability that L'(i1) > L(i1), further filling the valley. But if the optimized L'(i1) has increased too much over L(i1), the j2th data source should instead attempt to be scheduled 1st in sub-period i2, the sub-period with the 2nd smallest load; it can then be trusted with a greater probability that L'(i2) > L(i2), filling the valley of sub-period i2. If the j2th data source was originally scheduled to transmit in sub-period j2, it can be trusted with a greater probability that L'(j2) < L(j2), clipping the peak of sub-period j2;
5. When TS(j3) < 0, if W(j3) = 3, then the j3th data source should attempt to be scheduled 3rd in sub-period i1, the sub-period with the 1st smallest load; it can then further be trusted with a greater probability that L'(i1) > L(i1), further filling the valley of i1. But if the optimized L'(i1) has increased too much over L(i1), the j3th data source should be scheduled 1st in sub-period i2, with the 2nd smallest load, filling the valley of i2 with L'(i2) > L(i2) trusted with a greater probability; if L'(i2) has increased too much over L(i2), it should be scheduled 2nd in sub-period i2; and if L'(i2) has increased still more, it should be scheduled 1st in sub-period i3, with the 3rd smallest load, filling the valley of i3 with L'(i3) > L(i3) trusted with a greater probability. If the j3th data source was originally scheduled to transmit in sub-period j3, it can be trusted with a greater probability that L'(j3) < L(j3), clipping the peak of sub-period j3;
6. When TS(j4) < 0, if W(j4) = 4, proceed by analogy with the above rules … …
7. When TS(jn) < 0, if W(jn) = n, then W(jn) is the smallest, and the jnth data source can be randomly scheduled to transmit last in some lightly loaded sub-period in of the EMIS iron master database; it can then be trusted with a greater probability that L'(in) > L(in), filling the valley of sub-period in. If the jnth data source was originally scheduled to transmit in sub-period jn, it can be trusted with a greater probability that L'(jn) < L(jn), clipping the peak of sub-period jn. That is, the least important data sources may be randomly scheduled to transmit last in a relatively lightly loaded sub-period. But if the optimized L'(in) has increased too much over L(in), the jnth data source can be randomly rescheduled to transmit last in another lightly loaded sub-period other than in.
In summary, the data source with the highest weight should be scheduled to transmit first in the sub-period with the lowest load, and the data source with the lowest weight may be randomly scheduled to transmit last in a sub-period with a lower load. This is essentially an iterative process with a cycle of one day, and the initial ordering of W(j) should match the ordering of the reciprocals of L(i) as closely as possible. That is, if before the first iteration the ordering of L(i) from large to small is L(23), L(22), … …, L(i), … …, L(0), then the ordering of the reciprocals 1/L(i) from large to small is 1/L(0), 1/L(1), … …, 1/L(i), … …, 1/L(23): before the first iteration the load L(0) of sub-period 0 ranks 23rd, i.e. last and smallest, while the load L(23) of sub-period 23 ranks 1st, i.e. first and largest. The initial sub-periods in which the data sources D(1), D(2), … …, D(j), … …, D(n), with weights W(1), W(2), … …, W(j), … …, W(n), are scheduled to transmit in the first iteration may then be 0, 1, … …, i, … …, 23. That is, the data source D(1) with the largest weight may initially be scheduled to transmit first in sub-period 0, with the smallest load, and the data source D(n) with the smallest weight may initially be scheduled to transmit last in sub-period 23, with the largest load. Note that n is often greater than 23, so multiple data sources may be scheduled to transmit in the same sub-period. On the next day, i.e. after one iteration cycle, one calculates whether the optimized loads L'(0), L'(1), … …, L'(i), … …, L'(23) are reasonable — the ordering of L'(i) from large to small is no longer necessarily L'(23), L'(22), … …, L'(i), … …, L'(0) — and [|L'(0) − μ| + |L'(1) − μ| + … … + |L'(i) − μ| + … … + |L'(23) − μ|] may have become smaller.
The sub-periods in which each data source is scheduled to transmit are then adjusted according to the loads L'(0), L'(1), … …, L'(i), … …, L'(23) after the first optimization, and on the third day the loads L''(0), L''(1), … …, L''(i), … …, L''(23) after the second iteration of optimization are obtained. It may sometimes be found that a lightly loaded sub-period of the first day becomes a heavily loaded sub-period of the second day, and vice versa, because too many data sources were shifted into the originally lightly loaded sub-period. This is therefore an iterative optimization process: the transmission schedule of day n+1 must be adjusted according to the actual load fed back from day n, the schedule of day n+2 according to the actual load fed back from day n+1, and so on, … …. Empirically, however, this is a convergent iterative process: after several iterations, [|L'(0) − μ| + |L'(1) − μ| + … … + |L'(i) − μ| + … … + |L'(23) − μ|] decreases greatly, with the slope of the decrease gradually flattening, and [L'(0) + L'(1) + … … + L'(i) + … … + L'(23)] also decreases. This is mainly because, after peak clipping and valley filling, the load of each sub-period is relatively uniform, so the probability of transmission errors falls, the probability of retransmission falls correspondingly, the total transmission volume is reduced, transmission is further optimized, and a virtuous cycle is formed.
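One iteration of this feedback loop can be sketched as a greedy assignment: sources in descending weight go to the currently least-loaded sub-period, filling valleys first. This is a simplified illustration only — the background-load and size figures are invented, and the real system feeds back measured loads day by day rather than adding sizes analytically.

```python
import heapq

def schedule_once(background, sources):
    """background: measured load per sub-period, excluding these sources.
    sources: list of (name, weight, transmission_load).
    Returns (plan mapping name -> sub-period index, resulting loads)."""
    heap = [(load, i) for i, load in enumerate(background)]
    heapq.heapify(heap)
    loads, plan = list(background), {}
    for name, _, size in sorted(sources, key=lambda s: -s[1]):
        load, i = heapq.heappop(heap)        # current deepest valley
        plan[name] = i
        loads[i] = load + size
        heapq.heappush(heap, (loads[i], i))  # period may be reused later
    return plan, loads

background = [5.0, 1.0, 9.0, 2.0]            # 4 sub-periods for brevity
sources = [("D1", 4, 3.0), ("D2", 3, 3.0), ("D3", 2, 3.0), ("D4", 1, 3.0)]
plan, loads = schedule_once(background, sources)
print(plan["D1"])    # 1 -> highest weight goes to the deepest valley
print(loads)         # [8.0, 7.0, 9.0, 5.0]: flatter than the input
```

Running this once per day against the actual measured loads reproduces the convergent peak-clipping/valley-filling behaviour described above.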
The iterative optimization process described above can be implemented either manually or automatically with a small amount of necessary manual intervention; the latter approach is recommended.
7. The 25 segment-level nodes, 67 application nodes and 7 advanced-repair nodes of the motor train unit management information system (EMIS) can be abstracted into 99 destinations. The railway head-office interface server originally had to download the same file to all 99 destinations; now it needs to download it to only 1. If the time required to download a file to the 1st destination is t(1), the time required to download it to the 2nd destination is t(2), …, and the time required to download it to the 99th destination is t(99), then the total time required to download the file to all 99 destinations is t(1) + … + t(i) + … + t(99). Now the file only needs to be downloaded to the i-th destination, which takes only t(i), and obviously t(i) << t(1) + … + t(i) + … + t(99). The i-th destination then forwards the file to 3 more nodes, those 3 forward it to 9 more, and those 9 forward it to the remaining 12 of the 25 segment-level nodes. The total time required is approximately 4t(i), and 4t(i) << t(1) + … + t(i) + … + t(99) as well. Most importantly, for the topmost node, the transmission volume of the head-office interface server is reduced to about 1/99 of the original. The total transmission time is therefore greatly shortened, and the loads of the head-office database server and the head-office interface server are also greatly reduced. Of course, the scheduling file is slightly larger than the original file because it additionally carries Tracker information, which is essentially a list of IP addresses and a table of the forwarding order. Suppose, without loss of generality, that the first 25 of the 99 destinations are the 25 segment-level nodes; through long-term observation, the segment-level node with the shortest download time can basically be identified. Suppose that node is the 1st, i.e.
t(1) = min{t(1), …, t(i), …, t(99)}. Then the downloading and forwarding steps can be as follows: 1) the interface server downloads a given file only to the 1st segment-level node; 2) the 1st segment-level node forwards the file to the 2nd, 3rd and 4th segment-level nodes; 3) the 2nd segment-level node forwards the file to the 5th, 6th and 7th segment-level nodes, the 3rd to the 8th, 9th and 10th, and the 4th to the 11th, 12th and 13th; 4) the 5th segment-level node forwards the file to the 14th and 15th segment-level nodes, the 6th to the 16th and 17th, the 7th to the 18th and 19th, the 8th to the 20th, the 9th to the 21st, the 10th to the 22nd, the 11th to the 23rd, the 12th to the 24th, and the 13th to the 25th. The IP address list and forwarding-order list contained in the Tracker information are therefore very small, almost negligible compared with the original file, because the size of the Tracker information is usually one or more orders of magnitude smaller than that of the original file. While the segment-level nodes forward the file to the other segment-level nodes, they also download the file to the application stations or advanced-repair nodes below them, but this is independent of the head-office interface server.
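The 1→3→9→12 cascade enumerated above can be generated mechanically rather than listed by hand. The sketch below builds such a Tracker-style forwarding schedule as a parent table over the 25 segment-level nodes, assigning each level's receivers evenly and consecutively to the previous level's senders; the function name, the "interface-server" label and the fan-out parameter are illustrative assumptions, not identifiers from the patent.

```python
# Sketch of the scheduling ("Tracker") forwarding table for a 25-node cascade.
# Level sizes follow the text above: 1 -> 3 -> 9 -> 12.

def build_forward_schedule(n_nodes, max_fanout=3):
    """Return parent[k] for segment-level nodes 1..n_nodes;
    node 1 is fed directly by the head-office interface server."""
    parent = {1: "interface-server"}
    senders, nxt = [1], 2
    while nxt <= n_nodes:
        # the next level holds at most max_fanout receivers per sender
        receivers = list(range(nxt, min(nxt + max_fanout * len(senders), n_nodes + 1)))
        q, r = divmod(len(receivers), len(senders))
        pos = 0
        for k, s in enumerate(senders):
            take = q + (1 if k < r else 0)   # earlier senders absorb the remainder
            for node in receivers[pos:pos + take]:
                parent[node] = s
            pos += take
        senders, nxt = receivers, receivers[-1] + 1
    return parent

parents = build_forward_schedule(25)
assert parents[2] == 1 and parents[5] == 2 and parents[13] == 4   # matches steps 2)-3)
assert parents[14] == 5 and parents[20] == 8 and parents[25] == 13  # matches step 4)
```

A node that fails to receive the file would then ask `parent[k]` to re-forward it; only node 1 falls back to the head-office interface server, consistent with the retransmission rule described below.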
Of course, all destinations still upload file-reception-success information to the head-office interface server, essentially as before. The difference is that a destination that fails to receive the file successfully asks only its upstream (parent) node to retransmit. For example, if the 25th segment-level node fails to receive the file, it asks only its parent, the 13th segment-level node, to re-forward the file to it, whereas previously the 25th segment-level node would have asked the head-office interface server to re-download the file. In short, only if the 1st segment-level node fails to receive the file will it ask the head-office interface server to re-download it; every other segment-level node that fails to receive the file asks only its parent to re-forward it. The transmission volume and load of the head-office interface server are therefore further reduced. As before, all file-reception-success information from the destinations is fed back to the head-office interface server. In a word, after the one-to-many transmission is converted into many-to-many transmission, the segment-level intermediate nodes act like three-way valves that split the top-down transmission into parallel peer-level transmissions, so the load of the head-office central node is greatly reduced, the transmission efficiency of the whole system is greatly improved, and the transmission time is greatly shortened.
9. If similar information is transmitted as binary files, the actual content to be transmitted is a binary stream in which most bits are 0 and only a few are 1. The simplified mathematical model is as follows: after conversion into a binary file, rows i and i+1 each contain n binary values, but row i+1 differs from row i in only m(i+1) of them, where m(i+1) << n, i.e., m(i+1)/n < 0.1. Then, after XOR-ing row i+1 with row i, only m(i+1) bits are 1 and the rest are all 0, so in theory only the m(i+1) 1-bits need to be transmitted to obtain row i+1 from row i. Similarly, row i+2 differs from row i+1 in only m(i+2) binary values, where m(i+2) << n, i.e., m(i+2)/n < 0.1; after XOR-ing row i+2 with row i+1, only m(i+2) bits are 1 and the rest are 0, so in theory only the m(i+2) 1-bits need to be transmitted to obtain row i+2 from row i+1. Therefore, as long as m(i+1) << n and m(i+2) << n, i.e., m(i+1)/n < 0.1 and m(i+2)/n < 0.1, this transmission mode can further improve transmission efficiency and further reduce the load of the head-office database server and the head-office interface server.
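The row-XOR delta idea can be sketched in a few lines of Python. Only the positions of the 1-bits in the XOR of two adjacent equal-length rows are "transmitted"; the position-list encoding is an illustrative choice, since the text above only requires that the m(i+1) differing bits be marked and transmitted.

```python
def xor_delta(prev_row: bytes, row: bytes):
    """Positions of the differing bits between two equal-length rows."""
    diff = bytes(a ^ b for a, b in zip(prev_row, row))
    return [i * 8 + bit for i, byte in enumerate(diff)
            for bit in range(8) if byte >> (7 - bit) & 1]

def apply_delta(prev_row: bytes, positions):
    """Reconstruct row i+1 from row i plus the transmitted bit positions."""
    out = bytearray(prev_row)
    for p in positions:
        out[p // 8] ^= 1 << (7 - p % 8)
    return bytes(out)

row_i = bytes([0b10101010, 0b00000000])
row_j = bytes([0b10101011, 0b00010000])   # differs in only 2 of 16 bits
delta = xor_delta(row_i, row_j)
assert delta == [7, 11]                   # m(i+1)=2 << n=16: 2 positions suffice
assert apply_delta(row_i, delta) == row_j
```

When m(i+1)/n < 0.1, the transmitted position list is far smaller than the row itself, which is exactly the saving the model above describes.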
In a second aspect, an embodiment of the present application provides a bullet train distributed big data transmission optimization system 100, applied to a motor train unit management information system comprising a railway head-office central node, a plurality of segment-level intermediate nodes connected to the head-office central node, and a plurality of lower-level nodes, and adopting any one of the bullet train distributed big data transmission optimization methods described above. As shown in fig. 3, the system comprises:
the data source collection generation module 10: configured to sort, within a period, all operation and maintenance data of the plurality of segment-level intermediate nodes and the plurality of lower-level nodes by data file length, select a plurality of data sources whose data file length is greater than a preset length, and generate a data source collection;
the database load monitoring module 20: configured to divide the period into a plurality of sub-periods, and to calculate and monitor the data load of the railway head-office central node in each sub-period;
the weight optimization iteration module 30: configured to dynamically adjust, based on the value range of the time sensitivity parameter of each selected data source, the importance parameter and file length coefficient of that data source via auxiliary coefficients, and to calculate the weight value of each selected data source;
the data transmission optimization module 40: configured to dynamically adjust the upload or download transmission order of the selected data sources based on their weight values, optimize the data load of each sub-period, and repeat the weight optimization iteration step until the optimized data load satisfies the optimal load model, so that the data load of each sub-period is peak-clipped and valley-filled and optimized transmission of the distributed big data is achieved.
In a third aspect, an embodiment of the present application provides a server device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the bullet train distributed big data transmission optimization method described in any one of the above.
In addition, the bullet train distributed big data transmission optimization method of the embodiment of the application described in conjunction with fig. 1 can be implemented by a server device. Fig. 4 is a schematic diagram of a hardware structure of a server device according to an embodiment of the present application.
The computer device may comprise a processor 81 and a memory 82 in which computer program instructions are stored.
Specifically, the processor 81 may include a Central Processing Unit (CPU) or an Application-Specific Integrated Circuit (ASIC), or may be configured as one or more integrated circuits implementing the embodiments of the present application.
In some of these embodiments, the computer device may also include a communication interface 83 and a bus 80. As shown in fig. 4, the processor 81, the memory 82, and the communication interface 83 are connected via the bus 80 to complete communication therebetween.
In a fourth aspect, the present application provides a computer-readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the method for optimizing distributed big data transmission of a bullet train according to any one of the above descriptions.
Compared with the prior art, the method can optimize the big data transmission of the distributed motor train unit management information system (EMIS) at the source, and has a certain universality. As large and even ultra-large distributed systems become more numerous, data transmission, especially of big data, occupies an increasingly important and increasingly difficult position, because the data sources of a large distributed system are often cross-regional and cross-platform, complex in composition, and unevenly distributed in time. If big data transmission is not optimized at the source, the result is usually twice the effort for half the result: transmission efficiency is poor, so the probability of transmission errors rises, the probability of retransmission rises, the transmission volume grows further, transmission congestion worsens, transmission efficiency falls further, the retransmission probability rises again, and the system falls into a vicious circle. The core idea of the invention is to exploit the characteristics of the data themselves, abstracted into weights, to optimize the transmission of those data; in short, to optimize the transmission of the data by using the regularities of the data. The growth in hardware, bandwidth, capacity, computing power and so on of distributed systems has never quite kept pace with the growth in data volume, and in many cases, constrained by practical conditions, is rapidly approaching its ceiling. In addition, the topology of a distributed system tends to be even harder to change.
Therefore, optimizing the transmission of data by using the characteristics of the data themselves is a half-the-effort, twice-the-result optimization strategy: it can greatly reduce the probability of falling into a vicious circle and, to some extent, achieve more with limited resources without increasing investment. The strategy is suitable for most distributed systems and has considerable general value. The present invention provides technical support for opening up data channels, which benefits many distributed systems. For example, it facilitates service integration and mining; it facilitates one-stop collection of the technical state data of a product across the whole life cycle of design, manufacture, application and maintenance of the complete unit; it facilitates linking whole-chain manufacturing and maintenance data such as part design, manufacture, application and maintenance; and it facilitates timely mastery of the reliability level and the fault development rules and trends of complete units and parts, which is of great significance for promoting accurate digital maintenance of equipment and the modification of deep-repair processes.
For the sake of brevity, not all possible combinations of the technical features of the above embodiments are described; however, as long as there is no contradiction between such combinations of technical features, they should be considered within the scope of the present disclosure.
The above-mentioned embodiments express only several embodiments of the present application, and their description is relatively specific and detailed, but they should not therefore be construed as limiting the scope of the invention. It should be noted that a person skilled in the art can make several variations and modifications without departing from the concept of the present application, all of which fall within the protection scope of the present application. Therefore, the protection scope of this patent application shall be subject to the appended claims.

Claims (7)

1. A bullet train distributed big data transmission optimization method, applied to a motor train unit management information system comprising a railway head-office central node, a plurality of segment-level intermediate nodes connected to the head-office central node, and a plurality of lower-level nodes, the method comprising the following steps:
a data source collection generation step: within a period, sorting all operation and maintenance data of the plurality of segment-level intermediate nodes and the plurality of lower-level nodes by data file length, selecting a plurality of selected data sources whose data file length is greater than a preset length, and generating a data source collection;
a database load monitoring step: dividing the period into a plurality of sub-periods, and calculating and monitoring the data load of the railway head-office central node in each sub-period;
a weight optimization iteration step: based on the value range of the time sensitivity parameter of each selected data source, dynamically adjusting the importance parameter and file length coefficient of that data source via auxiliary coefficients, and calculating the weight value of each selected data source;
a data transmission optimization step: dynamically adjusting the upload or download transmission order of the selected data sources based on their weight values, scheduling the data source with the largest weight to transmit first in the period with the smallest load and randomly scheduling the data source with the smallest weight to transmit last in a period with a higher load, optimizing the data load of each sub-period, and repeating the weight optimization iteration step until the optimized data load satisfies an optimal load model, so that the data load of each sub-period is peak-clipped and valley-filled and optimized transmission of the distributed big data is achieved;
the weight optimization iteration step comprises:
when the time sensitivity parameter TS(j) of a selected data source is smaller than a first preset time sensitivity parameter, the weight value W(j) of the selected data source is: W(j) = Ks(j) × S(j) + Kd(j) × D(j), where j = 0, 1, …, N, N being the maximum number of the selected data sources, S(j) is the importance parameter, Ks(j) is the auxiliary coefficient of the importance parameter S(j), D(j) is the file length coefficient of the j-th selected data source D(j), and Kd(j) is the auxiliary coefficient of the file length coefficient D(j);
the weight optimization iteration step further comprises:
when the time sensitivity parameter TS(j) of a selected data source satisfies TS(j) = p, then for the j-th data source D(j) the weight value W(j) is a predetermined maximum value in the p-th sub-period and W(j) = 0 in the other sub-periods, where j = 0, 1, …, N, N being the maximum number of the selected data sources, and p = 0, 1, …, M, M being the number of the sub-periods;
the optimal load model is as follows:
[|L′(0) - μ| + |L′(1) - μ| + … + |L′(i) - μ| + … + |L′(M) - μ|] is less than or equal to a preset load imbalance metric value, and [L′(0) + L′(1) + … + L′(i) + … + L′(M)] is less than or equal to a preset total load amount, where μ is the average data load over the sub-periods, μ = [L′(0) + L′(1) + … + L′(i) + … + L′(M)]/M, and L′(i) is the optimized data load of the i-th sub-period, i = 0, 1, …, M, M being the number of said sub-periods.
2. The bullet train distributed big data transmission optimization method according to claim 1, wherein the importance parameter S(j) is set based on the importance and urgency of the overall business of the j-th selected data source;
the time sensitivity parameter TS(j) is set based on the time urgency of the overall business of the j-th selected data source;
the file length coefficient D(j) is set based on the file length of the j-th selected data source, where j = 0, 1, …, N, N being the maximum number of the selected data sources.
3. The bullet train distributed big data transmission optimization method according to claim 1, wherein the data transmission optimization step further comprises:
a downloading optimization step: the railway head-office central node generates a scheduling file for the data to be downloaded to the plurality of segment-level intermediate nodes and the plurality of lower-level nodes, the scheduling file comprising a plurality of hierarchically arranged destination IP addresses, and the data to be downloaded are matched and forwarded step by step to the plurality of segment-level intermediate nodes and the plurality of lower-level nodes based on the hierarchically arranged destination IP addresses.
4. The bullet train distributed big data transmission optimization method according to claim 1 or 3, wherein the data transmission optimization step further comprises:
a similar data transmission optimization step: converting the file of a selected data source row by row into a binary file of a predetermined format, performing an exclusive-OR operation row by row on adjacent rows of binary data of the binary file, locating and marking the bits in which the two adjacent rows differ, and transmitting those differing bits, thereby optimizing the transmission of similar data;
a peer data abstraction optimization step: uniformly sending, at the same time, data files of the same type from the plurality of segment-level intermediate nodes or the plurality of lower-level nodes at the same level, so that the node data files at the same level are logically abstracted as one big data file.
5. A bullet train distributed big data transmission optimization system, applied to a motor train unit management information system comprising a railway head-office central node, a plurality of segment-level intermediate nodes connected to the head-office central node, and a plurality of lower-level nodes, the system adopting the bullet train distributed big data transmission optimization method according to any one of claims 1 to 4 and comprising:
the data source collection generation module: configured to sort, within a period, all operation and maintenance data of the plurality of segment-level intermediate nodes and the plurality of lower-level nodes by data file length, select a plurality of selected data sources whose data file length is greater than a preset length, and generate a data source collection;
the database load monitoring module: configured to divide the period into a plurality of sub-periods, and to calculate and monitor the data load of the railway head-office central node in each sub-period;
the weight optimization iteration module: configured to dynamically adjust, based on the value range of the time sensitivity parameter of each selected data source, the importance parameter and file length coefficient of that data source via auxiliary coefficients, and to calculate the weight value of each selected data source;
the data transmission optimization module: configured to dynamically adjust the upload or download transmission order of the selected data sources based on their weight values, optimize the data load of each sub-period, and repeat the weight optimization iteration step until the optimized data load satisfies the optimal load model, so that the data load of each sub-period is peak-clipped and valley-filled and optimized transmission of the distributed big data is achieved.
6. A server device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, implements the bullet train distributed big data transmission optimization method according to any one of claims 1 to 4.
7. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the bullet train distributed big data transmission optimization method according to any one of claims 1 to 4.
CN202210702301.3A 2022-06-21 2022-06-21 Bullet train distributed big data transmission optimization method, system, equipment and medium Active CN114785736B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210702301.3A CN114785736B (en) 2022-06-21 2022-06-21 Bullet train distributed big data transmission optimization method, system, equipment and medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210702301.3A CN114785736B (en) 2022-06-21 2022-06-21 Bullet train distributed big data transmission optimization method, system, equipment and medium

Publications (2)

Publication Number Publication Date
CN114785736A CN114785736A (en) 2022-07-22
CN114785736B true CN114785736B (en) 2022-10-25

Family

ID=82421016

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210702301.3A Active CN114785736B (en) 2022-06-21 2022-06-21 Bullet train distributed big data transmission optimization method, system, equipment and medium

Country Status (1)

Country Link
CN (1) CN114785736B (en)

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102030023A (en) * 2006-10-02 2011-04-27 通用电气公司 Method for optimizing railroad train operation for a train including multiple distributed-power locomotives

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9233696B2 (en) * 2006-03-20 2016-01-12 General Electric Company Trip optimizer method, system and computer software code for operating a railroad train to minimize wheel and track wear
US8789062B2 (en) * 2011-04-05 2014-07-22 Teradata Us, Inc. Workload management of a concurrently accessed database server
CN106058855A (en) * 2016-06-16 2016-10-26 南京工程学院 Active power distribution network multi-target optimization scheduling method of coordinating stored energy and flexible load
CN111162524A (en) * 2020-01-08 2020-05-15 中国电力科学研究院有限公司 Control method and system for electric vehicle charging user to access power distribution network
CN112859615B (en) * 2021-01-22 2024-01-30 西安建筑科技大学 Household electricity load optimization scheduling method, system, equipment and readable storage medium
CN113595774B (en) * 2021-07-19 2023-08-01 广西大学 IAGA algorithm-based high-speed train networking topology optimization method

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102030023A (en) * 2006-10-02 2011-04-27 通用电气公司 Method for optimizing railroad train operation for a train including multiple distributed-power locomotives

Also Published As

Publication number Publication date
CN114785736A (en) 2022-07-22

Similar Documents

Publication Publication Date Title
CN107045667B (en) Measurement and control data transmission integrated station network resource scheduling method
Tormos et al. A genetic algorithm for railway scheduling problems
CN1271444A (en) Transmission unit receiving and storing means
CN101296152A (en) Data scheduling method and system of equity linked network
CN114417417A (en) Industrial Internet of things privacy protection system and method based on federal learning
CN110351780A (en) A kind of communication means based on code cache, system and storage medium
Shukla et al. Fault tolerance based load balancing approach for web resources in cloud environment.
CN114785736B (en) Bullet train distributed big data transmission optimization method, system, equipment and medium
CN106209990A (en) The appreciable request scheduling method of cost under a kind of distribution strange land cloud data center
CN109242240A (en) Task based on unit time distribution and timeliness control develops cloud platform
CN102724100A (en) Board resource distribution system and method aiming at combination service
CN106791932A (en) Distributed trans-coding system, method and its device
CN109889573A (en) Based on the Replica placement method of NGSA multiple target in mixed cloud
Zhao et al. Urban rail transit scheduling under time-varying passenger demand
CN114417577A (en) Cross-platform resource scheduling and optimization control method
CN101193031B (en) Data processing method and network based on distributed hash table
CN113986222A (en) API (application programming interface) translation system for cloud computing
CN114154685A (en) Electric energy data scheduling method in smart power grid
Bogatyrev et al. Redundant maintenance of a non-uniform query stream by a sequence of nodes that are grouped together in groups
CN115022233B (en) Transmission method capable of customizing point-to-multipoint data transmission completion time
CN108234626A (en) A kind of ships data processing method and system
Li et al. Method of Collaborative Scheduling of Acquisition Tasks Based on Artificial Intelligence
CN117237004B (en) Energy storage device transaction processing method and device and storage medium
Chen et al. Task offloading and replication for vehicular cloud computing: A multi-armed bandit approach
CN114841665A (en) LNG distributed energy Internet of things system with multi-level management platform

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant