CN116708240A

CN116708240A - Time delay dynamic compensation method, device, equipment, medium and system

Info

Publication number: CN116708240A
Application number: CN202310808322.8A
Authority: CN
Inventors: 赵良; 张贺; 魏步征; 董姗; 屈文秀; 满祥锟; 林琳
Original assignee: China United Network Communications Group Co Ltd
Current assignee: China United Network Communications Group Co Ltd
Priority date: 2023-07-03
Filing date: 2023-07-03
Publication date: 2023-09-05

Abstract

The invention discloses a time delay dynamic compensation method, device, equipment, medium and system. The method is applied to synchronous network transmission equipment and comprises the following steps: transmitting a first time synchronization signal to a synchronous network probe device; receiving a first time synchronization signal deviation returned by the synchronous network probe device, wherein the deviation is obtained by the probe device measuring and analyzing the first time synchronization signal and is the difference between a first theoretical time and a first actual time; and transmitting a second time synchronization signal to the synchronous network probe device, wherein the second time synchronization signal is obtained by compensating the first time synchronization signal according to the first time synchronization signal deviation and a reinforcement learning model. Because the method is based on a reinforcement learning model, it can detect the delay asymmetry value in real time and compensate for it automatically.

Description

Time delay dynamic compensation method, device, equipment, medium and system

Technical Field

The present invention relates to the field of communications networks, and in particular, to a method, apparatus, device, medium, and system for dynamically compensating time delay.

Background

In the prior art, asymmetric delay compensation mainly relies on a probe, an instrument, or a function built into a 5G base station to measure a fixed bidirectional delay difference, after which the delay is compensated. This approach cannot compensate automatically, which leads to the following defects:

1) When the application environment changes, for example when network OLP switching causes the optical fiber path to switch, the bidirectional delay difference changes, which degrades the synchronization precision and service quality of the network; the bidirectional delay difference then has to be measured manually again and compensated again;

2) After each measurement, a one-time compensation is applied according to the measured value; when the correction is large, it can cause an instantaneous jump in clock precision that affects the service;

3) The delay asymmetry value cannot be detected in real time.

Disclosure of Invention

The invention aims to solve the technical problems of the prior art, and provides a time delay dynamic compensation method, a device, equipment, a medium and a system based on a reinforcement learning model, which can automatically compensate time delay.

In a first aspect, the present invention provides a method for dynamically compensating time delay, which is applied to a synchronous network transmission device, and the method includes the following steps:

Step K1: transmitting a first time synchronization signal to a synchronous network probe device;

step K2: receiving a first time synchronization signal deviation returned by the synchronous network probe equipment;

wherein:

the first time synchronization signal deviation is obtained by measuring and analyzing the first time synchronization signal by the synchronization network probe device, the first time synchronization signal deviation is a difference value between a first theoretical time and a first actual time, the first theoretical time is a theoretical time when the first time synchronization signal arrives at the synchronization network probe device, and the first actual time is an actual time when the first time synchronization signal arrives at the synchronization network probe device;

step K3: compensating the first time synchronization signal according to the first time synchronization signal deviation and the reinforcement learning model to obtain a second time synchronization signal;

step K4: sending the second time synchronization signal to the synchronous network probe device, thereby completing the time delay dynamic compensation.

Further, in the step K3, the first time synchronization signal is compensated according to the deviation of the first time synchronization signal and the reinforcement learning model to obtain a second time synchronization signal, which specifically includes the steps of:

step K31: comparing the first time synchronization signal deviation with a preset threshold value:

when the first time synchronization signal deviation is larger than the preset threshold, proceeding to step K32; when the first time synchronization signal deviation is smaller than or equal to the preset threshold, performing no delay compensation on the first time synchronization signal and taking the first time synchronization signal as the second time synchronization signal;

step K32: acquiring the environment state S_t corresponding to the first time synchronization signal, and using the environment state S_t as the input of the reinforcement learning model; the environment state comprises a time error TE, a maximum time interval error MTIE, a time deviation TDEV and a frequency deviation;

step K33: outputting a compensation action in the reinforcement learning model according to the time error TE, the maximum time interval error MTIE, the time deviation TDEV and the frequency deviation, to obtain a second time synchronization signal; outputting the compensation action comprises outputting a compensation direction and a compensation value step length, wherein the compensation direction is positive compensation, negative compensation or zero compensation, and the compensation value step length is a preset value;

step K34: measuring and analyzing the second time synchronization signal to obtain a second time synchronization signal deviation, and comparing the second time synchronization signal deviation with the preset threshold:

when the second time synchronization signal deviation is larger than the preset threshold, repeating steps K32 to K33 until the second time synchronization signal deviation is smaller than or equal to the preset threshold, and then replacing the first time synchronization signal with the second time synchronization signal.
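As an illustration only, the loop formed by steps K31 to K34 might be sketched as follows. The function and parameter names are hypothetical, the reinforcement learning model is stood in for by an arbitrary `choose_action` callable, and comparing the absolute value of the deviation with the threshold is an assumption (the text only says "larger than a preset threshold").

```python
from typing import Callable, Tuple

# Hypothetical action encoding: +1 positive, -1 negative, 0 zero compensation.
Action = int


def compensate_until_within_threshold(
    measure_deviation: Callable[[float], float],
    choose_action: Callable[[float], Action],
    threshold: float,
    step: float,
    max_iterations: int = 1000,
) -> Tuple[float, int]:
    """Return (total correction applied, number of compensation steps taken)."""
    correction = 0.0
    for iteration in range(max_iterations):
        # K2 / K34: the probe reports the deviation for the current signal.
        deviation = measure_deviation(correction)
        # K31 / K34: stop once the deviation is within the preset threshold
        # (absolute value is an assumption of this sketch).
        if abs(deviation) <= threshold:
            return correction, iteration
        # K32-K33: the model picks a direction; one preset step is applied.
        action = choose_action(deviation)
        correction += action * step
    return correction, max_iterations
```

Because each pass applies only one preset step, a large deviation is removed gradually rather than in a single jump, which matches the stated goal of avoiding instantaneous changes in clock precision.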

Further, the reinforcement learning model comprises a value network and a strategy network obtained based on the first time synchronization signal deviation,

in the step K33, outputting a compensation action in the reinforcement learning model according to the time error TE, the maximum time interval error MTIE, the time deviation TDEV and the frequency deviation to obtain a second time synchronization signal specifically includes:

inputting the environment state S_t corresponding to the first time synchronization signal into the value network, wherein the value network obtains, according to the environment state, the value of each possible delay compensation action; the possible actions comprise a positive compensation action, a negative compensation action and a zero compensation action, and the probabilities of the positive, negative and zero compensation actions sum to 1; the environment state S_t comprises the time error TE, the maximum time interval error MTIE, the time deviation TDEV and the frequency deviation;

the strategy network selects the delay compensation action with the maximum probability among the possible action values;

after the synchronous network transmission equipment receives the action, the environment state changes to the environment state S_{t+1}, and a reward value r_t is returned to the probe device;

Wherein:

when the maximum absolute value of the upper and lower peaks of the TE clock performance in the environment state S_{t+1} is smaller than or equal to that in the environment state S_t at the previous moment,

or

the MTIE of the environment state S_{t+1} is smaller than the MTIE of the environment state S_t,

or

the TDEV of the environment state S_{t+1} is smaller than the TDEV of the environment state S_t,

or

the MTIE of the environment state S_{t+1} is smaller than the limit specified by the ITU-T G.811/G.812 standards,

or

the TDEV of the environment state S_{t+1} is smaller than the limit specified by the ITU-T G.811/G.812 standards,

or

the frequency offset of the environment state S_{t+1} is closer to 0 than the frequency offset of the environment state S_t,

the reward value r_t is an increased reward value;

otherwise, the reward value r_t is a decreased reward value.
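The reward rule above can be sketched as a single predicate over two consecutive environment states. This is a minimal sketch, not the patented implementation: the `EnvState` container, the numeric reward values `+1.0` and `-1.0`, and the `mtie_limit` / `tdev_limit` parameters (standing in for the ITU-T G.811/G.812 limits, which are not reproduced here) are all assumptions.

```python
from dataclasses import dataclass


@dataclass
class EnvState:
    te_peak_abs: float   # max absolute value of the upper/lower TE peaks
    mtie: float          # maximum time interval error
    tdev: float          # time deviation
    freq_offset: float   # frequency offset


def reward(prev: EnvState, curr: EnvState,
           mtie_limit: float, tdev_limit: float,
           bonus: float = 1.0, penalty: float = -1.0) -> float:
    """Return `bonus` if any listed improvement condition holds, else `penalty`."""
    improved = (
        curr.te_peak_abs <= prev.te_peak_abs             # TE peaks no worse
        or curr.mtie < prev.mtie                         # MTIE decreased
        or curr.tdev < prev.tdev                         # TDEV decreased
        or curr.mtie < mtie_limit                        # MTIE under the limit
        or curr.tdev < tdev_limit                        # TDEV under the limit
        or abs(curr.freq_offset) < abs(prev.freq_offset)  # offset closer to 0
    )
    return bonus if improved else penalty
```

The conditions are joined with a logical OR, as in the text, so a single improved metric is enough to earn the increased reward.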

Further, in the step K32, acquiring the environment state S_t corresponding to the first time synchronization signal specifically includes: acquiring the environment state corresponding to the first time synchronization signal after the time range threshold X has elapsed;

in the step K33, outputting a compensation action in the reinforcement learning model according to the time error TE, the maximum time interval error MTIE, the time deviation TDEV and the frequency deviation to obtain a second time synchronization signal includes the following steps:

determining a time range threshold X and a performance threshold Y, wherein the time range threshold X denotes X consecutive time ranges t, and the performance threshold Y is the threshold range of the time error TE;

for the data in each time range t, calculating the mean μ and the standard deviation σ of TE;

judging whether the TE values are distributed within the range (μ-3σ, μ+3σ):

if the mean of TE in each of the X consecutive time ranges t is smaller than Y and the TE values are distributed within the range (μ-3σ, μ+3σ), outputting zero compensation as the compensation action in the reinforcement learning model, determining that the second time synchronization signal is identical to the first time synchronization signal, and replacing the first time synchronization signal with the second time synchronization signal; if the mean of TE in any time range t is larger than Y, or the TE values are not distributed within the range (μ-3σ, μ+3σ), determining that the second time synchronization signal is obtained by applying positive compensation or negative compensation to the first time synchronization signal.
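A minimal sketch of this stability check follows, assuming TE samples are already grouped into time ranges. The function names are hypothetical, and two details are assumptions of the sketch rather than statements of the method: the mean is compared with Y by absolute value, and the 3σ bounds are treated as inclusive so that a perfectly constant window (σ = 0) counts as stable.

```python
from statistics import mean, pstdev
from typing import Sequence


def window_is_stable(te: Sequence[float], y: float) -> bool:
    """True if the window mean is under Y and all samples lie within mu +/- 3 sigma."""
    mu = mean(te)
    sigma = pstdev(te)  # population standard deviation of the window
    within_3sigma = all(mu - 3 * sigma <= v <= mu + 3 * sigma for v in te)
    return abs(mu) < y and within_3sigma


def zero_compensation_allowed(windows: Sequence[Sequence[float]],
                              x: int, y: float) -> bool:
    """True if the last X consecutive windows are all stable."""
    if len(windows) < x:
        return False
    return all(window_is_stable(w, y) for w in windows[-x:])
```

When `zero_compensation_allowed` returns `False`, the text calls for positive or negative compensation; choosing between the two is left to the reinforcement learning model.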

In a second aspect, the present invention provides a method for dynamically compensating time delay, the method being applied to a synchronous network probe device, the method comprising the steps of:

step S1: receiving a first time synchronization signal transmitted by a synchronous network transmission device;

Step S2: measuring and analyzing the first time synchronization signal to obtain a first time synchronization signal deviation, and transmitting the first time synchronization signal deviation to a synchronous network transmission device;

wherein:

the first time synchronization signal deviation is a difference value between a first theoretical time and a first actual time, the first theoretical time is a theoretical time when the first time synchronization signal arrives at the synchronous network probe device, and the first actual time is an actual time when the first time synchronization signal arrives at the synchronous network probe device;

step S3: receiving a second time synchronization signal transmitted by the synchronous network transmission equipment, thereby completing time delay dynamic compensation; the second time synchronization signal is obtained by compensating the first time synchronization signal according to the deviation of the first time synchronization signal and the reinforcement learning model.

In a third aspect, the present invention provides a delay dynamic compensation apparatus, the apparatus being applied to a synchronous network probe device, the apparatus comprising:

a first receiving unit, configured to receive a first time synchronization signal transmitted by a synchronization network transmission device;

the measurement analysis unit is connected with the first receiving unit and is used for measuring and analyzing the first time synchronizing signal to obtain a first time synchronizing signal deviation and transmitting the first time synchronizing signal deviation to the synchronous network transmission equipment; the first time synchronization signal deviation is a difference value between a first theoretical time and a first actual time, the first theoretical time is a theoretical time when the first time synchronization signal arrives at the synchronous network probe device, and the first actual time is an actual time when the first time synchronization signal arrives at the synchronous network probe device;

The first receiving unit is also used for receiving a second time synchronizing signal transmitted by the synchronous network transmission equipment so as to finish time delay dynamic compensation;

the second time synchronization signal is obtained by compensating the first time synchronization signal according to the deviation of the first time synchronization signal and the reinforcement learning model.

In a fourth aspect, the present invention provides a delay dynamic compensation device, which is applied to a synchronous network transmission apparatus, and the device includes:

the first sending unit is used for sending a first time synchronizing signal to the synchronous network probe equipment;

the second receiving unit is connected with the first transmitting unit and is used for receiving the first time synchronization signal deviation returned by the synchronous network probe equipment; the first time synchronization signal deviation is a difference value between a first theoretical time and a first actual time, the first theoretical time is a theoretical time when the first time synchronization signal arrives at the synchronous network probe device, and the first actual time is an actual time when the first time synchronization signal arrives at the synchronous network probe device;

the compensation unit is connected with the second receiving unit and is used for compensating the first time synchronizing signal according to the first time synchronizing signal deviation and the reinforcement learning model to obtain a second time synchronizing signal;

And the second sending unit is connected with the compensation unit and is used for sending a second time synchronizing signal to the synchronous network probe equipment so as to finish time delay dynamic compensation.

In a fifth aspect, the present invention provides an electronic device comprising a memory and a processor, the memory having a computer program stored therein, wherein the processor performs the time delay dynamic compensation method described above when it runs the computer program stored in the memory.

In a sixth aspect, the present invention provides a computer readable storage medium having stored thereon a computer program which, when executed by a processor, performs the time delay dynamic compensation method described above.

In a seventh aspect, the present invention provides a delay dynamic compensation system, the system comprising a synchronous network probe device and a synchronous network transmission device,

the synchronous network transmission equipment transmits a first time synchronous signal to the synchronous network probe equipment;

the synchronous network probe device sends a first time synchronous signal deviation to synchronous network transmission equipment;

the synchronous network probe equipment receives a second time synchronous signal transmitted by the synchronous network transmission equipment, so that time delay dynamic compensation is completed; the second time synchronization signal is obtained by the synchronization network transmission equipment through compensation of the first time synchronization signal according to the first time synchronization signal deviation and the reinforcement learning model.

Wherein:

the first time synchronization signal deviation is obtained by measuring and analyzing the first time synchronization signal by the synchronization network probe device, the first time synchronization signal deviation is a difference value between a first theoretical time and a first actual time, the first theoretical time is a theoretical time when the first time synchronization signal arrives at the synchronization network probe device, and the first actual time is an actual time when the first time synchronization signal arrives at the synchronization network probe device.

The invention has the beneficial effects that:

1. The invention carries out time delay dynamic compensation based on the reinforcement learning model, and can therefore detect the delay asymmetry value in real time and compensate for it automatically.

2. When the network environment changes, the invention can still detect the delay asymmetry value in real time and compensate automatically, which greatly reduces the workload of delay measurement and asymmetry compensation and requires no manual intervention.

3. When the device loses lock and enters the holdover state, the invention can correct the device synchronization accuracy deviation back into the normal range through continuous dynamic compensation.

4. The invention carries out time delay dynamic compensation based on the reinforcement learning model, so the delay is compensated gradually; this avoids the service impact of an instantaneous precision jump caused by a sudden large compensation value, and makes network operation and maintenance automated and intelligent.

5. The invention can effectively ensure the stability of time delay compensation through setting the time range and the performance threshold value, thereby avoiding the influence of sudden fluctuation on a communication system.

Drawings

Fig. 1 is a schematic diagram of implementing dynamic delay compensation by using reinforcement learning in a synchronous network transmission device according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the 1588v2 topology in an embodiment of the invention;

Fig. 3 is a schematic diagram of a strategy network in a reinforcement learning model according to an embodiment of the present invention;

Fig. 4 is a schematic diagram of implementing dynamic delay compensation by using reinforcement learning in a 5G base station apparatus according to an embodiment of the present invention;

Fig. 5 is a schematic diagram of a delay dynamic compensation flow in an embodiment of the invention;

Fig. 6 is a schematic diagram of a dynamic delay compensation device according to an embodiment of the present invention;

Fig. 7 is a schematic diagram of another delay dynamic compensation device according to an embodiment of the invention.

Wherein, the reference numerals are: 10, first receiving unit; 20, measurement analysis unit; 30, first sending unit; 40, second receiving unit; 50, compensation unit; 60, second sending unit.

Detailed Description

In order to make the technical scheme of the present invention better understood by those skilled in the art, the following detailed description of the embodiments of the present invention will be given with reference to the accompanying drawings.

It is to be understood that the specific embodiments and figures described herein are merely illustrative of the invention, and are not limiting of the invention.

It is to be understood that the various embodiments of the invention and the features of the embodiments may be combined with each other without conflict.

It is to be understood that only the portions relevant to the present invention are shown in the drawings for convenience of description, and the portions irrelevant to the present invention are not shown in the drawings.

It should be understood that each unit and module in the embodiments of the present invention may correspond to only one physical structure, may be formed by a plurality of physical structures, or may be integrated into one physical structure.

It will be appreciated that, without conflict, the functions and steps noted in the flowcharts and block diagrams of the present invention may occur out of the order noted in the figures.

It is to be understood that the flowcharts and block diagrams of the present invention illustrate the architecture, functionality, and operation of possible implementations of systems, apparatuses, devices, methods according to various embodiments of the present invention. Where each block in the flowchart or block diagrams may represent a unit, module, segment, code, or the like, which comprises executable instructions for implementing the specified functions. Moreover, each block or combination of blocks in the block diagrams and flowchart illustrations can be implemented by hardware-based systems that perform the specified functions, or by combinations of hardware and computer instructions.

It should be understood that the units and modules related in the embodiments of the present invention may be implemented by software, or may be implemented by hardware, for example, the units and modules may be located in a processor.

Example 1:

As shown in Fig. 1, Fig. 4 and Fig. 5, this embodiment provides a time delay dynamic compensation method, which is applied to a synchronous network transmission device.

The method comprises the following steps K1 to K4:

Step K1: sending the first time synchronization signal to the synchronous network probe device.

Delay dynamic compensation (Dynamic Delay Compensation) is a technique for measuring and correcting delay in a synchronous network. Because network transmission unavoidably introduces delay, data packets in a synchronous network arrive with some delay, and this delay must be measured and compensated in order to keep the clocks of the network nodes synchronized. The basic principle of delay dynamic compensation is to estimate the delay by measuring the arrival time difference of data packets in the network and to correct the clock according to the estimated delay value. A specific implementation may record the sending and receiving timestamps of a data packet at the network nodes, calculate the delay the packet experienced in transmission, and then adjust the frequency or phase of the local clock according to the measurement result to compensate for the delay.
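The timestamp-based calculation described above can be illustrated with the standard two-way exchange used by IEEE 1588 (the text refers to a 1588v2 topology, but does not spell out the formula, so using it here is an assumption). The function name and the `asymmetry` parameter are hypothetical; `t1` to `t4` follow the usual convention of master send, slave receive, slave send, master receive.

```python
from typing import Tuple


def ptp_offset_and_delay(t1: float, t2: float, t3: float, t4: float,
                         asymmetry: float = 0.0) -> Tuple[float, float]:
    """Estimate (slave clock offset, mean path delay) from four timestamps.

    asymmetry is the forward delay minus the reverse delay. With the default
    of 0 the path is assumed symmetric, and any real asymmetry shows up as an
    offset error of asymmetry / 2.
    """
    forward = t2 - t1   # forward delay plus clock offset
    reverse = t4 - t3   # reverse delay minus clock offset
    mean_path_delay = (forward + reverse) / 2.0
    offset = (forward - reverse) / 2.0 - asymmetry / 2.0
    return offset, mean_path_delay
```

For example, with a true offset of 50 ns, a forward delay of 1000 ns and a reverse delay of 800 ns, the uncorrected estimate is 150 ns; supplying `asymmetry=200` recovers 50 ns. This half-asymmetry error is the quantity that delay asymmetry compensation aims to remove.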

When the time delay dynamic compensation is carried out, a first time synchronizing signal is required to be sent to the synchronous network probe equipment, the first time synchronizing signal is an accurate time mark or a periodic synchronizing signal, and the probe equipment can be ensured to acquire the most accurate synchronous reference time by sending the synchronizing signal to the synchronous network probe equipment. The synchronous network probe device typically communicates with a synchronous network transmission device and determines the delay in the network by measuring the delay, and the synchronous network probe device is able to detect various delays introduced during data transmission, such as transmission delay, processing delay, propagation delay, and the like. By collecting and analyzing the time delay information, the time delay difference between different nodes in the network can be calculated, the dynamic compensation is carried out, and the time delay compensation strategy is continuously monitored and regulated in real time, so that the whole synchronous network is ensured to always maintain a high-precision and stable synchronous state.

Step K2: receiving the first time synchronization signal deviation returned by the synchronous network probe device.

Specifically, the first time synchronization signal deviation is obtained by measuring and analyzing the first time synchronization signal by the synchronization network probe device, the first time synchronization signal deviation is a difference value between a first theoretical time and a first actual time, the first theoretical time is a theoretical time when the first time synchronization signal arrives at the synchronization network probe device, and the first actual time is an actual time when the first time synchronization signal arrives at the synchronization network probe device;

Step K3: compensating the first time synchronization signal according to the first time synchronization signal deviation and the reinforcement learning model to obtain a second time synchronization signal; the reinforcement learning model comprises a value network and a strategy network obtained based on the first time synchronization signal deviation.

Specifically, in step K3, the first time synchronization signal is compensated according to the first time synchronization signal deviation and the reinforcement learning model to obtain the second time synchronization signal, which specifically includes the steps of:

step K31: comparing the first time synchronization signal deviation with a preset threshold: when the first time synchronization signal deviation is larger than the preset threshold, proceeding to step K32; when the first time synchronization signal deviation is smaller than or equal to the preset threshold, performing no delay compensation on the first time synchronization signal and taking the first time synchronization signal as the second time synchronization signal;

step K32: acquiring the environment state S_t corresponding to the first time synchronization signal, and using the environment state S_t as the input of the reinforcement learning model; the environment state includes a time error TE, a maximum time interval error MTIE, a time deviation TDEV, and a frequency offset;

step K33: outputting a compensation action in the reinforcement learning model according to the time error TE, the maximum time interval error MTIE, the time deviation TDEV and the frequency offset, to obtain a second time synchronization signal; outputting the compensation action comprises outputting a compensation direction and a compensation value step length, wherein the compensation direction is positive compensation, negative compensation or zero compensation, and the compensation value step length is a preset value;
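As a hedged sketch of the compensation action in step K33, the direction can be chosen as the action with the highest probability and then applied with the preset step. The action names, the `DIRECTIONS` table and both function names are hypothetical; the probabilities would come from the reinforcement learning model, which is not modelled here.

```python
from typing import Dict

# Hypothetical encoding of the three compensation directions.
DIRECTIONS: Dict[str, int] = {"positive": 1, "negative": -1, "zero": 0}


def select_action(probabilities: Dict[str, float]) -> str:
    """Pick the compensation direction with the highest probability."""
    if abs(sum(probabilities.values()) - 1.0) > 1e-6:
        raise ValueError("action probabilities must sum to 1")
    return max(probabilities, key=lambda name: probabilities[name])


def apply_compensation(current_correction: float, action: str, step: float) -> float:
    """Move the correction by one preset step in the chosen direction."""
    return current_correction + DIRECTIONS[action] * step
```

Keeping the step a preset constant means the model only decides the direction, so the size of any single correction stays bounded.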

In one embodiment, the reinforcement learning model comprises a value network and a strategy network obtained based on the first time synchronization signal deviation.

In step K33, outputting a compensation action in the reinforcement learning model according to the time error TE, the maximum time interval error MTIE, the time deviation TDEV and the frequency deviation to obtain a second time synchronization signal specifically includes:

inputting the environment state S_t corresponding to the first time synchronization signal into the value network, wherein the value network obtains, according to the environment state, the value of each possible delay compensation action; the possible actions comprise a positive compensation action, a negative compensation action and a zero compensation action, and the probabilities of the positive, negative and zero compensation actions sum to 1; the environment state S_t comprises the time error TE, the maximum time interval error MTIE, the time deviation TDEV and the frequency deviation;

Wherein:

when the maximum absolute value of the upper and lower peaks of the TE clock performance in the environment state S_{t+1} is smaller than or equal to that in the environment state S_t at the previous moment,

or

the TDEV of the environment state S_{t+1} is smaller than the TDEV of the environment state S_t,

the reward value r_t is an increased reward value;

otherwise, the reward value r_t is a decreased reward value.

In one embodiment, in step K32, acquiring the environment state S_t corresponding to the first time synchronization signal specifically includes: acquiring the environment state corresponding to the first time synchronization signal after the time range threshold X has elapsed;

in step K33, outputting a compensation action in the reinforcement learning model according to the time error TE, the maximum time interval error MTIE, the time deviation TDEV and the frequency deviation to obtain a second time synchronization signal includes the steps of:

judging whether the TE values are distributed within the range (μ-3σ, μ+3σ):

when the second time synchronization signal deviation is larger than the preset threshold, repeating steps K32 to K33 until the second time synchronization signal deviation is smaller than or equal to the preset threshold, and then replacing the first time synchronization signal with the second time synchronization signal.

Wherein:

the first time synchronization signal deviation is obtained by measuring and analyzing the first time synchronization signal by the synchronization network probe device, the first time synchronization signal deviation is the difference value between first theoretical time and first actual time, the first theoretical time is the theoretical time for the first time synchronization signal to reach the synchronization network probe device, and the first actual time is the actual time for the first time synchronization signal to reach the synchronization network probe device; and by analogy, the nth time synchronizing signal deviation is obtained by measuring and analyzing the nth time synchronizing signal by the synchronizing network probe equipment, the nth time synchronizing signal deviation is the difference value between the nth theoretical time and the nth actual time, the nth theoretical time is the theoretical time for the nth time synchronizing signal to reach the synchronizing network probe equipment, and the nth actual time is the actual time for the nth time synchronizing signal to reach the synchronizing network probe equipment; n is a natural number greater than 1.

Example 2:

as shown in fig. 1, 3 and 4, the present embodiment provides a delay dynamic compensation method, which is applied to a synchronous network probe device, and the method includes the following steps:

specifically, the first time synchronization signal deviation is a difference value between a first theoretical time and a first actual time, the first theoretical time is a theoretical time when the first time synchronization signal arrives at the synchronous network probe device, and the first actual time is an actual time when the first time synchronization signal arrives at the synchronous network probe device;

Example 3:

as shown in fig. 6, this embodiment provides a delay dynamic compensation device, which corresponds to the method of embodiment 1, and is applied to a synchronous network probe device, and the device includes:

a first receiving unit 10, configured to receive a first time synchronization signal transmitted by a synchronous network transmission device;

The measurement analysis unit 20 is connected with the first receiving unit 10, and is configured to measure and analyze the first time synchronization signal to obtain a first time synchronization signal deviation, and send the first time synchronization signal deviation to the synchronous network transmission device; the first time synchronization signal deviation is the difference value between a first theoretical time and a first actual time, wherein the first theoretical time is the theoretical time for the first time synchronization signal to reach the synchronous network probe device, and the first actual time is the actual time for the first time synchronization signal to reach the synchronous network probe device;

the first receiving unit 10 is further configured to receive a second time synchronization signal transmitted by the synchronous network transmission device, thereby completing dynamic time delay compensation;

wherein: the second time synchronization signal is obtained by compensating the first time synchronization signal according to the deviation of the first time synchronization signal and the reinforcement learning model.

Wherein the delay compensation takes the environmental state S_t corresponding to the first time synchronization signal as the input of the reinforcement learning model, and the reinforcement learning model outputs the compensation action; the environmental state includes the time error TE, the maximum time interval error MTIE, the time deviation TDEV, and the frequency offset.

Example 4:

as shown in fig. 7, this embodiment provides a delay dynamic compensation device, which is applied to a synchronous network transmission device, and the device includes:

a first transmitting unit 30, configured to transmit a first time synchronization signal to a synchronous network probe device;

the second receiving unit 40 is connected to the first transmitting unit 30, and is configured to receive a first time synchronization signal deviation returned by the synchronous network probe device; the first time synchronization signal deviation is the difference value between a first theoretical time and a first actual time, wherein the first theoretical time is the theoretical time for the first time synchronization signal to reach the synchronous network probe device, and the first actual time is the actual time for the first time synchronization signal to reach the synchronous network probe device;

the compensation unit 50 is connected to the second receiving unit 40, and is configured to compensate the first time synchronization signal according to the first time synchronization signal deviation and the reinforcement learning model, so as to obtain a second time synchronization signal;

and the second transmitting unit 60 is connected with the compensating unit 50 and is used for transmitting a second time synchronizing signal to the synchronous network probe device so as to complete time delay dynamic compensation.

Example 5:

as shown in fig. 1, the scheme adopts reinforcement learning technology in machine learning, and adds a relevant decision function module in synchronous network probe equipment, synchronous network transmission equipment or 5G base station equipment, thereby realizing a dynamic time delay compensation mechanism. The specific process of delay compensation for synchronous network transmission equipment is as follows:

For synchronous network transmission equipment, since no reference signal exists, synchronous network probe equipment with a satellite antenna is externally connected to measure the time performance of a synchronous network clock.

In the initial state at commissioning:

1) The clock performance state of the synchronous network transmission device serves as an environment, and a clock and time signal is output to the synchronous network probe device.

2) The synchronous network probe device acts as the agent in reinforcement learning. Using its own satellite signal as a reference, it measures the performance of the clock and time signals of the synchronous network transmission device to obtain an environmental state s, which contains features such as TIE and TE.

3) A state analysis function is added to the probe device. For the environmental state s, outliers are removed according to the 3σ principle, and the mean of the TE values distributed within (μ-3σ, μ+3σ) and the maximum absolute value of their upper and lower peaks are calculated. MTIE, TDEV, and frequency offset are also calculated.

4) At each timestamp t, the probe device outputs an action a_t to the synchronous network transmission device. The action a_t is the compensation direction: positive compensation, negative compensation, or no compensation. The step of each compensation value can be set, for example, to 1 ns.

5) After the synchronous network transmission device receives the action at time t, it applies the corresponding delay compensation at its input port.

6) After the synchronous network transmission device receives the action, its state changes to s_{t+1}, and the reward value r_t is returned.

7) The synchronous network transmission device returns the reward value r_t to the probe device. The reward value r_t is increased when any of the following holds: the absolute maximum of the upper and lower TE peaks in s_{t+1} is smaller than or equal to that in the previous state s_t; or, compared with s_t, the MTIE and TDEV of s_{t+1} are further below the values specified in the ITU-T G.811/G.812 standards; or the frequency offset of s_{t+1} is closer to 0 than that of s_t. Otherwise, the reward value r_t is decreased.
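The state analysis in step 3) and the reward rule in step 7) can be sketched as follows. This is only an illustrative sketch under stated assumptions: the names `three_sigma_filter`, `State`, and `reward`, the reward magnitudes of +1 and -1, and the parameters `mtie_limit` and `tdev_limit` (standing in for the ITU-T G.811/G.812 limits, whose values are not given here) are all hypothetical and do not appear in the original text.

```python
from dataclasses import dataclass
from statistics import mean, pstdev


def three_sigma_filter(te_samples):
    """Keep only TE samples inside (mu - 3*sigma, mu + 3*sigma)."""
    mu = mean(te_samples)
    sigma = pstdev(te_samples)
    if sigma == 0:
        # All samples are identical, so there is nothing to reject.
        return list(te_samples)
    return [x for x in te_samples if mu - 3 * sigma < x < mu + 3 * sigma]


@dataclass
class State:
    te_peak: float      # max absolute value of the upper/lower TE peaks, ns
    mtie: float         # maximum time interval error
    tdev: float         # time deviation
    freq_offset: float  # frequency offset


def summarize_te(te_samples):
    """Return (mean, max absolute peak) of the 3-sigma-filtered TE samples."""
    kept = three_sigma_filter(te_samples)
    return mean(kept), max(abs(min(kept)), abs(max(kept)))


def reward(prev, nxt, mtie_limit, tdev_limit):
    """Return +1 when the new state counts as an improvement, else -1.

    The +1/-1 magnitudes are an assumption; the text only says the reward
    value is increased or decreased.
    """
    peak_ok = nxt.te_peak <= prev.te_peak
    # One possible reading of the MTIE/TDEV condition: both got smaller
    # and both are below the (assumed) standard limits.
    stability_ok = (nxt.mtie < prev.mtie and nxt.tdev < prev.tdev
                    and nxt.mtie < mtie_limit and nxt.tdev < tdev_limit)
    freq_ok = abs(nxt.freq_offset) < abs(prev.freq_offset)
    return 1 if (peak_ok or stability_ok or freq_ok) else -1
```

The three conditions are combined with a logical OR because step 7) lists them as alternatives.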

The most critical part is the action decision policy. A neural network is created that takes the environmental state s as input; s comprises four features (TE, MTIE, TDEV, and frequency offset), i.e. a vector of length 4. The action a to be performed is the output. The network estimates the probability of each action, and an action is randomly selected according to the estimated probabilities. The probability distribution over actions is π_θ(a|s), where θ denotes the parameters of the policy function π; the function π_θ can be parameterized by a neural network. The probabilities of all actions sum to 1, i.e. Σ_{a∈A} π_θ(a|s) = 1, where A is the set of all actions. The network π_θ represents the agent's policy and is called the policy network, as shown in fig. 3.
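A minimal sketch of such a policy may help make the probability constraint concrete. It assumes, purely for brevity, a single linear layer followed by a softmax rather than the multi-layer network of the embodiment; the names `ACTIONS`, `softmax`, `policy_probs`, and `sample_action` are hypothetical.

```python
import math
import random

# Index 0: positive compensation, 1: negative compensation, 2: no compensation.
ACTIONS = ("positive", "negative", "none")


def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def policy_probs(theta, state):
    """pi_theta(a|s) for a state [TE, MTIE, TDEV, frequency offset].

    theta holds one row of 4 weights per action (a simplifying assumption).
    """
    logits = [sum(w * x for w, x in zip(row, state)) for row in theta]
    return softmax(logits)


def sample_action(probs, rng=random):
    """Randomly pick an action index according to the estimated probabilities."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]
```

With all weights at zero, every action receives probability 1/3, which is a natural starting point before any training.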

The neural network comprises an input layer, several fully connected hidden layers, and an output layer. The output layer has 3 nodes, representing the probability distribution over the 3 actions. A gradient descent algorithm is used to optimize the network; the optimization goal is to make actions that increase the reward more probable and actions that decrease the reward less probable. During interaction, the action with the highest probability, a_t = argmax_a π_θ(a|s_t), is selected as the decision result and applied to the delay compensation environment.
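The optimization goal can be illustrated with a single policy-gradient update. The sketch below uses a REINFORCE-style step on a linear softmax policy; this specific algorithm, the learning rate, and the names `action_probs`, `policy_gradient_step`, and `greedy_action` are assumptions chosen for illustration, since the text only states that gradient descent is used.

```python
import math


def softmax(logits):
    m = max(logits)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]


def action_probs(theta, state):
    """pi_theta(a|s) for a linear policy with one weight row per action."""
    return softmax([sum(w * x for w, x in zip(row, state)) for row in theta])


def policy_gradient_step(theta, state, action, reward, lr=0.1):
    """Return updated weights after one step on the loss -reward * log pi(a|s).

    For a softmax policy, d log pi(a|s) / d logit_k = 1[k == a] - pi(k|s).
    A positive reward raises the probability of the chosen action,
    a negative reward lowers it.
    """
    probs = action_probs(theta, state)
    new_theta = []
    for k, row in enumerate(theta):
        grad_logit = (1.0 if k == action else 0.0) - probs[k]
        new_theta.append(
            [w + lr * reward * grad_logit * x for w, x in zip(row, state)]
        )
    return new_theta


def greedy_action(theta, state):
    """a_t = argmax_a pi_theta(a|s_t), used during interaction."""
    probs = action_probs(theta, state)
    return max(range(len(probs)), key=probs.__getitem__)
```

Returning a new weight list instead of mutating `theta` keeps each update easy to inspect in isolation.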

During the stable period after compensation, clock performance normally changes only slightly, so dynamic compensation need not run in real time every second. A time range threshold X and a performance threshold Y can be set: when the mean of the TE values distributed within (μ-3σ, μ+3σ) stays below Y for X consecutive time ranges t, dynamic compensation is not performed; otherwise, dynamic compensation is restarted.
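The gating rule with the thresholds X and Y can be sketched as below. Two points are assumptions rather than statements from the text: the mean is compared by absolute value (so that a large negative deviation also triggers compensation), and compensation stays active until at least X time ranges have been observed. The function name `needs_compensation` is hypothetical.

```python
def needs_compensation(interval_means_ns, x, y_ns):
    """Decide whether dynamic compensation should run.

    interval_means_ns: mean TE (after 3-sigma filtering) of each time range t,
                       oldest first, in ns.
    x: number of consecutive time ranges that must be within the threshold.
    y_ns: performance threshold Y in ns.
    """
    if x < 1:
        raise ValueError("x must be at least 1")
    if len(interval_means_ns) < x:
        # Not enough history yet to declare a stable period (assumption).
        return True
    recent = interval_means_ns[-x:]
    # Stable only if every one of the last x means is below Y in magnitude.
    return not all(abs(m) < y_ns for m in recent)
```

For instance, with X = 10 and Y = 10 ns, ten consecutive means of 130 ns keep compensation running, while ten consecutive means within a few nanoseconds of zero switch it to monitoring only.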

Specifically, fig. 2 shows the 1588v2 topology: the end transmission device of the synchronization network outputs clock and time signals to the 5G base station device to meet the high-precision synchronization requirement of 5G services. It can also output clock and time signals to the synchronous network probe device so that signal accuracy can be monitored and measured. The output clock signals include, but are not limited to, synchronous Ethernet (SyncE), 2 Mbit/s, 2 MHz, and 10 MHz signals; the output time signals include, but are not limited to, 1PPS, 1PPS+ToD, and PTP signals.

The synchronous network probe device, and a 5G base station device that supports the measurement function, can measure the output signals of the end device against a BeiDou/GPS satellite reference signal. This includes measuring the time interval error TIE (Time Interval Error) and the time error TE (Time Error), and calculating the maximum time interval error MTIE (Maximum Time Interval Error), the time deviation TDEV (Time Deviation), the frequency offset, and the like from TIE.
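As an illustration of how two of these metrics can be derived from TIE samples, the sketch below computes MTIE as the largest peak-to-peak TIE variation over all windows of a given length, and estimates the frequency offset as the least-squares slope of TIE against time. The function names, the uniform sampling interval `tau0_s`, and the choice of a least-squares fit for the frequency offset are assumptions made for this example; TDEV is omitted for brevity.

```python
def mtie(tie_ns, n):
    """MTIE for an observation window spanning n sampling intervals.

    Each window covers n + 1 consecutive samples; the result is the largest
    (max - min) found in any such window, in the same unit as the input.
    """
    if not 1 <= n < len(tie_ns):
        raise ValueError("n must satisfy 1 <= n < number of samples")
    return max(
        max(tie_ns[k:k + n + 1]) - min(tie_ns[k:k + n + 1])
        for k in range(len(tie_ns) - n)
    )


def frequency_offset(tie_ns, tau0_s):
    """Least-squares slope of TIE versus time.

    With TIE in ns and tau0_s in seconds, the result is in ns/s,
    i.e. parts per billion.
    """
    count = len(tie_ns)
    if count < 2:
        raise ValueError("need at least two samples")
    times = [i * tau0_s for i in range(count)]
    t_mean = sum(times) / count
    x_mean = sum(tie_ns) / count
    num = sum((t - t_mean) * (x - x_mean) for t, x in zip(times, tie_ns))
    den = sum((t - t_mean) ** 2 for t in times)
    return num / den
```

The direct window scan is quadratic in the worst case, which is acceptable for short measurement records but would need a faster algorithm for long ones.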

As shown in fig. 4, in this embodiment, the delay compensation for the 5G base station apparatus is specifically as follows:

For the 5G base station device, a satellite antenna can be installed on the device itself, so it can measure the performance of the input signal using the satellite signal as a reference. The 5G base station device can therefore integrate both the environment and the agent, as shown in fig. 4. The operation steps are the same as those described above.

For example, suppose the TE deviation of the clock/time input signal of the 5G base station ranges from 100 to 160 ns, X is set to 10, and the performance threshold Y is set to 10 ns. Since the mean value of 130 ns exceeds the threshold, the dynamic compensation mechanism runs. After reinforcement learning, the neural network outputs its first action. If the action is a compensation of +1 ns, the compensated state becomes 101 to 161 ns; the maximum changes from 160 to 161 and its absolute value is larger, so the reward value is decreased. If the action is a compensation of -1 ns, the compensated state becomes 99 to 159 ns, and the reward value is increased. Repeating this cycle, the dynamic compensation finally reaches -130 ns, and the signal deviation range becomes -30 to 30 ns. At this point the steady state is reached, and only monitoring is performed, without compensation. When the network structure changes abruptly, for example through a cutover or a fault, and the signal deviation exceeds the threshold, dynamic compensation is performed again, which solves the problem of dynamic delay compensation.
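The arithmetic of this example can be checked with a short simulation. In the sketch below, a hand-written greedy rule stands in for the trained policy network: at each step it picks whichever of +1 ns, -1 ns, or no compensation gives the smallest absolute TE peak. That rule, and the names `peak_abs` and `compensate`, are assumptions for illustration and are not the learning procedure of the embodiment.

```python
def peak_abs(lo_ns, hi_ns):
    """Largest absolute value of the lower and upper TE peaks."""
    return max(abs(lo_ns), abs(hi_ns))


def compensate(lo_ns, hi_ns, step_ns=1, max_steps=10_000):
    """Shift the TE range step by step until no action reduces the peak.

    Returns (total compensation applied, final (lower, upper) TE range).
    """
    total = 0
    for _ in range(max_steps):
        # Candidate actions: no compensation, positive, negative.
        # Listing 0 first means ties are resolved in favour of not compensating.
        best = min(
            (0, step_ns, -step_ns),
            key=lambda a: peak_abs(lo_ns + a, hi_ns + a),
        )
        if best == 0:
            break  # steady state: monitoring only
        lo_ns += best
        hi_ns += best
        total += best
    return total, (lo_ns, hi_ns)
```

Starting from the range 100 to 160 ns with a 1 ns step, this loop applies 130 negative steps and stops with the range centred on zero, matching the figures in the example.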

Example 5 corresponds to examples 1 to 4.

Example 6:

based on the same technical idea, the present embodiment provides an electronic device, which includes a memory and a processor, the memory storing a computer program, and the processor executing the delay dynamic compensation method according to embodiment 1 when the processor runs the computer program stored in the memory.

Example 7:

based on the same technical idea, the present embodiment provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, performs the delay dynamic compensation method according to embodiment 1.

Example 8:

the embodiment provides a time delay dynamic compensation system, which comprises synchronous network probe equipment and synchronous network transmission equipment,

the synchronous network probe device receives a second time synchronization signal transmitted by the synchronous network transmission device; the second time synchronization signal is obtained by compensating the first time synchronization signal according to the deviation of the first time synchronization signal and the reinforcement learning model.

Wherein:

the first time synchronization signal deviation is obtained by the synchronous network probe device measuring and analyzing the first time synchronization signal; the first time synchronization signal deviation is the difference between a first theoretical time and a first actual time, where the first theoretical time is the theoretical time at which the first time synchronization signal arrives at the synchronous network probe device, and the first actual time is the actual time at which the first time synchronization signal arrives at the synchronous network probe device.

It is to be understood that the above embodiments are merely illustrative of the application of the principles of the present invention, but not in limitation thereof. Various modifications and improvements may be made by those skilled in the art without departing from the spirit and substance of the invention, and are also considered to be within the scope of the invention.

Claims

1. A method for dynamically compensating time delay, which is applied to synchronous network transmission equipment, characterized in that the method comprises the following steps:

Wherein:

2. The method of claim 1, wherein,

in the step K3, the first time synchronization signal is compensated according to the deviation of the first time synchronization signal and the reinforcement learning model to obtain a second time synchronization signal, which specifically includes the steps of:

3. The method of claim 2, wherein,

the reinforcement learning model is a value network and a policy network based on the first time synchronization signal deviation,

Wherein:

or,

the TDEV of the environmental state S_{t+1} is smaller than that of the environmental state S_t;

or,

then the reward value r_t is an increased reward value;

otherwise, the reward value r_t is a decreased reward value.

4. The method of claim 2, wherein,

in the step K32, after the environmental state S_t corresponding to the first time synchronization signal is obtained and the time range threshold X has elapsed, the environmental state corresponding to the first time synchronization signal is obtained;

judging whether the TE value is distributed in the range of (μ -3σ, μ+3σ):

5. A method for dynamically compensating time delay, which is applied to synchronous network probe equipment, and is characterized by comprising the following steps:

wherein:

6. A delay dynamic compensation device applied to a synchronous network probe device, characterized in that the device comprises:

7. A delay dynamic compensation device applied to synchronous network transmission equipment, characterized in that the device comprises:

8. An electronic device comprising a memory and a processor, the memory having a computer program stored therein, the processor performing the delay dynamic compensation method according to any one of claims 1 to 4 or claim 5 when the processor runs the computer program stored in the memory.

9. A computer-readable storage medium on which a computer program is stored, characterized in that the computer program, when executed by a processor, performs the delay dynamic compensation method according to any one of claims 1 to 4 or claim 5.

10. A time delay dynamic compensation system is characterized by comprising synchronous network probe equipment and synchronous network transmission equipment,

the synchronous network probe equipment receives a second time synchronous signal transmitted by the synchronous network transmission equipment, so that time delay dynamic compensation is completed; the second time synchronization signal is obtained by the synchronization network transmission equipment through compensation of the first time synchronization signal according to the first time synchronization signal deviation and the reinforcement learning model;

Wherein: