CN114598425B - BATS code-based data transmission method, device, equipment and readable storage medium

Info

Publication number: CN114598425B
Application number: CN202210293052.7A
Authority: CN
Inventors: 刘恒; 王士恒; 马征; 苏金领; 周权; 杨思远
Original assignee: Southwest Jiaotong University
Current assignee: Southwest Jiaotong University
Priority date: 2022-03-23
Filing date: 2022-03-23
Publication date: 2023-03-17
Anticipated expiration: 2042-03-23
Also published as: CN114598425A

Abstract

The invention provides a BATS code-based data transmission method, device, equipment and readable storage medium, relating to the technical field of wireless communication. The method comprises the steps of: obtaining at least one file to be transmitted; and optimizing the value of the BATS code according to a reinforcement learning algorithm to obtain an optimal value, wherein the optimal value is the value with the highest transmission efficiency at the current moment and in the current environment state. The invention provides a BATS code transmission method that replaces channel detection and estimation: it uses reinforcement learning to search for the optimal value during BATS code transmission, takes the transmission channel condition as the environment, and, through continuous transmission, learning and adjustment of the value, lets the transmission process gradually reach the optimal value for the current channel condition.

Description

Data transmission method, device and equipment based on BATS code and readable storage medium

Technical Field

The present invention relates to the field of wireless communication technologies, and in particular to a BATS code-based data transmission method, apparatus, device, and readable storage medium.

Background

Before data is transmitted using a BATS code, channel detection and estimation must be performed on the transmission channel; only after this process is complete can the transmitting end determine information about the current channel, such as the hop count and packet loss rate of the transmission network. From the obtained channel conditions, the transmitting end can calculate the optimal degree distribution for the current transmission scenario and thereby achieve efficient transmission, but this requires a large amount of test data. When the channel condition is poor, that is, when the packet loss rate is high and the hop count of the transmission network is large, even more data is needed to detect and estimate the channel, which wastes transmission time and consumes transmission resources. However, no BATS code transmission scheme currently addresses this problem.

Disclosure of Invention

The present invention aims to provide a BATS code-based data transmission method, apparatus, device and readable storage medium that solve the above problems. To achieve this purpose, the technical scheme adopted by the invention is as follows:

In a first aspect, the present application provides a BATS code-based data transmission method, including: acquiring at least one file to be transmitted; optimizing the value of the BATS code according to a reinforcement learning algorithm to obtain an optimal value, wherein the optimal value is the value with the highest transmission efficiency at the current moment and in the current environment state; and using the optimal value as the value of the BATS code, encoding all the files to be transmitted with the BATS code to generate batches, and transmitting the batches to a receiving end.

In a second aspect, the present application further provides a BATS code-based data transmission device, including a first acquisition unit, an optimization unit and a transmission unit, wherein: the first acquisition unit is used for acquiring at least one file to be transmitted; the optimization unit is used for optimizing the value of the BATS code according to a reinforcement learning algorithm to obtain an optimal value, wherein the optimal value is the value with the highest transmission efficiency at the current moment and in the current environment state; and the transmission unit is used for using the optimal value as the value of the BATS code, encoding all the files to be transmitted with the BATS code to generate batches, and transmitting the batches to a receiving end.

In a third aspect, the present application further provides BATS code-based data transmission equipment, including:

a memory for storing a computer program;

a processor configured to implement the steps of the BATS code-based data transmission method when executing the computer program.

In a fourth aspect, the present application further provides a readable storage medium storing a computer program which, when executed by a processor, implements the steps of the BATS code-based data transmission method.

The invention has the beneficial effects that:

The invention provides a BATS code transmission method that replaces channel detection and estimation. It uses reinforcement learning to explore the optimal value during BATS code transmission, taking the transmission channel condition as the environment, so that through continuous transmission, learning and adjustment of the value, the transmission process gradually reaches the optimal value for the current channel condition. The method also responds quickly to changes in channel condition: when the channel changes, the number of transmissions necessarily changes, and the optimal value then changes according to the calculation of the feedback function; once the environmental condition is stable, the changed value suits the transmission channel under the current environmental condition. Compared with the prior art, the channel does not need to be tested and evaluated again, which reduces the transmission time and the transmission resources consumed.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by the practice of the embodiments of the invention. The objectives and other advantages of the invention will be realized and attained by the structure particularly pointed out in the written description and claims hereof as well as the appended drawings.

Drawings

In order to illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only some embodiments of the present invention and therefore should not be considered as limiting its scope; those skilled in the art can obtain other related drawings from these drawings without inventive effort.

Fig. 1 is a schematic flow chart of a BATS code-based data transmission method according to an embodiment of the present invention;

Fig. 2 is a schematic structural diagram of a BATS code-based data transmission device according to an embodiment of the present invention;

Fig. 3 is a schematic structural diagram of a second logic unit according to an embodiment of the present invention;

Fig. 4 is a schematic structural diagram of BATS code-based data transmission equipment according to an embodiment of the present invention.

The labels in the figures are: 1. a first acquisition unit; 2. an optimization unit; 21. a second acquisition unit; 22. a first initialization unit; 23. a first calculation unit; 231. a first transmission unit; 232. a third acquisition unit; 233. a first calculation subunit; 24. a first pre-judging unit; 25. a second calculation unit; 251. a second transmission unit; 252. a first updating unit; 253. a fourth acquisition unit; 254. a second calculation subunit; 26. a third calculation unit; 27. a first logic unit; 271. a first logic subunit; 272. a second logic subunit; 273. a fifth acquisition unit; 274. a third logic subunit; 275. a third calculation subunit; 276. a fourth calculation subunit; 277. a second updating unit; 278. a second logic unit; 28. a first circulation unit; 291. an acquisition and initialization unit; 292. a fourth calculation unit; 293. a second pre-judging unit; 294. a third updating unit; 295. a fourth updating unit; 296. a fifth calculation unit; 297. a sixth calculation unit; 298. a seventh calculation unit; 299. a second circulation unit; 3. a transmission unit; 801. a processor; 802. a memory; 803. a multimedia component; 804. an I/O interface; 805. a communication component.

Detailed Description

In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. The components of embodiments of the present invention generally described and illustrated in the figures herein may be arranged and designed in a wide variety of different configurations. Thus, the following detailed description of the embodiments of the present invention, as presented in the figures, is not intended to limit the scope of the invention, as claimed, but is merely representative of selected embodiments of the invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.

It should be noted that: like reference numbers and letters refer to like items in the following figures, and thus, once an item is defined in one figure, it need not be further defined and explained in subsequent figures. Meanwhile, in the description of the present invention, the terms "first", "second", and the like are used only for distinguishing the description, and are not to be construed as indicating or implying relative importance.

In order to make better use of network coding to achieve better data transmission performance in wireless communication and to reduce the computational complexity and delay of the whole system, Shenghao Yang et al. introduced the concept of Batched Sparse Codes (BATS codes) in 2011. The BATS code is a novel coding and decoding scheme that combines fountain codes and network codes. At the transmitting end, the BATS code uses the fountain-code concept to generate enough batches to transmit the data packets; at a relay node, random linear network coding is applied to the received data packets of the same batch, and the recoded packets are forwarded toward the receiving end; the receiving end does not need to consider channel conditions and can decode all the data packets once it has received enough batches. The BATS code thus integrates the advantages of fountain codes and network codes while making up for the high computational complexity of network codes. Compared with the fountain code, the BATS code has higher throughput; compared with network coding, it has lower coding complexity, similar to that of the fountain code, and it also has a good rateless property. Therefore, the BATS code combines the advantages of the fountain code and the network code, has higher throughput, lower complexity and the rateless property, and ultimately achieves low delay while ensuring high reliability of the transmission system.

Example 1:

This embodiment provides a BATS code-based data transmission method.

Referring to Fig. 1, the method comprises step S100, step S200 and step S300.

S100, obtaining at least one file to be transmitted.

S200, optimizing the value of the BATS code according to a reinforcement learning algorithm to obtain an optimal value, wherein the optimal value is the value with the highest transmission efficiency at the current moment and in the current environment state.

It should be noted that the BATS code itself is not changed in the present application, and the details of the BATS code construction process are not described here. As known to those skilled in the art, the output of the reinforcement learning algorithm is a Q table; in the present application, the value corresponding to the maximum Q value in the Q table is selected as the optimal value. Specifically, the degree value is regarded as a state in reinforcement learning in the present application.

S300, using the optimal value as the value of the BATS code, encoding all the files to be transmitted with the BATS code to generate batches, and transmitting the batches to a receiving end.

In the prior art, before BATS code transmission, the channel must be detected and estimated, and data can be transmitted only after the optimal value has been set according to the channel condition. However, this procedure requires a large amount of test data to be transmitted, and when the channel condition changes, the optimal transmission parameters obtained from the previous channel detection and estimation no longer apply, so test data must be transmitted again to obtain new channel parameters.

In the present application, reinforcement learning is used to search for the optimal value during BATS code transmission. The transmission channel condition is taken as the environment, and through continuous transmission, learning and adjustment of the value, the transmission process gradually reaches the optimal value for the current channel condition. The method also responds quickly to changes in channel condition: when the channel changes, the number of transmissions necessarily changes, and the optimal value then changes according to the calculation of the feedback function; once the environmental condition is stable, the changed value suits the transmission channel under the current environmental condition. Compared with the prior art, the channel does not need to be tested and evaluated again, which reduces the transmission time and the transmission resources consumed.

Meanwhile, in the present application, an existing algorithm such as the Q-learning algorithm or the SARSA algorithm may be used directly as the reinforcement learning algorithm. It should be understood that these reinforcement learning algorithms all output a converged Q table. In the present application the state space of the reinforcement learning algorithm is set to all possible values, so the states listed in the finally output Q table are exactly the candidate values. From the converged Q table, the state with the maximum Q value is selected as the optimal value.
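As a minimal sketch of the selection rule just described, and not the implementation of the present application, the following Python snippet picks the optimal value from a converged Q table. The dictionary layout (each state mapped to a list of per-action Q values), the function name best_value and the sample numbers are illustrative assumptions; ranking each state by its largest Q entry is one plausible reading of "the state with the maximum Q value".

```python
import math

def best_value(q_table):
    """Return the state (candidate value) whose largest Q entry is the overall maximum."""
    best_state, best_q = None, -math.inf
    for state, action_values in q_table.items():
        top = max(action_values)  # best Q value reachable from this state
        if top > best_q:
            best_state, best_q = state, top
    return best_state

# Hypothetical converged Q table: state -> [Q(minus one), Q(keep), Q(plus one)]
q_table = {
    1: [0.0, 0.2, 0.1],
    2: [0.3, 0.9, 0.4],
    3: [-math.inf, 0.5, -math.inf],
}
chosen = best_value(q_table)
```

In this made-up table the largest entry belongs to state 2, so the value 2 would be used for the subsequent BATS code transmission.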

Specifically, the Q-learning algorithm is taken as an example in the present application.

Step S200 includes steps S210, S220, S230, S240, S250, S260, S270, and S280.

S210, acquiring the number of batches of the BATS code.

It should be noted that the number of batches of the BATS code is defined manually and may differ between use environments; for convenience of understanding, the number of batches is denoted M in the present application.

S220, initializing to generate a Q table and randomly generating an initial state, wherein the Q table is a table of size 3 × (number of batches), and the value of the initial state is a value that the receiving end can decode.

It should be noted that, in this step, the number of states is set to the number of degrees selectable in transmission; that is, the states in the state space of the Q-learning algorithm increase sequentially from 1 to M, giving M states in total. In other words, S = {s}, s = 1, 2, ..., M, where S is the state space and s denotes a state, indicating that the transmitting end will perform data transmission with s as the value. Meanwhile, the action space of the Q-learning algorithm contains three actions in total: transmitting data with the value of the BATS code in the current state decreased by one, transmitting data with the value of the BATS code in the current state unchanged, and transmitting data with the value of the BATS code in the current state increased by one.
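The state and action spaces of step S220 can be sketched in Python as follows. This is an illustrative sketch only: the names ACTIONS, init_q_table, random_initial_state and apply_action are hypothetical, clamping the value to the range 1..M at the boundaries is an assumption the text does not state, and the check that the initial value is decodable by the receiving end is omitted.

```python
import random

# The three actions: value minus one, value unchanged, value plus one.
ACTIONS = (-1, 0, +1)

def init_q_table(num_batches):
    """Build a 3 x M Q table (all zeros) for states 1..M."""
    return {s: [0.0] * len(ACTIONS) for s in range(1, num_batches + 1)}

def random_initial_state(num_batches, rng=random):
    """Pick a random initial state in 1..M (decodability check omitted)."""
    return rng.randint(1, num_batches)

def apply_action(state, action_index, num_batches):
    """Return the value used for the next transmission.

    Assumption: the value is clamped to 1..M at the boundaries.
    """
    nxt = state + ACTIONS[action_index]
    return min(max(nxt, 1), num_batches)

q_table = init_q_table(4)
start = random_initial_state(4)
```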

S230, calculating an initial average batch number and a transmission number on the basis of the initial state, wherein the initial average batch number is the number of batches required to complete the transmission of unit data in the initial state, and the transmission number is the number of BATS code data transmissions.

Specifically, the calculation method for the initial average batch number in this step is as follows:

S231, taking the value corresponding to the initial state as the value of the BATS code, encoding the first information with the BATS code, and transmitting it to a receiving end a preset number of times, wherein the first information is a preset number of data packets.

It should be noted that, in the present application, the preset number of data packets may be selected according to the actual situation: for example, while searching for the optimal value, all the data packets obtained by encoding one file to be transmitted may be transmitted, or only k of those data packets. The preset number of times in this step is preferably 5, but may be another number such as 6; the present application does not specifically limit it. The purpose of repeating the same transmission several times before the calculation in S233 is to reduce the influence of environmental factors on the number of received batches and to eliminate the influence of sudden environmental changes on the exploration of the value.

S232, acquiring the initial total batch number, wherein the initial total batch number is the number of batches received over the preset number of transmissions, as fed back by the receiving end.

It should be noted that, in the present application, the number of batches is counted and stored by using Count.

S233, calculating the initial average batch number from the initial total batch number and the transmission number.

It is understood that, in the present application, the transmission number is stored in Times and is initially set to Times = 1.

The calculation formula of the step is as follows:

Num＝Count(S)/Times(S)

where Num is the average batch number at the initial time, i.e., the initial average batch number; Count(S) is the number of batches in the current state; and Times(S) is the transmission number in the current state.
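As a worked illustration of the formula Num = Count(S)/Times(S), the Python sketch below uses made-up feedback for five transmissions at a hypothetical value of 3. The function name average_batch_number and all numbers are assumptions chosen only to show the arithmetic.

```python
def average_batch_number(count, times, state):
    """Num = Count(S) / Times(S) for the given state."""
    return count[state] / times[state]

# Hypothetical feedback: batches received in each of the 5 preset transmissions.
received_per_transmission = [12, 11, 13, 12, 12]
state = 3

count = {state: sum(received_per_transmission)}  # initial total batch number
times = {state: 1}                               # Times = 1 after the first round
num = average_batch_number(count, times, state)
```

With these numbers Count(S) is 60 and Times(S) is 1, so the initial average batch number is 60.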

S240, calculating the next action in the current state according to the Q table and executing it.

It should be noted that a greedy policy is used in the present application to determine the next action. Specifically, a greedy algorithm is established, the next action a in the action space is determined by the greedy algorithm, and that action is executed. Since implementing a greedy policy to determine the next action is prior art, it is not described further in this application.
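The text leaves the greedy policy unspecified. One common choice is an epsilon-greedy rule, sketched below in Python purely as an assumption: the function name choose_action, the default epsilon of 0.1 and the decision to exclude entries equal to negative infinity from exploration are not taken from the present application.

```python
import math
import random

def choose_action(q_row, epsilon=0.1, rng=random):
    """Epsilon-greedy choice over one row of the Q table.

    q_row holds the Q values of the three actions in the current state.
    Entries equal to -inf are treated as prohibited (an assumption).
    """
    allowed = [i for i, q in enumerate(q_row) if q != -math.inf]
    if not allowed:
        raise ValueError("no permitted action in this state")
    if rng.random() < epsilon:
        return rng.choice(allowed)                 # explore
    return max(allowed, key=lambda i: q_row[i])    # exploit

greedy_pick = choose_action([0.1, 0.5, 0.2], epsilon=0.0)
```

With epsilon set to 0 the rule is purely greedy and returns the index of the largest Q value; a small positive epsilon keeps some exploration so that neighbouring values are still tried.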

S250, calculating the state average batch number and the updated transmission number from the transmission number and the result of executing the next action, wherein the state average batch number is the number of batches required to complete the transmission of unit data under the next action.

It should be noted that the present application determines the state average batch number by actual transmission. Specifically, step S250 includes step S251, step S252, step S253, and step S254.

S251, taking the value under the next action as the value of the BATS code, encoding the first information with the BATS code, and transmitting it to the receiving end the preset number of times, wherein the first information is the preset number of data packets.

This step is the same as S231 except that a different value is used.

S252, incrementing the transmission number by one.

Namely, the updating formula in the step is as follows:

Times(S_)＝Times(S_)+1

where Times(S_) is the transmission number under the next action.

S253, acquiring the state total batch number, wherein the state total batch number is the number of batches received over the preset number of transmissions, as fed back by the receiving end.

The state total batch number in this step is denoted by E(S_).

S254, calculating the state average batch number from the state total batch number, the transmission number and a preset first formula group.

The first formula group in this step is as follows:

LastNum＝Num

Num＝(Count(S_)+E(S_))/Times(S_)

In this step, Count(S_) is the number of batches stored before this action is executed, E(S_) is the state total batch number, and Num is the average batch number corresponding to the next action at the current time, i.e., the state average batch number. LastNum is a memory variable that stores the value of Num before the update; at the first iteration it holds the initial average batch number.

It should be noted that, since the Q-learning algorithm computes iteratively, those skilled in the art will understand that the number of batches accumulated in Count(S_) in this step is the total number of batches received by the receiving end before the next action is performed.
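Steps S252 to S254 can be traced with a short Python sketch. It is a hedged illustration rather than the implementation of the present application: the function name update_average and the sample numbers are hypothetical, and adding E(S_) into Count(S_) after computing Num is an assumption inferred from the remark that Count(S_) accumulates received batches.

```python
def update_average(count, times, next_state, received_batches, num):
    """Apply S252 and the first formula group for the state S_ = next_state.

    received_batches plays the role of E(S_).
    Returns (LastNum, Num).
    """
    times[next_state] = times.get(next_state, 0) + 1   # Times(S_) = Times(S_) + 1
    last_num = num                                     # LastNum = Num
    stored = count.get(next_state, 0)                  # Count(S_) before this action
    num = (stored + received_batches) / times[next_state]
    count[next_state] = stored + received_batches      # assumed accumulation
    return last_num, num

# Hypothetical numbers: state 3 already saw 60 batches in one round.
count = {3: 60}
times = {3: 1}
last_num, num = update_average(count, times, next_state=3,
                               received_batches=50, num=60.0)
```

Here Times(S_) becomes 2 and Num becomes (60 + 50) / 2 = 55, while LastNum keeps the earlier average of 60.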

S260, calculating the transmission state from the state average batch number and a preset condition judgment formula, wherein the transmission state is either success or failure.

Specifically, the calculation of the transmission state includes step S261, step S262, and step S263.

S261, if the condition judgment formula is satisfied, updating the transmission state of the receiving end to failure; otherwise, updating the transmission state of the receiving end to success.

Meanwhile, it should be noted that the condition judgment formula in this step is:

Num>k×LastNum

where Num and LastNum are as defined in step S254, and k is the number of data packets in one transmission, as explained in step S231. The practical meaning of the condition judgment formula is as follows: the average batch number Num corresponding to the next action at the current time is compared with the product of the previous average batch number LastNum and the number of transmitted data packets k; if Num is larger, the batches are judged to be undecodable.

Meanwhile, similar to step S254, this step is in a loop, and in the subsequent determination process, num and LastNum both need to update their values for determination.

S262, acquiring the current time and the start time of executing the next action.

S263, if the absolute value of the difference between the current time and the start time is greater than a preset threshold, updating the transmission state of the receiving end to failure.

That is, the transmission state is judged in two ways: the first is the ratio calculation of S261, and the second uses elapsed time as the basis. For the time-based judgment, a person skilled in the art may select the preset threshold according to the actual situation; no specific limitation is made in this application.
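The two failure criteria of steps S261 and S263 can be combined in a small predicate, sketched below in Python. The function name transmission_failed, the parameter name max_elapsed for the preset threshold and the sample values are illustrative assumptions.

```python
def transmission_failed(num, last_num, k, start_time, current_time, max_elapsed):
    """Return True if the transmission state should be set to failure.

    Criterion 1 (S261): Num > k * LastNum.
    Criterion 2 (S263): |current_time - start_time| > preset threshold.
    """
    too_many_batches = num > k * last_num
    timed_out = abs(current_time - start_time) > max_elapsed
    return too_many_batches or timed_out

# Hypothetical values: k = 4 packets per transmission, 5-second threshold.
ok = transmission_failed(55.0, 60.0, 4, start_time=0.0, current_time=1.0, max_elapsed=5.0)
by_ratio = transmission_failed(500.0, 60.0, 4, start_time=0.0, current_time=1.0, max_elapsed=5.0)
by_time = transmission_failed(55.0, 60.0, 4, start_time=0.0, current_time=10.0, max_elapsed=5.0)
```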

S270, judging the transmission state, and if the transmission state of the receiving end is successful, updating the Q table, the current action and the current state.

It should be noted that updating the current action and the current state in this step is prior art and is not described in detail in this application.

For judging the transmission state, step S270 includes step S271, step S272, step S273, and step S274.

S271, if the transmission state of the receiving end is failure, setting to negative infinity in the Q table the reward expectation of the next action in the current state and the reward expectation of the state actions, where the state actions are the actions corresponding to all states after the current state.

It should be noted that, in this step, transmitting with the value corresponding to the current state may leave the batches undecodable. In that case the corresponding position in the Q table is set to negative infinity, meaning that the action is prohibited in this state, and the Q-table entries of all actions in the states following this state are likewise set to prohibited.
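A minimal Python sketch of the prohibition in step S271 follows. The function name forbid_after_failure and the dictionary layout of the Q table are assumptions, and "states following this state" is read here as states with a larger value.

```python
import math

def forbid_after_failure(q_table, state, action_index):
    """Mark the failed action, and every action of all later states, as prohibited."""
    q_table[state][action_index] = -math.inf
    for s in q_table:
        if s > state:  # assumed meaning of "states after the current state"
            q_table[s] = [-math.inf] * len(q_table[s])

# Hypothetical 3-state table; the "plus one" action fails in state 2.
q_table = {1: [0.0, 0.0, 0.0], 2: [0.0, 0.0, 0.0], 3: [0.0, 0.0, 0.0]}
forbid_after_failure(q_table, state=2, action_index=2)
```

After the call, state 1 is untouched, only the failed action of state 2 is prohibited, and every action of state 3 is prohibited, so a maximum-based action choice never selects them.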

S272, calculating a feedback value by using a preset third formula and the state average batch number.

The preset third formula in this step is as follows:

R＝LastNum-Num

where R is the feedback value, and Num and LastNum are defined in step S254.

S273, calculating an updated value from a preset second formula, the feedback value and the Q table, and updating the Q value of the next action in the current state to the updated value.

The second formula in this step is:

Q(S,a)＝Q(S,a)+α(R+Q(S_,a’)-Q(S,a))

where Q(S,a) is the Q value corresponding to action a in the current state, α is the update step size, preset to 0.1, R is the feedback value, and Q(S_,a’) is the Q value corresponding to the next action.
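The feedback value R = LastNum - Num and the update Q(S,a) = Q(S,a) + α(R + Q(S_,a’) - Q(S,a)) can be checked numerically with the Python sketch below. It is an illustration under assumptions: the function names and sample numbers are hypothetical, and Q(S_,a’) is taken as the largest Q value in the row of the next state, which is the usual Q-learning convention but is not spelled out in the text.

```python
ALPHA = 0.1  # update step size stated in the text

def feedback(last_num, num):
    """R = LastNum - Num: positive when fewer batches are now needed."""
    return last_num - num

def update_q(q_table, state, action_index, next_state, reward, alpha=ALPHA):
    """Apply Q(S,a) = Q(S,a) + alpha * (R + Q(S_,a') - Q(S,a))."""
    next_q = max(q_table[next_state])  # assumed choice of Q(S_, a')
    old = q_table[state][action_index]
    q_table[state][action_index] = old + alpha * (reward + next_q - old)
    return q_table[state][action_index]

# Hypothetical numbers: moving from value 2 to value 3 cut the average from 60 to 55.
q_table = {2: [0.0, 0.0, 0.0], 3: [0.0, 1.0, 0.0]}
reward = feedback(last_num=60.0, num=55.0)
new_q = update_q(q_table, state=2, action_index=2, next_state=3, reward=reward)
```

With these numbers R is 5 and the updated entry is 0 + 0.1 × (5 + 1 - 0) = 0.6, so an action that lowers the average batch number is reinforced.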

S274, updating the current action to the next action, and updating the current state to the value corresponding to the next action.

S280, calculating the next action in the current state according to the Q table and executing it again, and repeating until the preset maximum number of learning iterations is reached or the Q table converges.

In steps S210 to S280 above, the Q-learning algorithm ties the design of the feedback function to the number of batches consumed by each transmission, and the number of states is set to the number of degrees selectable in transmission, so that through continuous transmission, learning and adjustment of the degree value, the transmission process gradually reaches the optimal value for the current channel condition. The method also responds quickly to changes in channel condition: when the channel changes, the number of transmissions changes, the optimal value changes according to the calculation of the feedback function, and the changed value suits the current transmission channel.

As transmission proceeds, the ideal outcome is that the Q table converges to a single state, whose value is then the optimal value in the current transmission environment. However, when the number of states is large and the number of transmissions increases with the value, the first reinforcement learning algorithm (the Q-learning algorithm above) may converge not to a single state but to two or more states, whose total number is significantly smaller than the number of states in the first reinforcement learning algorithm. In this case a second round of reinforcement learning is required, in which the number of states and the action space are modified to some extent so that the algorithm converges to a single value.

That is, step S200 of the method further includes step S290.

S290, judging whether the Q table has converged to a single state; if not, learning and optimizing the Q table based on a second reinforcement learning algorithm, and updating the Q table to the Q table obtained after learning and optimization.

Specifically, step S290 includes step S291, step S292, step S293, step S294, step S295, step S296, step S297, step S298, and step S299.

S291, acquiring at least two convergence values in the Q table and the number of convergence values, wherein a convergence value is a value that has reached a convergence state; initializing to generate a second Q table and randomly generating a second initial state, wherein the second Q table is a table of size (number of convergence values) × (number of convergence values), and the value of the second initial state is a value that the receiving end can decode.

For convenience of description, the number of convergence values is hereinafter denoted m. In the reinforcement learning algorithm of this step, the state space contains m states, each of which means transmitting with one of the values that reached a convergence state; the action space contains m actions, each of which means jumping from the current value to one of the values that reached a convergence state and transmitting with it.
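The m × m second Q table can be sketched in Python as a nested dictionary keyed by the convergence values themselves, so that an action is simply the value to jump to. The function name init_second_q_table, the nested-dictionary layout and the sample values 3, 5 and 7 are illustrative assumptions.

```python
def init_second_q_table(convergence_values):
    """Build an m x m table: rows are states (converged values),
    columns are actions (the converged value to jump to)."""
    values = sorted(convergence_values)
    return {s: {target: 0.0 for target in values} for s in values}

# Hypothetical outcome of the first round: it settled on three values.
q2 = init_second_q_table([7, 3, 5])
```

Here m is 3, so q2 has three rows of three entries each, and q2[3][7] is the Q value of jumping from value 3 to value 7.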

S292, calculating a second initial average batch number according to the second initial state, where the second initial average batch number is a number of batches required for completing the transmission of the unit data in the initial state.

It should be noted that the calculation process in this step is similar to step S230, and is not repeated in this step.

S293, calculating the next action in the current state according to the second Q table.

It should be noted that a greedy algorithm is also used in this step to obtain the next action; the calculation process is similar to step S240 and is not repeated here.

S294, updating the next state to the value corresponding to the next action.

S295, looking up in the Q table the expected action corresponding to the next state, and updating the next action to the expected action, where the expected action is the action with the maximum Q value for the next state in the Q table.

S296, calculating the state average batch number from the result of executing the next action.

It should be noted that the calculation process in this step is similar to step S250, and is not repeated in this step.

S297, the value obtained by subtracting the state average batch number from the initial average batch number is used as a feedback value.

It should be noted that the calculation process in this step is similar to step S272, and is not repeated in this step.

S298, calculating an updated value from the preset second formula, the feedback value and the second Q table, updating the reward expectation of the next action in the current state to the updated value, updating the current state to the next state, and updating the current action to the next action.

It should be noted that the calculation process in this step is similar to step S273, and is not repeated in this step.

S299, returning to the step of calculating the next action in the current state according to the second Q table, and repeating until the preset maximum number of learning iterations is reached or the second Q table converges, and then updating the Q table to the second Q table.
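One iteration of steps S294 to S298 might look like the Python sketch below. It is one possible reading of the steps, not the implementation of the present application: the function name second_stage_step, the callback measure_avg (a stand-in for actually transmitting and computing the state average batch number), the nested-dictionary Q table and the sample numbers are all assumptions, and the expected action is looked up in the second Q table.

```python
def second_stage_step(q2, state, action, initial_avg, measure_avg, alpha=0.1):
    """Run one update of the second reinforcement learning round.

    q2          : nested dict, q2[state][target_value] -> Q value
    action      : the converged value to jump to
    measure_avg : hypothetical callback returning the state average
                  batch number observed when transmitting with a value
    Returns (next_state, expected_action).
    """
    next_state = action                                     # S294
    row = q2[next_state]
    expected = max(row, key=row.get)                        # S295
    state_avg = measure_avg(next_state)                     # S296
    reward = initial_avg - state_avg                        # S297
    old = q2[state][action]
    q2[state][action] = old + alpha * (reward + row[expected] - old)  # S298
    return next_state, expected

# Hypothetical two-value table and a fixed measurement of 50 batches.
q2 = {3: {3: 0.0, 5: 0.0}, 5: {3: 0.0, 5: 2.0}}
next_state, expected = second_stage_step(
    q2, state=3, action=5, initial_avg=60.0, measure_avg=lambda value: 50.0)
```

In this example the feedback value is 60 - 50 = 10, and the entry for jumping from 3 to 5 becomes 0 + 0.1 × (10 + 2 - 0) = 1.2.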

In the application, firstly, in the transmission process of the BATS code, reinforcement learning is introduced to replace channel detection and estimation, so that the value is continuously optimized in the transmission process, and the optimal value is obtained. In the application, in the two reinforcement learning algorithms, the state space definition and the action selection of the reinforcement learning algorithm are respectively designed, and meanwhile, the feedback function of the reinforcement learning algorithm can obtain a good learning effect according to the transmission consumption batch. For such cases where the transmission was unsuccessful due to a large value, the first reinforcement learning algorithm has made an additional processing method. Finally, in the face of the situation of unconvergence possibly caused by the first learning, the invention provides a scheme for carrying out secondary learning by adopting a new reinforcement learning algorithm, the scheme carries out redesign on a state space and actions, and can more directly obtain an optimal value compared with the first reinforcement learning algorithm, thereby ensuring that the value can be converged to a certain fixed value in the learning process.

In the prior art, data transmission with a conventional BATS code requires channel detection and estimation. Only after information such as the packet loss rate has been determined can the sending end calculate the optimal degree distribution for the current transmission scenario and thereby achieve efficient transmission. A large amount of test data is therefore consumed before transmission, which wastes transmission time and transmission resources.

The transmission scheme provided by the invention does not need to detect and estimate the channel. Instead, it first designates a value for transmission and, as transmission continues, keeps optimizing that value with the reinforcement learning algorithm designed by the invention until an optimal transmission value is obtained, so that data can be transmitted immediately without a large amount of test data or extra test time. Moreover, the channel condition may change with the environment. In the prior art, the channel condition obtained by earlier channel detection then no longer applies, and channel detection and estimation must be performed again to determine a new optimal degree distribution, which consumes more data and more transmission time. With the reinforcement learning method provided by the invention, a change in the channel condition necessarily changes the number of batches consumed, and the Q table changes once the feedback function has been calculated, so that a new transmission value is obtained as transmission continues. The method therefore adapts to the new transmission environment, is more flexible, needs no large amount of test data, and wastes no time. When the channel condition changes sharply, the performance of the degree distribution obtained by the conventional scheme may drop suddenly, or the transmission may even be interrupted, whereas the scheme of the invention can adapt quickly to the channel change and keep the transmission running normally.

Example 2:

As shown in fig. 2 to fig. 3, the present embodiment provides a BATS code-based data transmission device, which includes:

A first obtaining unit 1, configured to obtain at least one file to be transmitted.

And the optimizing unit 2 is used for optimizing the value of the BATS code according to a reinforcement learning algorithm to obtain an optimal value, wherein the optimal value is the value with the highest transmission efficiency at the current moment and in the current environment state.

And the transmission unit 3 is used for using the optimal value as the value of the BATS code, generating batches of all files to be transmitted through the BATS code, and transmitting the batches to the receiving end.

In some specific embodiments, the optimization unit 2 comprises:

The second acquiring unit 21 is configured to acquire the batch number of the BATS code.

The first initializing unit 22 is configured to initialize a Q table and randomly generate an initial state, where the Q table is a table of 3 × batch number, and the value of the initial state is a value solvable by the receiving end.

The first calculating unit 23 is configured to calculate an initial average batch number and a transmission number according to the initial state, where the initial average batch number is the number of batches required to complete transmission of unit data in the initial state, and the transmission number is the number of times that the BATS code transmits data.

And the first pre-judging unit 24 is used for calculating the next action in the current state according to the Q table and executing the next action.

A second calculating unit 25, configured to calculate a state average batch number and an updated transmission number according to the transmission number and the execution result in the next action, where the state average batch number is the number of batches required to complete the unit data transmission in the next action.

The third calculating unit 26 is configured to calculate a transmission state according to the state average batch number and a preset condition judgment formula, where the transmission state includes success and failure.

The first logic unit 27 is configured to judge the transmission state and, if the transmission state of the receiving end is success, update the Q table, the current action, and the current state.

A first looping unit 28, configured to return to calculating the predicted action in the current state according to the Q table and executing the predicted action, until the preset maximum number of learning iterations is reached or the Q table converges.
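The Q-table layout of the first initializing unit 22 and the action choice of the first pre-judging unit 24 can be illustrated with a minimal sketch. The text only states that the Q table has 3 × batch number entries, so the meaning of the three rows (decrease, keep, or increase the value), the epsilon-greedy selection rule, and the names `ACTION_STEPS`, `init_q_table`, and `next_action` are all assumptions made for illustration, not details confirmed by this embodiment.

```python
import random

# Assumed meaning of the three Q-table rows; the text only says "3 x batch number".
ACTION_STEPS = (-1, 0, +1)  # decrease, keep, increase the value


def init_q_table(batch_number):
    """Return a 3 x batch_number table of zeros, indexed as q[action][state]."""
    return [[0.0] * batch_number for _ in range(len(ACTION_STEPS))]


def next_action(q_table, state, epsilon=0.1, rng=random):
    """Pick the next action for `state` with an epsilon-greedy rule (an assumption)."""
    n_states = len(q_table[0])
    # Keep only the actions whose resulting state stays inside the table.
    valid = [a for a, step in enumerate(ACTION_STEPS) if 0 <= state + step < n_states]
    if rng.random() < epsilon:
        return rng.choice(valid)  # explore a random valid action
    return max(valid, key=lambda a: q_table[a][state])  # exploit the best known action


q = init_q_table(batch_number=16)
q[2][5] = 1.0  # pretend "increase" has looked best from state 5
print(next_action(q, state=5, epsilon=0.0))  # greedy choice for state 5
```

With `epsilon=0.0` the choice is purely greedy; a small positive `epsilon` keeps some exploration, which is one common way to let the table follow a changing channel.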

In some specific embodiments, the first calculating unit 23 includes:

The first transmission unit 231 is configured to use the value corresponding to the initial state as the value of the BATS code, and to transmit first information to the receiving end a preset number of times after the first information is encoded by the BATS code, where the first information is a preset number of data packets.

The third obtaining unit 232 is configured to obtain an initial total batch number, where the initial total batch number is the number of batches, as fed back by the receiving end, received over the preset number of transmissions.

A first calculating subunit 233, configured to calculate an initial average batch number according to the initial total batch number and a preset number of transmissions.
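As a small worked example of the first calculating subunit 233, the sketch below assumes that the initial average batch number is simply the initial total batch number divided by the preset number of transmissions. The text does not give the formula, so this division and the name `average_batch_number` are assumptions.

```python
def average_batch_number(total_batches, num_transmissions):
    """Average batches per transmission (assumed to be a plain arithmetic mean)."""
    if num_transmissions <= 0:
        raise ValueError("num_transmissions must be positive")
    return total_batches / num_transmissions


# Example: the receiving end reports 120 batches over 10 preset transmissions.
print(average_batch_number(120, 10))
```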

In some specific embodiments, the second computing unit 25 includes:

The second transmission unit 251 is configured to use the value in the next action as the value of the BATS code, and to transmit the first information to the receiving end the preset number of times after the first information is encoded by the BATS code, where the first information is a preset number of data packets.

The first updating unit 252 is configured to increase the transmission number by one.

The fourth obtaining unit 253 is configured to obtain a state total batch number, where the state total batch number is the number of batches, as fed back by the receiving end, received over the preset number of transmissions.

A second calculating subunit 254, configured to calculate the state average batch number according to the state total batch number, the transmission number, and a preset first formula group.

In some specific embodiments, the first logic unit 27 includes:

A first logic subunit 271, configured to set, in the Q table, both the reward expectation of the next action corresponding to the current state and the reward expectation corresponding to the state action to negative infinity if the transmission state of the receiving end is failure, where the state action is the next action corresponding to all states after the current state.
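The failure handling of the first logic subunit 271 can be sketched as follows. The sketch assumes the Q table is indexed as `q_table[action][state]` and that "all states after the current state" means all states with a larger index; both points, and the name `mask_failed_action`, are illustrative assumptions.

```python
import math


def mask_failed_action(q_table, state, action):
    """Set the failed action's reward expectation to -inf for `state` and all later states."""
    for s in range(state, len(q_table[action])):
        q_table[action][s] = -math.inf


q = [[0.0] * 5 for _ in range(3)]  # 3 actions x 5 states, all zeros
mask_failed_action(q, state=2, action=2)
print(q[2])  # entries from index 2 onward are now -inf
```

Because the masked entries are negative infinity, a greedy choice over the Q table never selects the failed action again from those states.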

In some specific embodiments, the first logic unit 27 further includes:

A second logic subunit 272, configured to set the transmission state of the receiving end to failure if the state average batch number is greater than the product of the initial average batch number and the transmission number, and to success otherwise.

A fifth acquiring unit 273, configured to acquire the current time and the start time of executing the next action.

A third logic subunit 274, configured to set the transmission state of the receiving end to failure if the absolute value of the difference between the current time and the start time is greater than a preset threshold.
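The two failure conditions described by the second logic subunit 272 and the third logic subunit 274 can be combined into one check, sketched below. The function and parameter names are hypothetical; the two comparisons follow the text directly.

```python
def transmission_failed(state_avg, initial_avg, transmissions, now, start, threshold):
    """Return True if either failure condition from the text holds."""
    # Condition 1: state average batch number exceeds initial average x transmission number.
    if state_avg > initial_avg * transmissions:
        return True
    # Condition 2: the action has been running longer than the preset threshold.
    return abs(now - start) > threshold


# 50 > 12 * 4 = 48, so this transmission counts as failed.
print(transmission_failed(50.0, 12.0, 4, now=10.0, start=9.0, threshold=5.0))
```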

In some specific embodiments, the first logic unit 27 further includes:

The third calculating subunit 275 is configured to calculate a feedback value according to a preset third formula and the state average batch number.

The fourth calculating subunit 276 is configured to calculate an updated value according to the preset second formula, the feedback value, and the Q table, and to update the Q value of the next action corresponding to the current state to the updated value.

The second updating unit 277 is configured to update the current action to the next action, and to update the current state to the value corresponding to the next action.
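The preset second formula is not reproduced in this part of the text. As a hedged illustration only, the sketch below uses the standard Q-learning update, which is an assumption about what the second formula looks like; the learning rate `alpha`, the discount factor `gamma`, and the name `q_update` are hypothetical. The feedback value is taken as an input, since the third formula is likewise not given here.

```python
def q_update(q_table, state, action, feedback, next_state, alpha=0.5, gamma=0.9):
    """Standard Q-learning update (assumed form), with q_table[action][state] indexing."""
    # Best reward expectation reachable from the next state, over all actions.
    best_next = max(row[next_state] for row in q_table)
    old = q_table[action][state]
    q_table[action][state] = old + alpha * (feedback + gamma * best_next - old)
    return q_table[action][state]


q = [[0.0] * 4 for _ in range(3)]
q[1][3] = 2.0  # an already learned value in the next state
# 0 + 0.5 * (1.0 + 0.9 * 2.0 - 0) = 1.4
print(q_update(q, state=2, action=2, feedback=1.0, next_state=3))
```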

In some specific embodiments, the optimization unit 2 further includes:

The second logic unit 278 is configured to determine whether the Q table converges to one state and, if it does not, to perform learning optimization on the Q table based on a second reinforcement learning algorithm and update the Q table to the learning-optimized Q table.

In some specific embodiments, the second logic unit 278 further includes:

An obtaining and initializing unit 291, configured to obtain at least two convergence values in the Q table and the number of the convergence values, where a convergence value is a value that has reached a convergence state, and to initialize a second Q table and a second initial state, where the second Q table is a table of (number of convergence values) × (number of convergence values), and the value of the second initial state is a value solvable by the receiving end.

The fourth calculating unit 292 is configured to calculate a second initial average batch number according to the second initial state, where the second initial average batch number is a number of batches required to complete transmission of unit data in the initial state.

The second pre-judging unit 293 is configured to calculate a next action in the current state according to the second Q table.

The third updating unit 294 is configured to update the next state to the value corresponding to the next action.

The fourth updating unit 295 is configured to look up, in the Q table, the expected action corresponding to the next state, and to update the next action to the expected action, where the expected action is the action with the largest Q value for the next state in the Q table.

The fifth calculating unit 296 is configured to calculate the state average batch number according to the execution result in the next action.

A sixth calculating unit 297 is configured to use the value of the initial average batch number minus the state average batch number as a feedback value.

The seventh calculating unit 298 is configured to calculate an updated value according to the preset second formula, the feedback value, and the second Q table, to update the Q value of the next action corresponding to the current state to the updated value, to update the current state to the next state, and to update the current action to the next action.

A second looping unit 299, configured to return to calculating the next action in the current state according to the second Q table, until the preset maximum number of learning iterations is reached or the second Q table converges, and then to update the Q table to the second Q table.
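The second learning carried out by units 291 to 299 can be sketched roughly as below. The sketch rests on several assumptions that this embodiment does not confirm: the second Q table is taken to be square, with one row (state) and one column (action) per convergence value; choosing an action means moving directly to that convergence value; the update is the standard Q-learning rule; and action selection is epsilon-greedy. The feedback value follows the text (initial average batch number minus state average batch number). The names `second_learning` and `measure_avg_batches` are hypothetical, and the measurement function stands in for real transmissions.

```python
import random


def second_learning(candidates, measure_avg_batches, start_index=None,
                    max_iters=200, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    """Learn over the convergence values; returns (q, value of the final state)."""
    rng = random.Random(seed)
    n = len(candidates)
    q = [[0.0] * n for _ in range(n)]  # q[state][action], assumed square
    state = rng.randrange(n) if start_index is None else start_index
    initial_avg = measure_avg_batches(candidates[state])
    for _ in range(max_iters):
        if rng.random() < epsilon:
            action = rng.randrange(n)  # explore
        else:
            action = max(range(n), key=lambda a: q[state][a])  # exploit
        next_state = action  # the chosen action is the value to move to
        # Feedback: initial average batch number minus state average batch number.
        feedback = initial_avg - measure_avg_batches(candidates[next_state])
        best_next = max(q[next_state])
        q[state][action] += alpha * (feedback + gamma * best_next - q[state][action])
        state = next_state
    return q, candidates[state]


# Hypothetical average batch numbers for three convergence values.
cost = {8: 14.0, 12: 10.0, 16: 13.0}
q, value = second_learning([8, 12, 16], cost.__getitem__, start_index=2,
                           max_iters=20, gamma=0.0, epsilon=0.0)
print(value)
```

In this toy run, exploration and discounting are switched off so the behaviour is deterministic: the loop tries the worse value first, records a negative feedback for it, and then settles on the value with the lowest average batch number.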

It should be noted that, regarding the apparatus in the above embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated herein.

Example 3:

Corresponding to the above method embodiment, the present embodiment further provides a BATS code-based data transmission device. The BATS code-based data transmission device described below and the BATS code-based data transmission method described above may be referred to in correspondence with each other.

Fig. 4 is a block diagram illustrating a BATS code-based data transmission device 800 according to an example embodiment. As shown in fig. 4, the BATS code-based data transmission device 800 may include a processor 801 and a memory 802. The BATS code-based data transmission device 800 may also include one or more of a multimedia component 803, an I/O interface 804, and a communication component 805.

The processor 801 is configured to control the overall operation of the BATS code-based data transmission device 800, so as to complete all or part of the steps in the BATS code-based data transmission method. The memory 802 is used to store various types of data to support operation of the BATS code-based data transmission device 800, such data including, for example, instructions for any application or method operating on the BATS code-based data transmission device 800, as well as application-related data such as contact data, transceived messages, pictures, audio, video, and so forth. The memory 802 may be implemented by any type of volatile or non-volatile memory device or combination thereof, such as Static Random Access Memory (SRAM), Electrically Erasable Programmable Read-Only Memory (EEPROM), Erasable Programmable Read-Only Memory (EPROM), Programmable Read-Only Memory (PROM), Read-Only Memory (ROM), magnetic memory, flash memory, magnetic disk, or optical disk. The multimedia component 803 may include a screen and an audio component. The screen may be, for example, a touch screen, and the audio component is used for outputting and/or inputting audio signals. For example, the audio component may include a microphone for receiving external audio signals. The received audio signal may further be stored in the memory 802 or transmitted through the communication component 805. The audio component also includes at least one speaker for outputting audio signals. The I/O interface 804 provides an interface between the processor 801 and other interface modules, such as a keyboard, mouse, buttons, etc. These buttons may be virtual buttons or physical buttons. The communication component 805 is used for wired or wireless communication between the BATS code-based data transmission device 800 and other devices. The wireless communication may be, for example, Wi-Fi, Bluetooth, Near Field Communication (NFC), 2G, 3G, or 4G, or a combination of one or more of them, so the corresponding communication component 805 may include a Wi-Fi module, a Bluetooth module, and an NFC module.

In an exemplary embodiment, the BATS code-based data transmission device 800 may be implemented by one or more Application Specific Integrated Circuits (ASICs), Digital Signal Processors (DSPs), Digital Signal Processing Devices (DSPDs), Programmable Logic Devices (PLDs), Field Programmable Gate Arrays (FPGAs), controllers, microcontrollers, microprocessors or other electronic components for performing the above-mentioned BATS code-based data transmission method.

In another exemplary embodiment, a computer-readable storage medium including program instructions is also provided, the program instructions implementing the steps of the above-described BATS code-based data transmission method when executed by a processor. The computer-readable storage medium may be, for example, the memory 802 described above that includes program instructions executable by the processor 801 of the BATS code-based data transmission device 800 to perform the BATS code-based data transmission method described above.

Example 4:

Corresponding to the above method embodiment, this embodiment further provides a readable storage medium. The readable storage medium described below and the BATS code-based data transmission method described above may be referred to in correspondence with each other.

A readable storage medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the steps of the BATS code-based data transmission method of the above method embodiment.

The readable storage medium may be a USB disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or any other readable storage medium capable of storing program code.

The above is only a preferred embodiment of the present invention, and is not intended to limit the present invention, and various modifications and changes will occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

The above description covers only specific embodiments of the present invention, but the protection scope of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope of the present invention shall fall within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims

1. A data transmission method based on BATS code is characterized by comprising the following steps:

acquiring at least one file to be transmitted;

optimizing the value of the BATS code according to a reinforcement learning algorithm to obtain an optimal value, wherein the optimal value is the value with the highest transmission efficiency at the current moment and in the current environment state;

and using the optimal value as the value of the BATS code, and transmitting all the files to be transmitted to a receiving end after generating batches through the BATS code coding.

2. The BATS code-based data transmission method of claim 1, wherein the optimizing the BATS code value according to a reinforcement learning algorithm to obtain an optimal value comprises:

acquiring the batch number of the BATS codes;

initializing a Q table and randomly generating an initial state, wherein the Q table is a table of 3 × batch number, and the value of the initial state is a value solvable by the receiving end;

calculating to obtain initial average batch times and transmission times by taking the initial state as a basis, wherein the initial average batch times are the number of batches required for completing unit data transmission in the initial state, and the transmission times are the times of BATS code data transmission;

calculating the next action in the current state according to the Q table and executing the next action;

calculating to obtain state average batch number and updated transmission number according to the transmission number and the execution result in the next action, wherein the state average batch number is the number of batches required for completing unit data transmission in the next action;

calculating to obtain a transmission state according to the transmission times, the state average batch times and the initial average batch times, wherein the transmission state comprises success and failure;

judging the transmission state, and if the transmission state of the receiving end is successful, updating a Q table, a current action and a current state;

and restarting to calculate the next action in the current state according to the Q table and executing the next action until the preset maximum learning times or the Q table is converged.

3. A BATS code-based data transmission method according to claim 2, wherein the determining the transmission status comprises:

and if the transmission state of the receiving end is failure, setting the next action corresponding to the current state and the reward expectation corresponding to the state action to be negative infinity in a Q table, wherein the state action is the next action corresponding to all the states after the current state.

4. The BATS code-based data transmission method of claim 2, wherein the resuming the next action in the current state according to the Q table and performing the next action until a preset maximum learning number or the Q table converges further comprises:

and judging whether the Q table is converged to a state, if not, learning and optimizing the Q table based on a second reinforcement learning algorithm, and updating the Q table to be the Q table after learning and optimizing.

5. A BATS code-based data transmission apparatus, comprising:

a first acquisition unit, configured to acquire at least one file to be transmitted;

the optimization unit is used for optimizing the value of the BATS code according to a reinforcement learning algorithm to obtain an optimal value, wherein the optimal value is the value with the highest transmission efficiency at the current moment and in the current environment state;

and the transmission unit is used for using the optimal value as the value of the BATS code, generating batches of all the files to be transmitted through the BATS code, and transmitting the batches to a receiving end.

6. A BATS code-based data transmission device according to claim 5, wherein the optimization unit comprises:

the second acquisition unit is used for acquiring the batch number of the BATS code;

a first initialization unit, configured to initialize a Q table and randomly generate an initial state, where the Q table is a table of 3 × batch number, and the value of the initial state is a value solvable by the receiving end;

a first calculating unit, configured to calculate, based on the initial state, an initial average batch number and a transmission number, where the initial average batch number is the number of batches required to complete transmission of unit data in the initial state, and the transmission number is the number of times that the BATS code transmits data;

the first prejudging unit is used for calculating the next action in the current state according to the Q table and executing the next action;

a second calculating unit, configured to calculate, according to the number of transmissions and an execution result in the next action, an average number of state batches and the updated number of transmissions, where the average number of state batches is a number of batches required to complete transmission of unit data in the next action;

a third calculating unit, configured to calculate a transmission state according to the transmission times, the state average batch times, and the initial average batch times, where the transmission state includes success and failure;

the first logic unit is used for judging the transmission state, and if the transmission state of the receiving end is successful, updating a Q table, a current action and a current state;

and the first circulation unit is used for restarting to calculate the predicted action in the current state according to the Q table and executing the predicted action until the preset maximum learning times or the Q table is converged.

7. A BATS code based data transmission device according to claim 6, wherein the first logic unit comprises:

and the first logic subunit is configured to, if the transmission state of the receiving end is failure, set both the reward expectation of the next action corresponding to the current state and the reward expectation corresponding to the state action to negative infinity in the Q table, where the state action is the next action corresponding to all states subsequent to the current state.

8. The BATS code-based data transmission device of claim 6, wherein the optimization unit further comprises:

and the second logic unit is used for judging whether the Q table is converged to a state, if not, performing learning optimization on the Q table based on a second reinforcement learning algorithm, and updating the Q table to be the Q table after learning optimization.

9. A BATS code-based data transmission device, comprising:

a memory for storing a computer program;

a processor for implementing the steps of the BATS code-based data transmission method according to any one of claims 1 to 4 when executing said computer program.

10. A readable storage medium, characterized by: the readable storage medium has stored thereon a computer program which, when being executed by a processor, carries out the steps of the BATS code-based data transmission method according to any one of claims 1 to 4.