CN113343725B

CN113343725B - Anti-collision method and system for multiple RFID readers

Info

Publication number: CN113343725B
Application number: CN202110399355.2A
Authority: CN
Inventors: 杨律青; 黄晨曦; 丘以书; 钱伟华; 李鼎昭; 林岚良; 沈少钦
Original assignee: Xiamen University
Current assignee: Xiamen University
Priority date: 2021-04-14
Filing date: 2021-04-14
Publication date: 2022-07-19
Anticipated expiration: 2041-04-14
Also published as: CN113343725A

Abstract

The invention provides an anti-collision method and system for multiple RFID readers. The method comprises the following steps: acquiring a reading request of a reader, and randomly allocating a channel resource path to the reader according to the reading request; simulating the randomly allocated channel resource path to output a corresponding reward value, updating the Q value according to the reward value, and adding the updated Q value to a pre-established temporary storage table; once the temporary storage table is full, inputting it as a training set into a pre-established BP neural network for training to obtain a trained anti-collision model; and acquiring the number of channel resource paths available to the current reader and inputting that number into the trained anti-collision model to obtain an optimal channel resource path, thereby effectively reducing the probability of collision.

Description

Anti-collision method and system for multiple RFID readers

Technical Field

The invention relates to the technical field of radio frequency identification, in particular to an anti-collision method of multiple RFID readers, a computer readable storage medium, computer equipment and an anti-collision system of the multiple RFID readers.

Background

In the related art, the Internet of Things collects item information through terminal devices, and RFID is a key technology in such devices, bridging the gap between the real world and the virtual world. Although RFID has many advantages and its wide application can greatly promote the development of production and daily life, some defects still restrict its development. The collision problem is one important restricting factor: in multi-reader scenarios, existing anti-collision methods incur high memory overhead during computation, so the search for the optimal channel resource is inefficient.

Disclosure of Invention

The present invention aims to solve, at least to some extent, one of the technical problems in the related art described above. To this end, a first object of the present invention is to provide an anti-collision method for multiple RFID readers that searches for the optimal channel resource with the Sarsa algorithm, which reduces the signal interference rate, and combines it with a BP neural network to reduce memory overhead during computation and improve search efficiency, thereby effectively reducing the probability of collision.

A second object of the invention is to propose a computer-readable storage medium.

A third object of the invention is to propose a computer device.

A fourth object of the invention is to propose an anti-collision system for multiple RFID readers.

In order to achieve the above object, an embodiment of a first aspect of the present invention provides an anti-collision method for multiple RFID readers, including the following steps: acquiring a reading request of a reader, and randomly allocating a channel resource path to the reader according to the reading request; carrying out simulation processing on the randomly distributed channel resource paths to output corresponding reward values so as to update the Q values according to the reward values and add the updated Q values to a pre-established temporary storage table; after the temporary storage table is fully stored, inputting the temporary storage table serving as a training set into a pre-established BP neural network for training to obtain a trained anti-collision model; and acquiring the number of the channel resource paths available to the current reader, and inputting the number of the channel resource paths available to the current reader into the trained anti-collision model to obtain an optimal channel resource path.

According to the anti-collision method for multiple RFID readers of the embodiment of the present invention, a reading request of a reader is first acquired, and a channel resource path is randomly allocated to the reader according to the reading request; the randomly allocated channel resource path is then simulated to output a corresponding reward value, the Q value is updated according to the reward value, and the updated Q value is added to a pre-established temporary storage table; once the temporary storage table is full, it is input as a training set into a pre-established BP neural network for training to obtain a trained anti-collision model; finally, the number of channel resource paths available to the current reader is acquired and input into the trained anti-collision model to obtain an optimal channel resource path. The optimal channel resource is thus searched for with the Sarsa algorithm, which reduces the signal interference rate, while the combination with the BP neural network reduces memory overhead during computation and improves search efficiency, so that the probability of collision is effectively reduced.

In addition, the anti-collision method for the RFID multiple readers according to the above embodiments of the present invention may further have the following additional technical features:

Optionally, the Q value is defined over a state s and an action a, where the state s represents the number of channel resource paths available to the current reader, and the action a represents allocating one channel resource path to the current reader.

Optionally, updating the Q value according to the reward value and adding the updated Q value to the pre-established temporary storage table includes: looking up, in the pre-established temporary storage table, the Q value corresponding to the randomly allocated channel resource path to determine whether that Q value exists in the table; if so, updating the Q value directly from the stored Q value and the reward value, and adding the updated Q value to the temporary storage table; if not, inputting the state s of the Q value into the pre-established BP neural network to output the predicted Q values of all actions, selecting the maximum predicted Q value, updating the Q value from that maximum and the reward value, and adding the updated Q value to the temporary storage table.

Optionally, the Q value is updated according to the following formula:

Q'(s, a) = (1 - α)Q(s, a) + αγC(s, a) + Q(s', a')

where Q'(s, a) represents the updated Q value for the current state and action, α represents the learning rate, Q(s, a) represents the Q value for the current state and action, γ represents the discount factor, C(s, a) represents the average reward value in the current state, and Q(s', a') represents the Q value for the next state and action.

Optionally, after the temporary storage table is full, inputting the temporary storage table as a training set into a pre-established BP neural network for training, so as to obtain a trained anti-collision model, including: judging whether the temporary storage table is full; if yes, inputting the temporary storage table serving as a training set into a pre-established BP neural network for training, updating the learning times, emptying the contents in the temporary storage table, and judging whether the learning times are reached according to the current learning times and the preset learning times; if not, judging whether the learning times are reached directly according to the current learning times and preset learning times; and if the learning times are reached, finishing the channel resource path distribution, and if the learning times are not reached, re-acquiring the reading request of the reader to perform a new round of iterative training.

In order to achieve the above object, a second embodiment of the present invention provides a computer-readable storage medium, on which an anti-collision program of an RFID multi-reader is stored, and when executed by a processor, the anti-collision program of the RFID multi-reader implements the anti-collision method of the RFID multi-reader as described above.

According to the computer readable storage medium of the embodiment of the invention, the anti-collision program of the RFID multi-reader is stored, so that the processor can realize the anti-collision method of the RFID multi-reader when executing the anti-collision program of the RFID multi-reader, thereby effectively reducing the probability of collision.

In order to achieve the above object, a third embodiment of the present invention provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the program, the processor implements the above anti-collision method for multiple RFID readers.

According to the computer equipment provided by the embodiment of the invention, the anti-collision program of the RFID multi-reader is stored in the memory, so that the anti-collision program of the RFID multi-reader is executed by the processor to realize the anti-collision method of the RFID multi-reader, thereby effectively reducing the probability of collision.

In order to achieve the above object, a fourth aspect of the present invention provides an anti-collision system for multiple RFID readers, including: the acquisition module is used for acquiring a reading request of a reader and randomly allocating a channel resource path to the reader according to the reading request; the updating processing module is used for carrying out simulation processing on the randomly distributed channel resource paths so as to output corresponding reward values, so that the Q value is updated according to the reward values, and the updated Q value is added into a pre-established temporary storage table; the training module is used for inputting the temporary storage table serving as a training set to a pre-established BP neural network for training after the temporary storage table is fully stored so as to obtain a trained anti-collision model; and the anti-collision module is used for acquiring the number of channel resource paths available to the current reader and inputting the number of channel resource paths available to the current reader into the trained anti-collision model so as to obtain an optimal channel resource path.

According to the anti-collision system of the RFID multi-reader, the reading request of the reader is obtained through the obtaining module, and a channel resource path is randomly distributed to the reader according to the reading request; the randomly distributed channel resource paths are subjected to simulation processing through an updating processing module so as to output corresponding reward values, so that the Q value is updated according to the reward values, and the updated Q value is added into a pre-established temporary storage table; after the temporary storage table is fully stored through the training module, the temporary storage table is used as a training set and is input into a pre-established BP neural network for training, so that a trained anti-collision model is obtained; finally, the number of channel resource paths available for the current reader is obtained through an anti-collision module, and the number of the channel resource paths available for the current reader is input into a trained anti-collision model, so that an optimal channel resource path is obtained; therefore, the optimal channel resource is searched through the Sarsa algorithm, the signal interference rate is reduced, the memory overhead in the calculation process is reduced by combining the BP neural network, the searching efficiency is improved, and the probability of collision is effectively reduced.

Optionally, the update processing module is further configured to: searching in the pre-established temporary storage table according to the Q value corresponding to the randomly allocated channel resource path to judge whether the Q value exists in the temporary storage table; if yes, directly updating the Q value through the Q value and the reward value, and adding the updated Q value into a pre-established temporary storage table; if not, inputting the state S in the Q value into a pre-established BP neural network to output the Q values predicted by all actions, selecting the maximum Q value and the reward value to update the Q value, and adding the updated Q value into a pre-established temporary storage table.

Drawings

Fig. 1 is a schematic flow chart of an anti-collision method for multiple RFID readers according to an embodiment of the present invention;

Fig. 2 is a flow chart of an anti-collision method for multiple RFID readers according to an embodiment of the present invention;

Fig. 3 is a diagram of the learning model of the Sarsa algorithm after the neural network is introduced, according to an embodiment of the present invention;

Fig. 4 is a block diagram of an anti-collision system for multiple RFID readers according to an embodiment of the present invention.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein like or similar reference numerals refer to the same or similar elements or elements having the same or similar function throughout. The embodiments described below with reference to the drawings are illustrative and intended to be illustrative of the invention and are not to be construed as limiting the invention.

In order to better understand the above technical solutions, exemplary embodiments of the present invention will be described in more detail below with reference to the accompanying drawings. While exemplary embodiments of the invention are shown in the drawings, it should be understood that the invention can be embodied in various forms and should not be limited to the embodiments set forth herein. Rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the scope of the invention to those skilled in the art.

In order to better understand the technical solution, the technical solution will be described in detail with reference to the drawings and the specific embodiments.

Fig. 1 is a schematic flow chart of an anti-collision method for multiple RFID readers according to an embodiment of the present invention, and as shown in fig. 1, the anti-collision method for multiple RFID readers according to an embodiment of the present invention includes the following steps:

Step 101: obtaining a reading request of a reader, and randomly allocating a channel resource path to the reader according to the reading request.

That is, after receiving a request sent by a certain reader, the system randomly allocates a channel resource path to the reader.

After the selected channel-allocation action has been carried out, the algorithm produces the next state and action.

Step 102: simulating the randomly allocated channel resource path to output a corresponding reward value, updating the Q value according to the reward value, and adding the updated Q value to a pre-established temporary storage table.

As an example, the randomly allocated channel resource path is simulated to determine the channel collision condition, and a reward value is returned according to that condition. For example, based on the randomly allocated channel resource path, the number of readers occupying that path in the same time slot can be determined, and the ratio of the number of occupying readers to the total number of readers is returned as a probability value that serves as the reward value.
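The reward computation described above can be sketched as follows. This is a minimal illustration rather than the patented implementation: the function name `occupancy_ratio`, the dictionary layout, and the choice to leave the requesting reader out of the count are all assumptions, and the text does not say whether the ratio is used directly as the reward or as a cost to be minimized.

```python
def occupancy_ratio(assignments, reader_id):
    """Return the fraction of readers sharing reader_id's channel in one time slot.

    assignments maps each reader id to the channel index it uses in the
    current time slot (hypothetical layout). The requesting reader itself is
    not counted (assumption), so 0.0 means the channel is free of collisions.
    """
    if reader_id not in assignments:
        raise KeyError(f"unknown reader: {reader_id!r}")
    channel = assignments[reader_id]
    # Count the other readers that occupy the same channel in this slot.
    occupying = sum(
        1 for other, ch in assignments.items()
        if other != reader_id and ch == channel
    )
    return occupying / len(assignments)


slot = {"r1": 0, "r2": 0, "r3": 1, "r4": 0}
print(occupancy_ratio(slot, "r1"))  # 0.5
print(occupancy_ratio(slot, "r3"))  # 0.0
```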

It should be noted that the Q value is defined over a state s and an action a, where the state s represents the number of channel resource paths available to the current reader, and the action a represents allocating one channel resource path to the current reader.

As a specific embodiment, the Q value corresponding to the randomly allocated channel resource path is looked up in the pre-established temporary storage table to determine whether it exists in the table. If so, the Q value is updated directly from the stored Q value and the reward value, and the updated Q value is added to the temporary storage table. If not, the state s of the Q value is input into the pre-established BP neural network to output the predicted Q values of all actions, the maximum predicted Q value is selected, the Q value is updated from that maximum and the reward value, and the updated Q value is added to the temporary storage table.
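The table lookup with a network fallback might look like the sketch below. The names `lookup_q` and `predict_all` are hypothetical; `predict_all` stands for whatever call returns the BP neural network's predicted Q values for every action in a given state, and a toy function takes its place here.

```python
def lookup_q(table, predict_all, state, action):
    """Return (q_value, found_in_table) for a state-action pair.

    table       -- dict mapping (state, action) to a stored Q value
    predict_all -- callable returning the predicted Q values of all
                   actions for a state (stand-in for the BP network)
    """
    key = (state, action)
    if key in table:
        # The pair is already in the temporary table: use it directly.
        return table[key], True
    # Otherwise ask the network and keep the maximum predicted Q value.
    return max(predict_all(state)), False


def fake_network(state):
    # Toy stand-in for a trained network: one value per action.
    return [1.0 * state, 3.0 * state, 2.0 * state]


table = {(4, 1): 0.75}
print(lookup_q(table, fake_network, 4, 1))  # (0.75, True)
print(lookup_q(table, fake_network, 2, 0))  # (6.0, False)
```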

It should be noted that, during initialization, an empty temporary storage table is pre-established as a buffer pool that stores the most recently accessed states and their Q values for quick access. A BP neural network is also pre-established; its weights are initialized only once, and a common initialization method is to draw each weight from a normal distribution with a mean of 0 and a variance of 1.
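As a small illustration of the one-time weight initialization, the sketch below draws every weight from a standard normal distribution using only the Python standard library. The function name `init_weights`, the layer sizes, and the `seed` parameter are assumptions made for the example; the text does not specify the network's dimensions.

```python
import random


def init_weights(n_inputs, n_outputs, seed=None):
    """Return an n_inputs x n_outputs weight matrix drawn from N(0, 1)."""
    rng = random.Random(seed)
    # random.gauss takes the standard deviation; variance 1 means sigma = 1.
    return [
        [rng.gauss(0.0, 1.0) for _ in range(n_outputs)]
        for _ in range(n_inputs)
    ]


# One input (the state) feeding four outputs, one per hypothetical channel.
weights = init_weights(1, 4, seed=0)
print(len(weights), len(weights[0]))  # 1 4
```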

As an example, the Q value is updated according to the following formula:

Q'(s, a) = (1 - α)Q(s, a) + αγC(s, a) + Q(s', a')

where Q'(s, a) represents the updated Q value for the current state and action, α represents the learning rate, Q(s, a) represents the Q value for the current state and action, γ represents the discount factor, C(s, a) represents the average reward value in the current state, and Q(s', a') represents the Q value for the next state and action.
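The update can be transcribed directly into code. The first function below follows the formula exactly as it is printed in this text; the second is the textbook Sarsa update, included only for comparison because the printed grouping of α and γ differs from it. The function names and the sample values are illustrative assumptions.

```python
def update_q_as_printed(q_sa, c_sa, q_next, alpha, gamma):
    """Q'(s,a) = (1 - alpha) * Q(s,a) + alpha * gamma * C(s,a) + Q(s',a')."""
    return (1 - alpha) * q_sa + alpha * gamma * c_sa + q_next


def update_q_textbook_sarsa(q_sa, reward, q_next, alpha, gamma):
    """Q(s,a) <- (1 - alpha) * Q(s,a) + alpha * (reward + gamma * Q(s',a'))."""
    return (1 - alpha) * q_sa + alpha * (reward + gamma * q_next)


# Same inputs through both rules, to make the difference visible.
args = dict(q_sa=1.0, q_next=2.0, alpha=0.1, gamma=0.9)
print(update_q_as_printed(c_sa=0.5, **args))
print(update_q_textbook_sarsa(reward=0.5, **args))
```

With these sample values the printed formula gives about 2.945 and the textbook rule about 1.13, so the two are not interchangeable.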

Step 103: once the temporary storage table is full, inputting the temporary storage table as a training set into the pre-established BP neural network for training to obtain a trained anti-collision model.

As an example, it is first determined whether the temporary storage table is full. If so, the temporary storage table is input as a training set into the pre-established BP neural network for training, the number of learning times is updated, the contents of the temporary storage table are emptied, and it is then determined, from the current and preset numbers of learning times, whether the preset number has been reached. If not, that determination is made directly. If the preset number of learning times has been reached, channel resource path allocation is complete; otherwise, a reading request of a reader is acquired again and a new round of iterative training is performed.

That is, training and emptying of the temporary storage table take place only when the table is full, whereas the check on the number of learning times is performed in either case and decides whether allocation ends or another round of iterative training begins.

It should be noted that performing a new round of iterative training means jumping back to step 101 above and re-performing steps 101 to 103 until the number of learning times is reached, thereby completing the model training.

Step 104: acquiring the number of channel resource paths available to the current reader, and inputting that number into the trained anti-collision model to obtain an optimal channel resource path.
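Querying the trained model in step 104 could be sketched as below. The sketch assumes that the model returns one predicted Q value per channel resource path and that the highest value marks the optimal path; both points, together with the names `best_channel` and `predict_all`, are assumptions not stated in the text.

```python
def best_channel(predict_all, available_paths):
    """Return the index of the channel path with the highest predicted Q value.

    predict_all     -- callable standing in for the trained anti-collision
                       model; maps the state to a list of Q values
    available_paths -- the state: number of paths available to the reader
    """
    q_values = predict_all(available_paths)
    if not q_values:
        raise ValueError("model returned no Q values")
    # max() over indices keeps the first index when several values tie.
    return max(range(len(q_values)), key=q_values.__getitem__)


def toy_model(state):
    # Hypothetical trained model: fixed predictions for three channels.
    return [0.2, 0.9, 0.4]


print(best_channel(toy_model, 3))  # 1
```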

It should be noted that, after receiving the reward value, the BP neural network learns cumulatively to output suitable Q values while continuously updating the weights of its own nodes, so that the method can select a suitable channel resource and thereby effectively reduce channel resource collisions among readers. As the number of readers increases, the method can achieve better performance than the traditional HiQ algorithm, which is significant in practical scenarios.

As a specific embodiment, as shown in fig. 2 to 3, the anti-collision method of the RFID multi-reader includes the following steps:

S1: initialize the algorithm, and establish the empty table W and the BP neural network.

S2: after the system receives a reader request, randomly select an action a to allocate resources.

S3: after the algorithm completes action a, determine the channel collision condition and return the reward value.

S4: search the W table to determine whether Q(s, a) exists. If so, go to step S5; if not, go to step S6.

S5: calculate the updated Q value directly from Q(s, a) in the W table, and add the new Q value to the W table.

S6: input the state s of Q(s, a) into the BP neural network to obtain the maximum Q value, calculate the updated Q value using that maximum, and add the new Q value to the W table.

S7: determine whether the W table is full. If so, go to step S8; if not, go to step S9.

S8: train the neural network using the W table, and empty the W table.

S9: determine whether the number of learning times has been reached. If so, go to step S10; if not, return to step S2.

S10: complete the channel allocation and end the algorithm.
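The control flow of steps S1 to S10 can be sketched as a small driver loop. To keep the example short, steps S2 to S6 are delegated to a caller-supplied `step` callback and step S8 to a `train` callback; both callbacks, the function name `run_training`, and the `max_steps` safety limit are hypothetical and do not appear in the text.

```python
def run_training(step, train, capacity, max_learn, max_steps=10_000):
    """Run the S1-S10 loop and return the number of training rounds done.

    step(table) -> ((state, action), q_value)   covers S2-S6
    train(snapshot)                             covers the training in S8
    """
    table = {}        # S1: empty table W
    learn_count = 0
    for _ in range(max_steps):  # guard against a table that never fills
        key, q_value = step(table)      # S2-S6: one reader request
        table[key] = q_value
        if len(table) >= capacity:      # S7: is the W table full?
            train(dict(table))          # S8: train on a copy of W ...
            table.clear()               # ... then empty W
            learn_count += 1
        if learn_count >= max_learn:    # S9: learning count reached?
            break                       # S10: allocation finished
    return learn_count


batches = []
ids = iter(range(100))


def demo_step(table):
    # Toy request handler: every call yields a new state-action pair.
    i = next(ids)
    return (i, 0), float(i)


print(run_training(demo_step, batches.append, capacity=3, max_learn=2))  # 2
print([len(b) for b in batches])  # [3, 3]
```

Passing a copy of the table to `train` keeps the training set intact after the table is emptied, which matters if the callback stores the data instead of consuming it at once.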

Note that Q(s, a), the tabular value function, is indexed by a state-action pair, where a row of the Q table represents an action a and a column represents a state s. Once the W table is full of Q values and their corresponding version numbers, the neural network learns from the W table. The BP neural network is fused into the Sarsa method and the fused method is applied to the RFID system; compared with a Sarsa anti-collision method without the neural network, it performs better at collision avoidance when searching for channel resources in an RFID system with a large number of readers. The strong storage and computing capability of the BP neural network effectively addresses the problem of large data volume in the RFID system.

In addition, during back propagation in the training of the BP neural network, the parameters are updated according to an error formula [formula image not reproduced in this text],

where E represents the error between the true value and the predicted value at the output node, r_{t+1} represents the reward value at time t+1, γ represents the discount factor, and the two remaining terms [symbols not reproduced] represent the predicted output of the BP neural network at time t+1 and at time t, respectively.
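The formula itself is not legible in this text, but the listed quantities (the reward at time t+1, the discount factor, and the network's predicted outputs at times t+1 and t) are those of a temporal-difference error. The sketch below shows that standard form purely as an assumption; the patent's exact expression, including any squaring or scaling, may differ, and the function names are hypothetical.

```python
def td_error(reward_next, gamma, q_pred_next, q_pred_now):
    """Temporal-difference error: target (r + gamma * Q_next) minus prediction."""
    target = reward_next + gamma * q_pred_next
    return target - q_pred_now


def squared_loss(reward_next, gamma, q_pred_next, q_pred_now):
    """Half the squared TD error, a common loss to back-propagate (assumed)."""
    return 0.5 * td_error(reward_next, gamma, q_pred_next, q_pred_now) ** 2


print(td_error(1.0, 0.5, 2.0, 1.5))      # 0.5
print(squared_loss(1.0, 0.5, 2.0, 1.5))  # 0.125
```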

In summary, according to the anti-collision method for multiple RFID readers of the embodiment of the present invention, a reading request of a reader is first acquired, and a channel resource path is randomly allocated to the reader according to the reading request; the randomly allocated channel resource path is then simulated to output a corresponding reward value, the Q value is updated according to the reward value, and the updated Q value is added to a pre-established temporary storage table; once the temporary storage table is full, it is input as a training set into a pre-established BP neural network for training to obtain a trained anti-collision model; finally, the number of channel resource paths available to the current reader is acquired and input into the trained anti-collision model to obtain an optimal channel resource path. The optimal channel resource is thus searched for with the Sarsa algorithm, which reduces the signal interference rate, while the combination with the BP neural network reduces memory overhead during computation and improves search efficiency, so that the probability of collision is effectively reduced.

In addition, an embodiment of the present invention further provides a computer-readable storage medium, on which an anti-collision program of an RFID multi-reader is stored, where the anti-collision program of the RFID multi-reader is executed by a processor to implement the above-mentioned anti-collision method of the RFID multi-reader.

According to the computer-readable storage medium of the embodiment of the invention, the anti-collision program of the RFID multi-reader is stored, so that the processor can realize the anti-collision method of the RFID multi-reader when executing the anti-collision program of the RFID multi-reader, thereby effectively reducing the probability of collision.

In addition, the embodiment of the present invention further provides a computer device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the program, the method for preventing collision of multiple RFID readers is implemented.

According to the computer equipment provided by the embodiment of the invention, the anti-collision program of the RFID multi-reader is stored by the memory, so that the anti-collision method of the RFID multi-reader is realized when the anti-collision program of the RFID multi-reader is executed by the processor, and the probability of collision is effectively reduced.

FIG. 4 is a block schematic diagram of a collision avoidance system of RFID multi-readers in accordance with an embodiment of the present invention; as shown in fig. 4, the system includes: an acquisition module 201, an update processing module 202, a training module 203, and a collision avoidance module 204.

The acquisition module 201 is configured to acquire a read request of a reader, and randomly allocate a channel resource path to the reader according to the read request; an update processing module 202, configured to perform simulation processing on the randomly allocated channel resource path to output a corresponding reward value, so as to update the Q value according to the reward value, and add the updated Q value to a temporary storage table established in advance; the training module 203 is configured to, after the temporary storage table is full of memory, input the temporary storage table as a training set into a pre-established BP neural network for training to obtain a trained anti-collision model; the anti-collision module 204 is configured to obtain the number of channel resource paths available to the current reader, and input the number of channel resource paths available to the current reader into the trained anti-collision model to obtain an optimal channel resource path.

Further, the Q value includes a state S and an action a, where the state S represents the number of channel resource paths available to the current reader, and the action a represents allocating one channel resource path to the current reader.

Further, the update processing module 202 is further configured to: searching in the pre-established temporary storage table according to the Q value corresponding to the randomly allocated channel resource path to judge whether the Q value exists in the temporary storage table; if yes, directly updating the Q value through the Q value and the reward value, and adding the updated Q value into a pre-established temporary storage table; if not, inputting the state S in the Q value into a pre-established BP neural network to output the Q values predicted by all actions, selecting the maximum Q value and the reward value to update the Q value, and adding the updated Q value into a pre-established temporary storage table.

Further, the Q value is updated according to the following formula:

Q'(s, a) = (1 - α)Q(s, a) + αγC(s, a) + Q(s', a')

Further, the training module 203 is further configured to: judging whether the temporary storage table is full; if yes, inputting the temporary storage table serving as a training set into a pre-established BP neural network for training, updating the learning times, emptying the contents in the temporary storage table, and judging whether the learning times are reached according to the current learning times and the preset learning times; if not, judging whether the learning times are reached directly according to the current learning times and preset learning times; and if the learning times are reached, finishing the channel resource path distribution, and if the learning times are not reached, re-acquiring the reading request of the reader to perform a new round of iterative training.

It should be noted that, the above description and examples of the anti-collision method for multiple RFID readers are also applicable to the anti-collision system for multiple RFID readers of this embodiment, and are not repeated herein.

In summary, according to the anti-collision system with multiple RFID readers according to the embodiment of the present invention, the obtaining module obtains the reading request of the reader, and randomly allocates a channel resource path to the reader according to the reading request; the randomly distributed channel resource paths are subjected to simulation processing through an updating processing module so as to output corresponding reward values, so that the Q value is updated according to the reward values, and the updated Q value is added into a pre-established temporary storage table; after the temporary storage table is fully stored through the training module, the temporary storage table is used as a training set and is input into a pre-established BP neural network for training, so that a trained anti-collision model is obtained; finally, the number of the channel resource paths available for the current reader is obtained through the anti-collision module, and the number of the channel resource paths available for the current reader is input into the trained anti-collision model, so that the optimal channel resource path is obtained; therefore, the optimal channel resource is searched through the Sarsa algorithm, the signal interference rate is reduced, the memory overhead in the calculation process is reduced by combining the BP neural network, the searching efficiency is improved, and the probability of collision is effectively reduced.

As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.

The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.

These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.

It should be noted that in the claims, any reference signs placed between parentheses shall not be construed as limiting the claim. The word "comprising" does not exclude the presence of elements or steps not listed in a claim. The word "a" or "an" preceding an element does not exclude the presence of a plurality of such elements. The invention may be implemented by means of hardware comprising several distinct elements, and by means of a suitably programmed computer. In a unit claim enumerating several means, several of these means can be embodied by one and the same item of hardware. The use of the words first, second, third, etc. does not indicate any ordering. These words may be interpreted as names.

While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.

It will be apparent to those skilled in the art that various changes and modifications may be made in the present invention without departing from the spirit and scope of the invention. Thus, if such modifications and variations of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to include such modifications and variations.

In the description of the present invention, it is to be understood that the terms "first", "second" and the like are used for descriptive purposes only and are not to be construed as indicating or implying relative importance, or as implicitly indicating the number of technical features referred to. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality" means two or more unless specifically defined otherwise.

In the present invention, unless otherwise expressly stated or limited, the terms "mounted," "connected," "secured," and the like are to be construed broadly. For example, a connection may be fixed, detachable, or integral; it may be mechanical or electrical; and it may be direct, indirect through an intermediate medium, internal to two elements, or any other relationship between them. The specific meanings of the above terms in the present invention can be understood by those skilled in the art according to the specific situation.

In the present invention, unless otherwise expressly stated or limited, a first feature being "on" or "under" a second feature may mean that the first and second features are in direct contact, or that they are in indirect contact through an intermediate medium. Also, a first feature being "on," "above," or "over" a second feature may mean that the first feature is directly or obliquely above the second feature, or simply that the first feature is at a higher level than the second feature. A first feature being "under," "below," or "beneath" a second feature may mean that the first feature is directly or obliquely below the second feature, or simply that the first feature is at a lower level than the second feature.

In the description herein, a reference to the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, such uses of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples. Moreover, one skilled in the art can combine the various embodiments or examples described in this specification, and the features of those embodiments or examples, provided that they do not contradict one another.

Although embodiments of the present invention have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present invention, and that variations, modifications, substitutions and alterations can be made to the above embodiments by those of ordinary skill in the art within the scope of the present invention.

Claims

1. An anti-collision method for multiple RFID readers, characterized by comprising the following steps:

acquiring a reading request of a reader, and randomly allocating a channel resource path to the reader according to the reading request;

carrying out simulation processing on the randomly allocated channel resource path to output a corresponding reward value, so as to update a Q value according to the reward value and add the updated Q value to a pre-established temporary storage table;

after the temporary storage table is full, inputting the temporary storage table as a training set into a pre-established BP neural network for training to obtain a trained anti-collision model;

acquiring the number of channel resource paths available to the current reader, and inputting the number of channel resource paths available to the current reader into the trained anti-collision model to obtain an optimal channel resource path;

the Q value comprises a state S and an action a, wherein the state S represents the number of available channel resource paths of the current reader, and the action a represents the allocation of one channel resource path to the current reader;

wherein, updating the Q value according to the reward value, and adding the updated Q value into a pre-established temporary storage table, comprises:

searching in the pre-established temporary storage table according to the Q value corresponding to the randomly allocated channel resource path to judge whether the Q value exists in the temporary storage table;

if yes, directly updating the Q value using the stored Q value and the reward value, and adding the updated Q value to the pre-established temporary storage table;

if not, inputting the state S of the Q value into the pre-established BP neural network so as to output the Q values predicted for all actions, selecting the maximum predicted Q value, updating the Q value using the maximum Q value and the reward value, and adding the updated Q value to the pre-established temporary storage table;

wherein the Q value is updated according to the following formula:

Q'(s, a) = (1-α)Q(s, a) + αγC(s, a) + Q(s', a')

2. The anti-collision method for multiple RFID readers according to claim 1, wherein inputting the temporary storage table, after it is full, as a training set into the pre-established BP neural network for training to obtain the trained anti-collision model comprises:

judging whether the temporary storage table is full;

if yes, inputting the temporary storage table as a training set into the pre-established BP neural network for training, updating the current number of learning iterations, emptying the temporary storage table, and judging, from the current and preset numbers of learning iterations, whether the preset number has been reached;

if not, directly judging, from the current and preset numbers of learning iterations, whether the preset number has been reached;

and if the preset number has been reached, finishing the channel resource path allocation, and if it has not been reached, re-acquiring a reading request of the reader to perform a new round of iterative training.

3. A computer-readable storage medium on which an anti-collision program for multiple RFID readers is stored, wherein the program, when executed by a processor, implements the anti-collision method for multiple RFID readers according to any one of claims 1-2.

4. A computer device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor, wherein the processor, when executing the program, implements the anti-collision method for multiple RFID readers according to any one of claims 1-2.

5. An anti-collision system for multiple RFID readers, comprising:

the acquisition module is used for acquiring a reading request of a reader and randomly allocating a channel resource path to the reader according to the reading request;

the update processing module is used for carrying out simulation processing on the randomly allocated channel resource path so as to output a corresponding reward value, updating the Q value according to the reward value, and adding the updated Q value to a pre-established temporary storage table;

the training module is used for inputting the temporary storage table as a training set into a pre-established BP neural network for training after the temporary storage table is full, so as to obtain a trained anti-collision model;

the anti-collision module is used for acquiring the number of channel resource paths available to the current reader and inputting the number of channel resource paths available to the current reader into the trained anti-collision model to obtain an optimal channel resource path;

wherein, the update processing module is further configured to:

searching in the pre-established temporary storage table according to the Q value corresponding to the randomly allocated channel resource path to judge whether the Q value exists in the temporary storage table or not;

wherein the Q value is updated according to the following formula:

Q'(s, a) = (1-α)Q(s, a) + αγC(s, a) + Q(s', a')
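
For illustration only, the update formula recited above can be evaluated numerically as in the following Python sketch. It is not the claimed implementation. The function names are hypothetical, α is assumed to be a learning rate and γ a discount factor (this section does not define them), and C(s, a) is assumed to be the reward value. The second function is the textbook Sarsa update, included only for comparison because its bracketing differs from the formula as printed.

```python
def update_as_printed(q: float, reward: float, q_next: float,
                      alpha: float, gamma: float) -> float:
    """Q'(s,a) = (1-alpha)*Q(s,a) + alpha*gamma*C(s,a) + Q(s',a'), as printed."""
    return (1 - alpha) * q + alpha * gamma * reward + q_next


def update_textbook_sarsa(q: float, reward: float, q_next: float,
                          alpha: float, gamma: float) -> float:
    """Textbook Sarsa: Q'(s,a) = (1-alpha)*Q(s,a) + alpha*(r + gamma*Q(s',a'))."""
    return (1 - alpha) * q + alpha * (reward + gamma * q_next)
```

With α = 0.5, γ = 0.9, Q(s, a) = 1.0, C(s, a) = 2.0 and Q(s', a') = 3.0, the printed form gives 0.5 + 0.9 + 3.0 = 4.4, whereas the textbook form gives 0.5 + 0.5 × (2.0 + 2.7) = 2.85.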