Summary of the invention
The technical problem to be solved by the present invention is to provide a filtering algorithm, based on a multi-level hash table and applied to RFID middleware, that efficiently filters redundant data out of the RFID data received from readers.
To solve the above technical problem, the present invention adopts the following technical solution:
A redundant data filtering method for RFID middleware, comprising the following steps:
Creating, in memory, a multi-level hash table for storing RFID data received from a reader, and creating in the multi-level hash table tag nodes used to query and delete the RFID data;
Determining whether an RFID data stream is present; if not, ending the process; if so, proceeding to the next step;
Querying the tag nodes in the multi-level hash table by the tag ID of the RFID data, and determining the state of each item of RFID data in the RFID data stream;
Performing redundant data filtering on each item of RFID data according to its state.
As a further preferred embodiment, the tag ID is a binary sequence that uniquely identifies the RFID data, and the storage location of the RFID data in the multi-level hash table is determined by a mapping function of the tag ID.
As a further preferred embodiment, the tag node defined in the multi-level hash table comprises: the tag ID, the time the tag was first read, the time the tag was last read, and the read count.
As a further preferred embodiment, the states of the RFID data comprise: unknown, captured and observed. The unknown state means that the RFID data does not exist in the multi-level hash table; the captured state means that the RFID data exists in the multi-level hash table but has not yet been confirmed; the observed state means that the RFID data has been confirmed in the multi-level hash table.
As a further preferred embodiment, performing redundant data filtering on each item of RFID data according to its state comprises:
In the unknown state, storing the RFID data in the multi-level hash table and generating a tag node for it; the RFID data is filtered out of the data transmission channel as redundant data and enters the captured state;
In the captured state, determining whether the time difference between the two most recent reads of the RFID data exceeds the time threshold; if so, removing the RFID data from the multi-level hash table and returning it to the unknown state; if not, updating the tag node of the RFID data and determining whether its read count has reached the read-count threshold; if the read-count threshold has not been reached, keeping the captured state and filtering the RFID data out as redundant data; if the read-count threshold has been reached, adding the RFID data to the output queue, where it is provided to the upper-layer system as clean filtered data, and entering the observed state;
In the observed state, determining whether the time difference between the two most recent reads of the RFID data exceeds the time threshold; if so, removing the RFID data from the multi-level hash table and returning it to the unknown state; if not, updating the tag node of the RFID data and directly filtering the RFID data out as redundant data.
As a further preferred embodiment, creating the multi-level hash table comprises the following steps:
Defining the tag node that stores RFID data in the multi-level hash table, the tag node comprising: the tag ID, the time the tag was first read, the time the tag was last read, and the read count;
Setting a time threshold, a read-count threshold and a tag data scale, wherein the time threshold is the interval within which repeated reads of the same RFID data are regarded as duplicate data; the read-count threshold is the number of reads that RFID data must reach before it is regarded as stable data; and the tag data scale is the capacity of the multi-level hash table in tag nodes;
Setting the maximum number of buckets M and the number of levels N of the multi-level hash table;
Allocating M*N storage slots in memory, represented by the two-dimensional array HashTable[M][N], as the hash matrix for storing RFID data;
Establishing, with the mapping function based on the tag ID, the correspondence between each item of RFID data and its storage location in the multi-level hash table.
As a further preferred embodiment, the time threshold, the read-count threshold and the tag data scale are adjustable.
The beneficial effects of the present invention are as follows: by building a multi-level hash table in memory, distributing the RFID data received by the middleware into the multi-level hash table, and filtering redundant data according to the state of each item of RFID data, the present invention filters massive volumes of redundant RFID data, greatly reduces the amount of data transmitted over the network, and thereby provides upper-layer applications with cleaner and more valuable RFID data. In addition, by exploiting the storage structure of the multi-level hash table, the method greatly improves the efficiency of RFID data filtering.
Embodiment
The present invention builds a multi-level hash table in memory to store the received RFID data and filters redundant data according to the state of each item of RFID data, so that duplicate RFID data can be filtered out efficiently.
With reference to Fig. 1, a redundant data filtering method for RFID middleware comprises the following steps:
Creating, in memory, a multi-level hash table for storing RFID data received from a reader, and creating in the multi-level hash table tag nodes used to query and delete the RFID data;
Determining whether an RFID data stream is present; if not, ending the process; if so, proceeding to the next step;
Querying the tag nodes in the multi-level hash table by the tag ID of the RFID data, and determining the state of each item of RFID data in the RFID data stream;
Performing redundant data filtering on each item of RFID data according to its state.
As a further preferred embodiment, the tag node defined in the multi-level hash table comprises: the tag ID, the time the tag was first read tFirstRead, the time the tag was last read tLastRead, and the read count tagCnt.
The tag ID is a binary sequence that uniquely identifies the RFID data, and the storage location of the RFID data in the multi-level hash table is determined by a mapping function of the tag ID.
With reference to Fig. 2, as a further preferred embodiment, the states of the RFID data comprise: unknown, captured and observed. The unknown state means that the RFID data does not exist in the multi-level hash table; the captured state means that the RFID data exists in the multi-level hash table but has not yet been confirmed; the observed state means that the RFID data has been confirmed in the multi-level hash table.
As a further preferred embodiment, the present invention specifies the format of the RFID data obtained from the reader. The RFID data comprises: the reader ID, the RFID data code and the timestamp of the RFID data. By way of illustration, suppose all RFID data takes the form of a triple <reader, epc, time> as metadata, where reader is the reader ID, epc is the RFID data code, the two together serve as a unique identifier, and time is the timestamp at which the reader performed the read. The reader sends RFID data to the middleware in this format.
As a further preferred embodiment, performing redundant data filtering on the RFID data according to its state comprises:
Before RFID data is read for the first time, it is in the unknown state. When the middleware receives RFID data <reader, epc, time>, it first uses the mapping function to look up whether the RFID data exists in the multi-level hash table; if it is found, the RFID data is in the captured or observed state; if it is not found, the RFID data is in the unknown state;
In the unknown state, the RFID data is stored in the multi-level hash table according to the mapping function and a tag node is generated for it, with tagCnt set to 1 and both tFirstRead and tLastRead set to the current read time currentTime. The RFID data is filtered out of the data transmission channel as redundant data and enters the captured state;
In the captured state, it is determined whether the time difference between the two most recent reads of the RFID data exceeds the time threshold; if so, the RFID data is removed from the multi-level hash table and returns to the unknown state; if not, the tag node of the RFID data is updated and it is determined whether its read count has reached the read-count threshold; if the read-count threshold has not been reached, the captured state is kept and the RFID data is filtered out as redundant data; if the read-count threshold has been reached, the RFID data is added to the output queue Queue, where it is provided to the upper-layer system as clean filtered data, and enters the observed state;
In the observed state, it is determined whether the time difference between the two most recent reads of the RFID data exceeds the time threshold; if so, the RFID data is removed from the multi-level hash table and returns to the unknown state; if not, the tag node of the RFID data is updated and the RFID data is directly filtered out as redundant data.
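The three-state filtering described above can be sketched as follows. This is only an illustrative sketch under stated assumptions, not the patented implementation: a plain Python dict stands in for the multi-level hash table, the names filter_read, table and queue are hypothetical, and an expired record is removed and the read is then handled as a first read, as the later description suggests. The sketch also assumes a read-count threshold of at least 2.

```python
from dataclasses import dataclass


@dataclass
class TagNode:
    tag_id: int
    t_first_read: float
    t_last_read: float
    tag_cnt: int


def filter_read(table, queue, tag_id, now, time_threshold, cnt_threshold):
    """Process one read and return the state of the tag afterwards."""
    node = table.get(tag_id)
    if node is not None and now - node.t_last_read > time_threshold:
        # Expired record: remove it so this read is handled as a first read.
        del table[tag_id]
        node = None
    if node is None:
        # unknown -> captured; the read itself is filtered out.
        table[tag_id] = TagNode(tag_id, now, now, 1)
        return "captured"
    was_observed = node.tag_cnt >= cnt_threshold
    node.tag_cnt += 1
    node.t_last_read = now
    if was_observed:
        return "observed"  # redundant read, filtered out
    if node.tag_cnt >= cnt_threshold:
        queue.append(tag_id)  # emitted once as clean data
        return "observed"
    return "captured"  # still unconfirmed, filtered out


table, queue = {}, []
for t in range(10, 17):  # seven reads of tag 100, one second apart
    print(t, filter_read(table, queue, 100, t, 100, 5))
print(queue)
```

In this sketch a tag reaches the output queue exactly once, on the read that brings its count up to the threshold; every other read is dropped.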
As a further preferred embodiment, with reference to Fig. 3, creating the multi-level hash table comprises the following steps:
Defining the tag node that stores RFID data in the multi-level hash table, the tag node comprising: the tag ID, the time the tag was first read tFirstRead, the time the tag was last read tLastRead, and the read count tagCnt;
In practical applications, the tag node may, for example, be laid out as follows:
#include <time.h>

struct TagNode
{
    long   tagId;       // tag ID; also the key into the hash table
    time_t tFirstRead;  // time the tag was first read; start of the time-threshold window
    time_t tLastRead;   // time the tag was last read
    long   tagCnt;      // number of reads within one time-threshold window
};
Setting a time threshold, a read-count threshold TagCnt_threshold and a tag data scale DataScaleSize. The time threshold is the interval within which repeated reads of the same RFID data are regarded as duplicate data and can be filtered out directly. If the time difference between two consecutive reads exceeds the time threshold, the reads are treated as different events. This value can be adjusted to the actual application: when acquiring static data, such as scanning inventory, the threshold can be increased accordingly; when acquiring dynamic data, such as RFID data from a conveyor belt, the threshold needs to be lowered.
The read-count threshold TagCnt_threshold means that only RFID data whose read count reaches this value is regarded as stable data and is confirmed as having been read; data below this threshold is regarded as a misread. This threshold can be adjusted to the actual business. The tag data scale DataScaleSize is the capacity of the multi-level hash table in tag nodes;
Setting the maximum number of buckets M and the number of levels N of the multi-level hash table. The maximum number of buckets M is set according to the tag data scale DataScaleSize, generally M = 2 * DataScaleSize. The number of levels N is determined at the same time and generally should not be too large; if it is too large, the multi-level hash table may degenerate into multiple linked lists.
Allocating M*N storage slots in memory, represented by the two-dimensional array HashTable[M][N], as the hash matrix for storing RFID data, where N is the number of levels of the hash table and M is the maximum number of buckets of the hash table;
Establishing, with the mapping function based on the tag ID, the correspondence between each item of RFID data and its storage location in the multi-level hash table. Because RFID data is unique, its tag ID is used as the key into the multi-level hash table, and the mapping function f is defined as f(tagId) = (tagId + M) % primTable[n], where tagId is the tag ID, M is the maximum number of buckets of the hash table, and primTable[n] is the number of buckets at level n of the hash table, n = 1, 2, 3, ..., N. With reference to Fig. 4, the number of buckets primTable[n] of each level is calculated by taking the N largest prime numbers less than the maximum number of buckets M and assigning them, in descending order, as the bucket counts of the successive levels; they are stored in the array primTable. The purpose of this practice is to reduce hash collisions.
The calculated f(tagId) serves as the hash value, that is, the storage address of the RFID data among the hash buckets of that level. Through the mapping function f, the multi-level hash table supports querying, modifying, deleting and adding RFID data.
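The mapping function and the prime table can be sketched as follows. This is a minimal sketch, assuming trial division is acceptable for finding the primes; the function names build_prim_table and bucket_index are hypothetical and do not appear in the source, and levels are indexed by the prime passed in rather than by n.

```python
def is_prime(k):
    """Trial-division primality test, adequate for small table sizes."""
    if k < 2:
        return False
    i = 2
    while i * i <= k:
        if k % i == 0:
            return False
        i += 1
    return True


def build_prim_table(m, n):
    """Return the n largest primes below m, in descending order."""
    primes = []
    k = m - 1
    while len(primes) < n and k >= 2:
        if is_prime(k):
            primes.append(k)
        k -= 1
    return primes


def bucket_index(tag_id, m, buckets):
    """f(tagId) = (tagId + M) % primTable[n] for a level with `buckets` buckets."""
    return (tag_id + m) % buckets


prim_table = build_prim_table(20000, 10)
print(prim_table)
print(bucket_index(100, 20000, prim_table[0]))
```

Using a different prime modulus at each level means that two tag IDs which collide at one level will usually land in different buckets at the next.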
Preferably, the time threshold, the read-count threshold TagCnt_threshold and the tag data scale DataScaleSize can be adjusted according to the actual application.
Fig. 5 is a flow chart of the steps of a preferred embodiment of the redundant data filtering method based on the multi-level hash table according to the present invention. With reference to Fig. 5:
According to the actual RFID application, determine the default parameters, including the effective-read time threshold, the read-count threshold TagCnt_threshold for confirming that data is valid, and the tag data scale DataScaleSize; create the multi-level hash table in memory, and create the tag nodes used to query, modify and delete RFID data.
As shown in Fig. 2, define the state machine that governs the state transitions of RFID data, and determine the criteria for filtering RFID data.
When RFID data <reader_n, epc_n, time_n> arrives, first confirm the state of the RFID data by querying whether it exists in the hash table, together with its read time and read count.
Before the state of the RFID data is further confirmed, it must first be determined whether the RFID data lies within the effective read interval, that is, whether the interval between the latest read time and the previous read time exceeds the time threshold. If it does, the earlier read record of the RFID data has expired, and the read is processed as a first read.
 If the RFID data is not in the hash table, it is in the unknown state;
 If the RFID data is in the hash table but its read count within the effective read time is less than the read-count threshold for confirming valid data, it is in the captured state;
 If the RFID data is in the hash table and its read count within the effective read time has reached the read-count threshold for confirming valid data, it is in the observed state.
After the state of the RFID data has been determined, the latest read time and the read count of its tag node in the hash table are updated. By checking the total read count of the tag node and the interval between reads, the state transition of the RFID data is determined as shown in Fig. 2.
 When the RFID data is in the unknown state, it is added to the multi-level hash table at the storage location calculated by the mapping function f, its read count tagCnt is set to 1, and tFirstRead and tLastRead are both set to the current read time currentTime. The RFID data is temporarily filtered out as redundant data and enters the captured state.
 When the RFID data is in the captured state, it already exists in the multi-level hash table, and it is determined whether the read interval lies within the time threshold. If currentTime - tLastRead exceeds the time threshold, the RFID data has expired; it is returned to the unknown state and the data is filtered. If currentTime - tLastRead does not exceed the time threshold, the read count is updated as tagCnt = tagCnt + 1 and tLastRead is set to currentTime, and it is then determined whether the read count tagCnt has just reached the effective read-count threshold TagCnt_threshold. If it has, the RFID data enters the observed state and is added to the queue Queue as clean filtered data, available to the upper-layer business system. If tagCnt is still less than TagCnt_threshold, the RFID data remains in the captured state; it is still regarded as unstable data and is temporarily filtered out.
 When the RFID data is in the observed state, it already exists in the multi-level hash table and its read count has reached the read-count threshold. When the RFID data is read again, it is determined whether the read interval lies within the time threshold. If currentTime - tLastRead exceeds the time threshold, the RFID data has expired; it is returned to the unknown state and the data is filtered. If currentTime - tLastRead does not exceed the time threshold, the read count is updated as tagCnt = tagCnt + 1, tLastRead is set to currentTime, and the RFID data is filtered out as redundant data. In practical applications, most RFID data is in this state.
By using the multi-level hash table as the filtering channel of the RFID middleware, this method filters massive volumes of redundant RFID data quickly and accurately, greatly reduces the amount of data transmitted over the network, reduces the scale of data the business layer must process, and saves a large amount of processing time.
An example of the present invention is described below with reference to Fig. 6.
First, the parameters that depend closely on the scale and scenario of the actual RFID application are configured. In this example the effective-read time threshold is set to 100 s, the effective read-count threshold TagCnt_threshold is set to 5, and the number of levels of hash buckets is 10. Suppose the tag data scale DataScaleSize is 10000; the maximum number of hash buckets M is then set to 20000, and the table can be represented by the two-dimensional array HashTable[20000][10].
The number of buckets of each level is then calculated. In this example, with 20000 as the maximum number of buckets, the bucket counts of levels 1 through 10 are set in turn to 19997, 19993, 19991, 19979, 19973, 19963, 19961, 19949, 19937 and 19927, and are stored in the primTable array.
Suppose a batch of RFID data arrives together at the 10 s mark: 100 reads with tag ID 0000000100, 99 reads with tag ID 0000000099, 98 reads with tag ID 0000000098, and so on down to 2 reads with tag ID 0000000002 and 1 read with tag ID 0000000001. Taking the RFID data with tag ID 0000000100 as an example, according to f(tagId) = (tagId + M) % primTable[n], its index in the first-level hash buckets is index = (100 + 20000) % 19997, that is, position 103 of the first level. Likewise, 0000000099 is at position 102, 0000000098 is at position 101, and so on, with 0000000001 at position 4.
Because the effective read-count threshold is set to 5, RFID data read fewer than 5 times is only stored in the hash table and is not output. After filtering, the output queue contains only (0000000100, 0000000099, ..., 0000000006, 0000000005), 96 RFID data values in total, so the scale of the data is greatly reduced.
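The arithmetic of this batch can be checked with a short sketch. It assumes, as the example describes, that tag k is read k times for k = 1 to 100, and it uses a plain counter in place of the hash table; the variable names are illustrative only.

```python
from collections import Counter

TAG_CNT_THRESHOLD = 5

# Tag k is read k times, for k = 1..100, as in the example batch.
reads = [tag_id for tag_id in range(1, 101) for _ in range(tag_id)]

counts = Counter()
output_queue = []
for tag_id in reads:
    counts[tag_id] += 1
    # A tag is emitted once, on the read that reaches the threshold.
    if counts[tag_id] == TAG_CNT_THRESHOLD:
        output_queue.append(tag_id)

print(len(reads), len(output_queue))  # prints "5050 96"
```

Under these assumptions, 5050 raw reads collapse to 96 output values, one for each of the tags 5 through 100.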
The RFID data with tag ID values in the range 0000000005 to 0000000100 is now in the observed state; if RFID data in this range arrives at the middleware, it is filtered out as redundant data.
The RFID data with tag ID values 0000000001 to 0000000004 is in the captured state. If RFID data with tag ID 0000000004 now reaches the middleware, its read count just reaches the effective read-count threshold; it is added to the queue as clean data and transitions to the observed state. If RFID data with tag ID 0000000003 arrives, its total read count is only 4, so it remains in the captured state and is still filtered out as unstable data.
If RFID data with tag ID 0000000101 now arrives, the hash table holds no read record for it, so it is in the unknown state before this read; it is added to the hash table and its state changes to captured.
If no read of the RFID data with tag ID 0000000100 is recorded within the 100 s effective-read time threshold, and that RFID data is read again after 101 s, the effective-read time threshold has been exceeded, so its read record has in effect expired and it is in the unknown state. The new read is processed as a first read: its read count is set to 1, its first read time and latest read time are both set to the current time, and the RFID data enters the captured state.
If RFID data with tag ID 0000020097 now arrives, the calculation starts from the first-level hash table: index = (20097 + 20000) % 19997 = 103. Because this bucket already holds the RFID data 0000000100, a hash collision occurs, and the second level is calculated next: index = (20097 + 20000) % 19993 = 111. The RFID data is therefore stored at index position 111 of the second level.
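The collision handling in this example can be sketched as follows. This is an illustrative sketch only: each level is modelled as a sparse dict rather than a row of HashTable[M][N], only the first two bucket counts from the example are used, and the function name insert is hypothetical.

```python
M = 20000
PRIM_TABLE = [19997, 19993]  # bucket counts of levels 1 and 2 in the example


def insert(levels, tag_id):
    """Store tag_id in the first level whose bucket is free.

    Returns (level, index) with a 1-based level, or None if every level collides.
    """
    for level, buckets in enumerate(PRIM_TABLE):
        idx = (tag_id + M) % buckets
        occupant = levels[level].get(idx)
        if occupant is None or occupant == tag_id:
            levels[level][idx] = tag_id
            return level + 1, idx
    return None


levels = [{} for _ in PRIM_TABLE]
print(insert(levels, 100))    # prints "(1, 103)"
print(insert(levels, 20097))  # prints "(2, 111)"
```

How the method behaves when all N levels collide is not stated in the text; the sketch simply returns None in that case.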
By constructing the multi-level hash table data structure as the filtering pipeline, the present invention filters massive volumes of redundant RFID data and greatly reduces the amount of data transmitted over the network, thereby providing upper-layer applications with cleaner and more valuable RFID data.
By defining the states of RFID data and setting a time threshold and a read-count threshold, the present invention can effectively filter out the misreads, multiple reads and similar artifacts caused by the inherent instability and unreliability of RFID technology, yielding cleaner data.
By the usual criteria for evaluating algorithms, the quality of a data processing algorithm is measured mainly by three parameters: time complexity, space complexity and average search length.
The three evaluation indexes of the present invention are as follows:
 Time complexity: the present invention adopts a multi-level hash table that adapts to the scale of the RFID data and exploits the uniqueness of RFID data by using the tag ID as the key. In the multi-level hash table as designed, the number of buckets of every level is a prime number, and the index within the hash buckets is obtained as the key modulo the number of buckets, so that the RFID data is scattered sufficiently across the hash table and collisions are minimized; the scheme is also simple to implement. In comparative tests, its search efficiency remained at a constant order of magnitude, that is, a time complexity of O(1), a qualitative improvement over the O(n) of a linear linked list and the O(lg n) of a tree structure.
 Space complexity: a hash scheme in essence trades space for time. With the multiple levels of hash buckets introduced as auxiliary space, the space complexity is O(n).
 Average search length: the average search length is the sum of the search lengths of all nodes in a structure, weighted by their search probabilities. In the general case, the present invention finds the storage location of the target tag through the hash mapping in constant time. In the worst case, n lookups are needed, where n is the number of levels of the hash table.
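The worst-case bound on the search length can be illustrated with a lookup that counts its probes. This is a hedged sketch, not the source's implementation: levels are modelled as sparse dicts, the function name find is hypothetical, and the two-level table below is filled by hand with the values from the worked example.

```python
def find(levels, prim_table, m, tag_id):
    """Return (level, probes) for tag_id, or (None, probes) if it is absent.

    One bucket is probed per level, so probes never exceeds len(prim_table).
    """
    probes = 0
    for level, buckets in enumerate(prim_table):
        probes += 1
        if levels[level].get((tag_id + m) % buckets) == tag_id:
            return level, probes
    return None, probes


prim_table = [19997, 19993]
levels = [{103: 100}, {111: 20097}]  # contents taken from the worked example

print(find(levels, prim_table, 20000, 100))    # prints "(0, 1)"
print(find(levels, prim_table, 20000, 20097))  # prints "(1, 2)"
print(find(levels, prim_table, 20000, 7))      # prints "(None, 2)"
```

A tag stored at the first level is found in one probe, and a missing tag costs one probe per level, which matches the stated worst case of n lookups for n levels.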
The above describes preferred embodiments of the present invention, but the invention is not limited to the described embodiments. Those of ordinary skill in the art can make various equivalent variations or replacements without departing from the spirit of the present invention, and all such equivalent variations or replacements fall within the scope defined by the claims of the present application.