CN112597201A - Element counting method, device, readable medium and equipment - Google Patents

Element counting method, device, readable medium and equipment

Info

Publication number
CN112597201A
Authority
CN
China
Prior art keywords: array, counter, count value, value, determining
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202011561749.5A
Other languages
Chinese (zh)
Inventor
张媛媛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Agricultural Bank of China
Original Assignee
Agricultural Bank of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agricultural Bank of China
Priority to CN202011561749.5A
Publication of CN112597201A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2455: Query execution
    • G06F16/24568: Data stream processing; Continuous queries
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24: Querying
    • G06F16/245: Query processing
    • G06F16/2458: Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462: Approximate or statistical queries

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Computational Linguistics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The method comprises: obtaining a data stream comprising a plurality of elements; for each element, determining the counter corresponding to the element in a first array and setting the current count value of that counter as the minimum count value; adding one to the current count value of the counter; determining the counter corresponding to the element in the next array of the first array; if the current count value of that counter is less than or equal to the minimum count value, updating the minimum count value to that current count value and adding one to the counter; and taking the next array of the first array as a new first array and returning to the step of determining the counter corresponding to the element in the next array of the first array, until no next array of the first array exists.

Description

Element counting method, device, readable medium and equipment
Technical Field
The present application relates to the field of data processing technologies, and in particular, to a method, an apparatus, a readable medium, and a device for counting elements.
Background
In the prior art, the frequency of each element in a data stream often needs to be counted, for example the number of queries for each keyword or the number of visits to each website. Currently, the Count-Min Sketch algorithm is usually used to count each element in a data stream. Specifically, Count-Min Sketch uses n arrays, each containing w counters, where n and w are positive integers. For each element in the data stream, Count-Min Sketch adds one to a particular counter in each array. When the frequency of an element is queried, the minimum count value among the counters corresponding to the element in the arrays is taken as the frequency of the element.
In the existing element counting method, every time an element is counted, the corresponding counter in every array must be incremented, which is cumbersome. In addition, while the counters in each array are being incremented, the count of one element is easily added by mistake to a counter that also serves other elements, so the counting accuracy is low.
Disclosure of Invention
Based on the defects of the prior art, the application provides a method, a device, a readable medium and equipment for counting elements, so as to improve the efficiency and accuracy of element counting.
A first aspect of the application discloses a method for counting elements, comprising:
acquiring a data stream comprising a plurality of elements;
for each element in the data stream, determining a counter corresponding to the element in a first array, and setting the current count value of the counter in the first array as a minimum count value;
adding one to the current count value of the counter in the first array;
determining a counter corresponding to the element in a next array of the first array;
if the current count value of the counter in the next array of the first array is smaller than or equal to the minimum count value, updating the value of the minimum count value to the current count value of the counter in the next array of the first array, and adding one to the current count value of the counter in the next array of the first array;
taking the next array of the first array as a new first array, and returning to execute the step of determining the counter corresponding to the element in the next array of the first array until the next array of the first array does not exist; wherein each of the arrays includes a plurality of counters.
Optionally, in the above element counting method, before the next array of the first array is taken as a new first array and the step of determining the counter corresponding to the element in the next array of the first array is executed again, the method further includes:
if the current count value of the determined counter in the next array of the first array is greater than the minimum count value, not updating the minimum count value and not adding one to the current count value of that counter.
Optionally, in the method for counting the elements, the determining a counter corresponding to the element in the first array includes:
performing hash operation on the element by using a hash function corresponding to the first array to obtain a hash value corresponding to the element in the first array;
determining a counter corresponding to the element in the first array by using the hash value corresponding to the element in the first array;
the determining a counter corresponding to the element in a next array of the first array includes:
performing hash operation on the element by using a hash function corresponding to the next array of the first array to obtain a hash value corresponding to the element in the next array of the first array;
and determining a counter corresponding to the element in the next array of the first array by using the hash value corresponding to the element in the next array of the first array.
Optionally, the above element counting method further includes:
receiving a frequency query request of a target element;
determining a counter corresponding to the target element in each array;
reading a count value from a counter corresponding to the target element in each array;
selecting the minimum value from each read count value as the frequency value of the target element;
and outputting the frequency value of the target element.
Optionally, in the above element counting method, determining the counter corresponding to the target element in each array includes:
for each array, performing a hash operation on the target element by using the hash function corresponding to the array to obtain the hash value corresponding to the target element in the array;
and for each array, determining the counter corresponding to the target element in the array by using the hash value corresponding to the target element in the array.
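The query steps above can be sketched as follows. This is a minimal illustration, not the applicant's implementation: the function name `query_frequency`, the toy arrays, and the constant hash functions are all assumptions chosen so that the result is easy to check by hand.

```python
from typing import Callable, List, Sequence


def query_frequency(
    arrays: Sequence[List[int]],
    hash_fns: Sequence[Callable[[str], int]],
    target: str,
) -> int:
    """Return the smallest count recorded for `target` across all arrays."""
    counts = []
    for counters, hash_fn in zip(arrays, hash_fns):
        # Hash the target with this array's own hash function, then use the
        # hash value as the position of the counter inside the array.
        position = hash_fn(target) % len(counters)
        counts.append(counters[position])
    # The minimum of the read count values is reported as the frequency.
    return min(counts)


# Toy state (hypothetical): three arrays of three counters each, with
# constant hash functions so the selected counters are 7, 4 and 5.
arrays = [[0, 7, 0], [4, 0, 0], [0, 0, 5]]
hash_fns = [lambda e: 1, lambda e: 0, lambda e: 2]
result = query_frequency(arrays, hash_fns, "example")  # min(7, 4, 5) = 4
```

Reading every array and keeping the minimum means a single inflated counter cannot raise the reported frequency on its own.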
The second aspect of the present application discloses a counting apparatus of elements, comprising:
an acquisition unit configured to acquire a data stream including a plurality of elements;
a first determining unit, configured to determine, for each element in the data stream, a counter corresponding to the element in a first array, and set a current count value of the counter in the first array as a minimum count value;
a counting unit, configured to add one to the current count value of the determined counter in the first array;
a second determining unit, configured to determine a counter corresponding to the element in a next array of the first array;
an update count unit, configured to update a value of the minimum count value to a current count value of a counter of a next array of the first array if the current count value of the counter of the next array of the first array is smaller than or equal to the minimum count value, and add one to the current count value of the counter of the next array of the first array;
a returning unit, configured to return the next array of the first array to the second determining unit as a new first array until there is no next array of the first array; wherein each of the arrays includes a plurality of counters.
Optionally, in the counting device of the above elements, further comprising:
a third determining unit, configured to not update the value of the minimum count value and not increment the current count value of the counter in the next array of the first array if the current count value of the counter in the next array of the first array is greater than the minimum count value.
Optionally, in the above element counting device, the first determining unit, when determining the counter corresponding to the element in the first array, is configured to:
performing hash operation on the element by using a hash function corresponding to the first array to obtain a hash value corresponding to the element in the first array; determining a counter corresponding to the element in the first array by using the hash value corresponding to the element in the first array;
the second determining unit, when determining the counter corresponding to the element in the next array of the first array, is configured to:
performing hash operation on the element by using a hash function corresponding to the next array of the first array to obtain a hash value corresponding to the element in the next array of the first array; and determining a counter corresponding to the element in the next array of the first array by using the hash value corresponding to the element in the next array of the first array.
Optionally, in the counting device of the above elements, further comprising:
a first receiving unit, configured to receive a frequency query request of a target element;
a fourth determining unit, configured to determine a counter corresponding to the target element in each array;
a first reading unit, configured to read a count value from a counter corresponding to the target element in each array;
a first selection unit, configured to select the minimum value from the read count values as the frequency value of the target element;
a first output unit, configured to output the frequency value of the target element.
Optionally, in the above element counting device, the fourth determining unit includes:
a calculation subunit, configured to perform, for each array, a hash operation on the target element by using the hash function corresponding to the array to obtain the hash value corresponding to the target element in the array;
a determining subunit, configured to determine, for each array, the counter corresponding to the target element in the array by using the hash value corresponding to the target element in the array.
A third aspect of the application discloses a computer readable medium having a computer program stored thereon, wherein the program when executed by a processor implements the method as described in any of the first aspects above.
The fourth aspect of the present application discloses an apparatus comprising:
one or more processors;
a storage device having one or more programs stored thereon;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as in any one of the first aspects above.
It can be seen from the foregoing technical solutions that, in the element counting method provided in the embodiments of the present application, for each element in a data stream, the counter corresponding to the element in a first array is determined, the current count value of that counter is set as the minimum count value, and the current count value of that counter is then incremented by one. After the counter corresponding to the element in the next array of the first array is determined, if the current count value of that counter is less than or equal to the minimum count value, the minimum count value is updated to that current count value and the counter is incremented by one. The next array of the first array is then taken as a new first array, and the step of determining the counter corresponding to the element in the next array of the first array is executed again, until no next array of the first array exists. Thus, unlike the prior art, in which the counter in every array is incremented, only counters holding the smallest value known so far are incremented. Because a counter is incremented only when its current count value is less than or equal to the minimum count value, errors in which the count of one element is added to counters shared with other elements are reduced, and the accuracy of element counting is improved.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below are only embodiments of the present invention; those skilled in the art can obtain other drawings from the provided drawings without creative effort.
FIG. 1 is a schematic diagram illustrating a conventional element counting method;
FIG. 2 is a flowchart illustrating a method for counting elements according to an embodiment of the present disclosure;
FIG. 3 is a flowchart illustrating a method for determining a counter corresponding to an element in a first array according to an embodiment of the present disclosure;
FIG. 4 is a flowchart illustrating a method for determining a counter corresponding to an element in a next array of a first array according to an embodiment of the present disclosure;
FIG. 5 is a schematic diagram illustrating a manner of counting elements according to an embodiment of the present disclosure;
FIG. 6 is a flowchart illustrating a method for querying a frequency value of a target element according to an embodiment of the present disclosure;
FIG. 7 is a flowchart illustrating a method for determining a counter corresponding to an element in each array according to an embodiment of the present disclosure;
FIG. 8 is a schematic structural diagram of an element counting apparatus according to an embodiment of the present disclosure.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Referring to FIG. 1, the prior art commonly uses the Count-Min Sketch algorithm to count each element in a data stream. Specifically, Count-Min Sketch uses n arrays A1, A2, A3, …, An, each containing w counters, where n and w are positive integers. Each array is associated with its own hash function: in FIG. 1, array A1 is associated with hash function h1, array A2 with hash function h2, …, and array An with hash function hn. For each element in the data stream, one is added to a particular counter in each array. For example, for element ei, the hash function h1 is applied to obtain the hash value h1(ei), which is used as the position index into array A1, and the counter at that position is incremented by 1. The same is done in every other array. When the frequency of an element is queried, the minimum count value among the counters corresponding to the element in the arrays is taken as the frequency of the element. For example, as shown in FIG. 1, the frequency of element ei is the minimum of the count values A1[h1(ei)], A2[h2(ei)], …, An[hn(ei)].
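For reference, the prior-art scheme described above can be sketched in a few lines. This is a minimal sketch under stated assumptions: the class name `CountMinSketch` and the use of SHA-256 salted with the array number as a stand-in for h1 … hn are illustrative choices, not details taken from the application.

```python
import hashlib
from typing import List


class CountMinSketch:
    """Prior-art scheme: n arrays of w counters, one hash function per array."""

    def __init__(self, n: int, w: int) -> None:
        self.n = n
        self.w = w
        self.arrays: List[List[int]] = [[0] * w for _ in range(n)]

    def _index(self, row: int, element: str) -> int:
        # Assumption: salting SHA-256 with the row number plays the role of
        # the per-array hash functions h1 ... hn.
        digest = hashlib.sha256(f"{row}:{element}".encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % self.w

    def add(self, element: str) -> None:
        # Every array is incremented for every element.
        for row in range(self.n):
            self.arrays[row][self._index(row, element)] += 1

    def query(self, element: str) -> int:
        # The frequency is the minimum over A1[h1(e)], ..., An[hn(e)].
        return min(
            self.arrays[row][self._index(row, element)]
            for row in range(self.n)
        )


sketch = CountMinSketch(n=4, w=64)
for keyword in ["cat", "dog", "cat", "cat"]:
    sketch.add(keyword)
estimate = sketch.query("cat")  # at least 3; collisions can only inflate it
```

Because every occurrence increments one counter in each of the n arrays, this scheme never reports less than the true frequency, but colliding elements can push the estimate above it.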
In the existing element counting method, every time an element is counted, the corresponding counter in every array must be incremented, which is cumbersome. In addition, because hash functions are subject to collisions, incrementing the counters in every array easily adds the count of one element to counters that also serve other elements, so the counting accuracy is low.
Based on the above problems in the prior art, an embodiment of the present application provides an element counting method to simplify the element counting process and improve the accuracy of element counting.
Referring to FIG. 2, an embodiment of the present application discloses an element counting method, which can be implemented in software or in hardware such as a central processing unit (CPU) or a field-programmable gate array (FPGA), and specifically includes the following steps:
S201, acquiring a data stream comprising a plurality of elements.
A data stream is obtained, wherein the data stream comprises a plurality of elements. The elements can be keywords, website names clicked and browsed by the user, and the like. For example, in the process of performing frequency statistics on search keywords in a search engine, keywords input by users using the search engine may be continuously collected, and each keyword input by each user is collected, thereby forming a data stream including a plurality of keyword elements.
The data stream may contain repeated elements as well as distinct ones; counting the occurrences of each distinct element yields its frequency, which meets the need for frequency statistics on the various elements.
It should be noted that the elements in the data stream are ordered, and the steps shown in FIG. 2 are performed on the elements one by one in that order. For example, for the data stream {e1, e2, …, em}, steps S202 to S207 are first performed for e1, then for e2, and so on, until e1, e2, …, em have all been counted.
S202, for each element in the data stream, determining the counter corresponding to the element in the first array, and setting the current count value of the determined counter in the first array as the minimum count value.
Each array comprises a plurality of counters. A plurality of arrays are constructed in advance, each containing the same number of counters. Each counter in each array counts the frequency of the elements that correspond to it. An order among the pre-constructed arrays can also be set in advance. Once the order is set, the current count value of the counter corresponding to the element is read from the first array in that order and is set as the minimum count value. The minimum count value represents the smallest of all count values read so far.
Optionally, the association between elements and counters in each array may be preset; for each element in the data stream, the counter to which the element corresponds in the first array is then looked up in this preset association, and the current count value of that counter is set as the minimum count value.
Optionally, referring to FIG. 3, in an embodiment of the present application, one implementation of determining the counter corresponding to the element in the first array includes:
S301, performing a hash operation on the element by using the hash function corresponding to the first array to obtain the hash value corresponding to the element in the first array.
Specifically, a corresponding hash function is preset for each array, and the hash functions of different arrays differ from one another. When step S301 is executed, the preset hash function corresponding to the first array is applied to the element to obtain the hash value corresponding to the element in the first array.
If the same element appears several times in the data stream, applying the hash function of the first array to each occurrence yields the same hash value. Different elements generally yield different hash values under that hash function, although, as noted above, hash collisions can occur. The hash value corresponding to each element in the first array can therefore be used to distinguish elements and thus to count how often the same element occurs in the data stream.
S302, determining the counter corresponding to the element in the first array by using the hash value corresponding to the element in the first array.
Because different elements generally have different hash values in the first array, elements can be distinguished by their hash values. Specifically, a correspondence between hash values and counters in the first array is preset, so that each counter in the first array corresponds to one hash value. After step S301 computes the hash value corresponding to the element in the first array, that hash value is used, via the preset correspondence, to index the counter corresponding to the element in the first array; this counter is the one that counts the frequency of the element in the first array.
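Steps S301 and S302 can be illustrated with a short sketch. The application does not name a concrete hash function, so the choice below (SHA-256 salted with the array number, reduced modulo the array width) and the helper name `counter_index` are assumptions made only for illustration.

```python
import hashlib


def counter_index(element: str, row: int, width: int) -> int:
    """Map an element to a counter position in array number `row`."""
    # S301: hash the element with the hash function of this array. Salting
    # the input with the row number gives each array a different function.
    digest = hashlib.sha256(f"{row}:{element}".encode("utf-8")).digest()
    hash_value = int.from_bytes(digest[:8], "big")
    # S302: use the hash value to select one of the `width` counters.
    return hash_value % width


# The same element always selects the same counter of a given array.
first = counter_index("weather", row=0, width=16)
again = counter_index("weather", row=0, width=16)
```

Reducing the hash value modulo the array width is one simple way to realise the preset correspondence between hash values and counters; with only `width` counters available, distinct elements may still share a counter.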
S203, adding one to the current count value of the determined counter in the first array.
For each element in the data stream, the current count value of the counter determined for the element in the first array is incremented by one, that is, this occurrence of the element is recorded in the element's counter in the first array.
It should be noted that the minimum count value is set in step S202 before step S203 is executed; that is, step S202 takes the count value before the current occurrence of the element is counted as the minimum count value.
S204, determining the counter corresponding to the element in the next array of the first array.
Because the arrays are ordered in advance, once the counter corresponding to the element in the first array has finished counting, counting continues with the counter corresponding to the element in the next array of the first array. When element frequencies are counted, relying on a single counter easily produces a wrong frequency. Therefore, in the embodiment of the present application, the element frequency is counted by the counters corresponding to the element in each array.
It should be noted that the principle and execution of step S204 are the same as those of determining the counter corresponding to the element in the first array in step S202, to which reference may be made; details are not repeated here.
Optionally, referring to FIG. 4, in an embodiment of the present application, one implementation of step S204 includes:
S401, performing a hash operation on the element by using the hash function corresponding to the next array of the first array to obtain the hash value corresponding to the element in the next array of the first array.
The principle and execution of step S401 are the same as those of step S301 shown in FIG. 3, to which reference may be made; details are not repeated here.
S402, determining the counter corresponding to the element in the next array of the first array by using the hash value corresponding to the element in the next array of the first array.
The principle and execution of step S402 are the same as those of step S302 shown in FIG. 3, to which reference may be made; details are not repeated here.
S205, judging whether the current count value of the determined counter in the next array of the first array is less than or equal to the minimum count value.
Specifically, the counter in the next array of the first array determined in step S204 is read to obtain the frequency currently recorded for the element in that counter. If every counter corresponding to the element counted its frequency exactly, the current count values of the counters corresponding to the element in all arrays (that is, the frequency of the element) would be identical. In practice, however, the counters accumulate errors, and repeated practical verification shows that the smallest frequency recorded among these counters is the one closest to the actual frequency of the element, that is, the most accurate. Therefore, step S205 judges whether the current count value of the counter in the next array of the first array is less than or equal to the minimum count value. If it is, the count value in the next array of the first array is at least as accurate as the minimum count value known so far, so step S206 is executed. If it is greater than the minimum count value, the count value of the next array of the first array for the element is less accurate than the minimum count value known so far, so step S207 is executed directly without any operation on the counter corresponding to the element in the next array of the first array; that is, the minimum count value is not updated, and the current count value of that counter is not incremented.
S206, updating the value of the minimum count value to the current count value of the counter in the next array of the first array, and adding one to the current count value of that counter.
Because the minimum count value is the smallest count value of the element known so far, and step S205 has determined that the current count value of the counter in the next array of the first array is less than or equal to it, the minimum count value is changed to the current count value of that counter. After the minimum count value has been updated, the current count value of the counter in the next array of the first array is incremented by one, that is, this occurrence of the element is recorded in that counter.
It should be noted that, in the present application, the current count value of the counter in the next array of the first array is incremented only when it is judged to be less than or equal to the minimum count value. In that case this counter is the most accurate of the counters examined so far for the element, so frequency counting can continue with it. When the current count value is judged to be greater than the minimum count value, the counter's current frequency result for the element is less accurate than the minimum count value, so this occurrence of the element need not be recorded in the next array of the first array.
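A small worked trace may make the rule in steps S202 to S206 concrete. The sketch below leaves hashing aside and operates directly on the count values that one element currently has in arrays A1 … An; the function name `apply_update` and the starting values are hypothetical.

```python
from typing import List


def apply_update(counts: List[int]) -> List[int]:
    """Apply one occurrence of an element to its counters in A1 ... An.

    `counts[k]` is the current value of the element's counter in the
    (k+1)-th array; a new list with the updated values is returned.
    """
    updated = list(counts)
    minimum = updated[0]          # S202: value in the first array, before counting
    updated[0] += 1               # S203: the first array is always incremented
    for k in range(1, len(updated)):
        if updated[k] <= minimum:     # S205
            minimum = updated[k]      # S206: remember the smaller value ...
            updated[k] += 1           # ... and count the element here too
        # otherwise the counter is left untouched and the loop moves on (S207)
    return updated


# Counters 3, 5, 2, 2: A1 is incremented; A2 (5 > 3) is skipped;
# A3 (2 <= 3) and A4 (2 <= 2) are incremented.
after = apply_update([3, 5, 2, 2])   # [4, 5, 3, 3]
```

In this trace the counter holding 5, which is the most inflated, is left unchanged, while the counters at or below the running minimum record the new occurrence.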
In the prior art, when each element is counted, an operation of adding one to the counter corresponding to the element in each array is performed, and in the process of frequency statistics, the problem that the count of one element is added to the counters of other elements by mistake is easily caused, so that the count value of the individual element is larger than the actual frequency value, and the error of the counted count value is larger and larger when the operation of adding one to the counter corresponding to each array is performed.
In the application, only the counter in the next array of the first array is selected to increment the current count value of the counter in the next array of the first array when the current count value is less than or equal to the minimum count value, that is, the statistical accuracy of the counter in the next array of the first array is higher than that of the counter in the first array, and the counter in the next array of the first array is not incremented the current count value of the counter in the next array of the first array when the current count value is greater than the minimum count value, on one hand, unnecessary incrementing operation is reduced, the statistical counting process is more efficient and faster, on the other hand, the counter is selectively used for counting, and only the counter with higher accuracy in the current statistical time (that is, the counter with the current count value less than or equal to the minimum count value) is selected for current counting, the situation that the counting error is increased when the counter with low accuracy continuously executes counting operation is avoided, and the accuracy of element statistics is improved.
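The difference between the two update rules can be illustrated with a small sketch. It is not taken from the application: the two-array layout, the fixed position table `POSITIONS`, and the helper names `run` and `estimate` are assumptions chosen so that the collisions are easy to follow by hand.

```python
from typing import Dict, List, Tuple

# Hypothetical fixed mapping: element -> counter position in (array 1, array 2).
# "x" shares a counter with "y" in array 1 and with "z" in array 2.
POSITIONS: Dict[str, Tuple[int, int]] = {"x": (0, 0), "y": (0, 1), "z": (1, 0)}


def run(stream: List[str], selective: bool) -> List[List[int]]:
    """Count a stream with either the increment-everything rule or the selective rule."""
    arrays = [[0, 0], [0, 0]]
    for element in stream:
        min_count = 0
        for j, pos in enumerate(POSITIONS[element]):
            current = arrays[j][pos]
            if selective and j > 0 and current > min_count:
                continue  # counter is larger than the known minimum: leave it alone
            min_count = current  # first array, or a value not above the minimum
            arrays[j][pos] += 1
    return arrays


def estimate(arrays: List[List[int]], element: str) -> int:
    """Report the smallest counter value seen for the element."""
    return min(arrays[j][pos] for j, pos in enumerate(POSITIONS[element]))


stream = ["y", "y", "y", "x", "x", "z"]  # true frequencies: x=2, y=3, z=1
prior = run(stream, selective=False)
ours = run(stream, selective=True)
print(estimate(prior, "x"), estimate(ours, "x"))
```

In this particular stream the increment-everything rule lets the single occurrence of "z" inflate the counter that "x" relies on in array 2, while the selective rule skips that counter because it already exceeds the minimum known for "z".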
S207, taking the next array of the first array as a new first array, and returning to execute the step S204 until the next array of the first array does not exist.
Each array comprises a plurality of counters. Because the order of the arrays is preset, once the next array of the first array has been processed for the element, it can be taken as the new first array and processing continues with the array that follows it, that is, the flow returns to step S204, until all arrays have been processed and no next array of the first array exists.
Specifically, if the current count value of the counter in the next array of the first array is determined to be less than or equal to the minimum count value, step S206 and step S207 are executed, and if the current count value of the counter in the next array of the first array is determined to be greater than the minimum count value, step S206 is not executed, and step S207 is directly executed.
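The loop formed by steps S204 to S207 can be sketched as follows. This is a minimal illustration under assumptions, not the application's implementation: arrays are plain Python lists, and `index_of` is a hypothetical callable standing in for the per-array hash lookup.

```python
from typing import Callable, List


def count_element(arrays: List[List[int]],
                  index_of: Callable[[int, str], int],
                  element: str) -> int:
    """Record one occurrence of `element`; return the final minimum count value."""
    first = 0
    pos = index_of(first, element)
    min_count = arrays[first][pos]      # current value in the first array
    arrays[first][pos] += 1             # the first array is always incremented
    while first + 1 < len(arrays):      # stop when no next array exists
        nxt = first + 1
        pos = index_of(nxt, element)    # S204: locate the counter in the next array
        current = arrays[nxt][pos]
        if current <= min_count:        # S205: compare with the minimum count value
            min_count = current         # S206: update the minimum ...
            arrays[nxt][pos] += 1       # ... and increment this counter
        first = nxt                     # S207: next array becomes the new first array
    return min_count


# Usage with a hypothetical lookup that always selects position 0.
arrays = [[5, 0], [7, 0], [3, 0]]
final_min = count_element(arrays, lambda j, e: 0, "e")
print(arrays, final_min)
```

The returned value is the minimum count value as it stood before this occurrence was counted, which is why the sketch reads each counter before incrementing it.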
Referring to fig. 5, one embodiment of the element counting method shown in fig. 2 is as follows. n arrays A1, A2, ..., An are constructed in advance. While the elements in a data stream are being counted, when a certain element ei is to be counted, the hash function h1 corresponding to the first array A1 is first used to hash the element ei, giving the hash value h1(ei). The hash value h1(ei) is then used to index the 6th counter from the left in the first array A1, which determines the counter corresponding to the element ei in A1. The count value read from this counter is 69, that is, the frequency of the element ei currently recorded by this counter is 69. The value 69 is set as the minimum count value, that is, the smallest of the currently known count values of the element ei. After the minimum count value has been set, the counter corresponding to the element ei in the first array is incremented by one, that is, this occurrence of ei in the data stream is counted in the counter, and its count value changes from 69 to 70. After the operation on the first array A1 is complete, the traversal moves to the second array A2. The element ei is hashed with the hash function h2 corresponding to A2 to obtain the hash value h2(ei), which is used to index the 10th counter from the left in A2, that is, the counter corresponding to ei in A2. The count value read from this counter is 88. Since 88 is greater than the current minimum count value 69, the counter is not incremented, and the traversal continues to the next array A3.
Similarly, when the next array A3 is processed, the hash function h3 corresponding to A3 is used to hash ei, giving the hash value h3(ei), which is used to index the 4th counter from the left in A3, that is, the counter corresponding to the element ei in A3. The count value read from this counter is 30. Since 30 is smaller than the minimum count value 69, it is now the smallest currently known count value of the element ei, so 30 is set as the new minimum count value, the counter is incremented by one, and its count value changes from 30 to 31. By analogy, when the last array An is reached, the hash function hn corresponding to An is used to hash ei, giving the hash value hn(ei), which is used to index the 8th counter from the left in An, that is, the counter corresponding to the element ei in An. The count value read from this counter is 76. Since 76 is greater than the minimum count value 30, the counter is not incremented. No array follows An, so the counting of the element ei ends and counting continues with the next element in the data stream.
It should be noted that, since the flow shown in fig. 2 has the characteristics of a pipelined operation and involves no pointer backtracking, it can be executed in software, for example on a central processing unit, or in hardware, for example on an FPGA. An FPGA may execute the flow shown in fig. 2 more efficiently than software running on a central processing unit.
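The numbers in the example of fig. 5 can be replayed with a few lines of code. The sketch below is illustrative only and makes two assumptions: the arrays elided between A3 and An are left out, so four arrays stand in for A1, A2, A3 and An, and the array width of 12 is arbitrary. The hash functions are replaced by the counter positions the example states.

```python
# Counter positions named in the example (6th, 10th, 4th and 8th from the left),
# converted to 0-based indexes. These stand in for h1(ei), h2(ei), h3(ei), hn(ei).
positions = [5, 9, 3, 7]
initial_counts = [69, 88, 30, 76]
width = 12  # assumed; the example does not state the array length

arrays = [[0] * width for _ in positions]
for array, pos, value in zip(arrays, positions, initial_counts):
    array[pos] = value

min_count = None
for j, (array, pos) in enumerate(zip(arrays, positions)):
    current = array[pos]
    # The first array is always counted; later arrays only when not above the minimum.
    if j == 0 or current <= min_count:
        min_count = current
        array[pos] += 1

result = [array[pos] for array, pos in zip(arrays, positions)]
print(result, min_count)
```

Only the counters in A1 and A3 change (69 to 70 and 30 to 31), matching the description; the counters holding 88 and 76 are left untouched.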
Optionally, referring to fig. 6, in an embodiment of the present application, the method further includes:
S601, receiving a frequency query request of a target element.
Wherein the frequency query request of the target element is used for requesting to query the frequency value of the target element. Specifically, the frequency query request of the target element may carry the identification information of the target element, or directly carry the target element. The frequency query request of the target element may be initiated by a user from a device executing the counting method of the elements of the above embodiments, or may be generated by another device and transmitted to a device executing the counting method of the elements of the above embodiments.
S602, determining a counter corresponding to the target element in each array.
After the frequency query request of the target element is received, the request is parsed to find out which element is to be queried, since the request carries either the unique identifier of the target element or the target element itself. Once the target element has been determined, the counter corresponding to the target element in each array is determined. Specifically, the counters may be determined array by array in the preset order of the arrays, or in any other order. There are many ways to determine the counter corresponding to the target element in each array, including but not limited to those provided in the embodiments of the present application.
Optionally, referring to fig. 7, in an embodiment of the present application, an implementation manner of performing step S602 includes:
S701, for each array, performing a hash operation on the target element by using the hash function corresponding to the array, to obtain the hash value corresponding to the target element in the array.
The hash function corresponding to each array is preset, so for each array the hash function corresponding to that array can be used to hash the target element and obtain the hash value corresponding to the target element in the array.
Specifically, the target element may be hashed with the hash function of each array in turn, following the preset order of the arrays, to obtain the hash value corresponding to the target element in each array. Alternatively, the hash functions of all arrays may be applied to the target element simultaneously, or the hash functions may be applied one after another in any order.
S702, for each array, determining a counter corresponding to the target element in the array by using the hash value corresponding to the target element in the array.
For each array, the correspondence between the hash values produced by the hash function of that array and the counters in the array is preset. The hash value corresponding to the target element in the array can therefore be used to index the counter corresponding to the target element in the array.
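Steps S701 and S702 can be sketched as below. The application does not name a hash function or say how a hash value is mapped to a counter, so everything specific here is an assumption: BLAKE2b salted with the array number plays the role of the per-array hash function, and the hash value is reduced modulo the array width to obtain a counter position. The names `counter_index` and `counters_for` are hypothetical.

```python
import hashlib
from typing import List


def counter_index(element: str, array_no: int, width: int) -> int:
    """Hash `element` with the function assumed for array `array_no` (S701)
    and map the hash value to a counter position in that array (S702)."""
    salt = array_no.to_bytes(8, "little")  # a different salt gives each array its own hash
    digest = hashlib.blake2b(element.encode("utf-8"), digest_size=8, salt=salt).digest()
    return int.from_bytes(digest, "little") % width


def counters_for(element: str, n_arrays: int, width: int) -> List[int]:
    """Counter position of `element` in every array, in the preset array order."""
    return [counter_index(element, j, width) for j in range(n_arrays)]


found = counters_for("ei", n_arrays=4, width=16)
print(found)
```

Because each lookup depends only on the element and the array number, the positions could equally be computed in parallel or in any order, as the description allows.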
S603, reading a count value from the counter corresponding to the target element in each array.
For each array, the count value read from the counter corresponding to the target element is that counter's frequency statistic for the target element, that is, its record of how many times the target element has occurred in the data stream. The count values read from the corresponding counters in different arrays may differ, because the counting accuracy is not necessarily the same in every array.
S604, selecting the minimum value from each read count value as the frequency value of the target element.
Because counts of other elements added to a shared counter can only make that counter larger than the actual frequency value of the target element, the smallest of the read count values is the result closest to the actual frequency value. The smallest count value is therefore selected from the read count values as the frequency value of the target element.
S605, outputting the frequency value of the target element.
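The selection in step S604 can be written compactly. The notation is introduced here for illustration only and does not appear in the application: $A_j$ denotes the j-th array, $h_j$ its hash function, $n$ the number of arrays, and $\hat{f}(e)$ the frequency value reported for the target element $e$.

```latex
\hat{f}(e) \;=\; \min_{1 \le j \le n} A_j\bigl[h_j(e)\bigr]
```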
There are many ways to output the frequency value of the target element, for example, the frequency value may be displayed on a screen or sent to a user in the form of information.
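The query flow of steps S602 to S605 can be sketched as follows. As before, this is an assumption-laden illustration: `index_of` is a hypothetical stand-in for the per-array hash lookup, and printing stands in for whatever output channel is used.

```python
from typing import Callable, List


def query_frequency(arrays: List[List[int]],
                    index_of: Callable[[int, str], int],
                    target: str) -> int:
    """Return the smallest counter value recorded for `target` across all arrays."""
    counts = []
    for j, array in enumerate(arrays):
        pos = index_of(j, target)   # S602: counter for the target in array j
        counts.append(array[pos])   # S603: read its count value
    return min(counts)              # S604: the minimum is the frequency value


# Usage with a hypothetical lookup that places the target at position j in array j.
arrays = [[4, 0, 0], [0, 9, 0], [0, 0, 6]]
frequency = query_frequency(arrays, lambda j, t: j, "target")
print(frequency)                    # S605: output the frequency value
```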
In the element counting method provided in the embodiments of the present application, for each element in a data stream, the counter corresponding to the element in the first array is determined, the current count value of that counter is set as the minimum count value, and the current count value of that counter is then incremented by one. Next, the counter corresponding to the element in the next array of the first array is determined. If the current count value of that counter is less than or equal to the minimum count value, the minimum count value is updated to that current count value, and the current count value of the counter in the next array of the first array is incremented by one. The next array of the first array is then taken as the new first array, and the flow returns to the step of determining the counter corresponding to the element in the next array of the first array, until no next array of the first array exists. Unlike the prior art, in which the counter in every array is incremented by one, only counters whose value does not exceed the minimum of the currently known count values are incremented. Because a counter is incremented only when its current count value is less than or equal to the minimum count value, the error caused by adding the count of one element to counters shared with other elements is reduced, and the accuracy of element counting is improved.
Referring to fig. 8, based on the above counting method for the elements provided in the embodiment of the present application, the embodiment of the present application correspondingly discloses a counting apparatus for the elements, which includes: an acquisition unit 801, a first determination unit 802, a counting unit 803, a second determination unit 804, an update counting unit 805, and a return unit 806.
The acquisition unit 801 is configured to acquire a data stream including a plurality of elements.
The first determining unit 802 is configured to determine, for each element in the data stream, a counter corresponding to the element in the first array, and to set the current count value of the determined counter in the first array as a minimum count value.
Optionally, in a specific embodiment of the present application, when determining the counter corresponding to the element in the first array, the first determining unit 802 is configured to:
perform a hash operation on the element by using the hash function corresponding to the first array to obtain the hash value corresponding to the element in the first array, and determine the counter corresponding to the element in the first array by using that hash value.
The counting unit 803 is configured to increment the current count value of the determined counter in the first array by one.
The second determining unit 804 is configured to determine a counter corresponding to the element in a next array of the first array.
Optionally, in a specific embodiment of the present application, when determining the counter corresponding to the element in the next array of the first array, the second determining unit 804 is configured to:
perform a hash operation on the element by using the hash function corresponding to the next array of the first array to obtain the hash value corresponding to the element in the next array of the first array, and determine the counter corresponding to the element in the next array of the first array by using that hash value.
The update count unit 805 is configured to update the value of the minimum count value to the current count value of the counter of the next array of the first array if the current count value of the counter of the next array of the determined first array is less than or equal to the minimum count value, and increment the current count value of the counter of the next array of the determined first array by one.
The returning unit 806 is configured to return the next array of the first array to the second determining unit as a new first array until there is no next array of the first array. Wherein each array comprises a plurality of counters.
Optionally, in a specific embodiment of the present application, the counting device for elements further includes:
a third determining unit, configured to neither update the minimum count value nor increment the current count value of the determined counter in the next array of the first array if that current count value is greater than the minimum count value.
Optionally, in a specific embodiment of the present application, the counting device for elements further includes: a first receiving unit, a fourth determining unit, a first reading unit, a first selecting unit and a first output unit.
The first receiving unit is configured to receive a frequency query request of a target element.
The fourth determining unit is configured to determine the counter corresponding to the target element in each array.
Optionally, in a specific embodiment of the present application, the fourth determining unit includes: a calculation subunit and a determination subunit.
The calculation subunit is configured to perform, for each array, a hash operation on the target element by using the hash function corresponding to the array, to obtain the hash value corresponding to the target element in the array.
The determination subunit is configured to determine, for each array, the counter corresponding to the target element in the array by using the hash value corresponding to the target element in the array.
The first reading unit is configured to read the count value from the counter corresponding to the target element in each array.
The first selecting unit is configured to select the minimum value from the read count values as the frequency value of the target element.
The first output unit is configured to output the frequency value of the target element.
The specific principle and the implementation process of the counting device for the elements disclosed in the embodiment of the present application are the same as those of the counting method for the elements disclosed in the embodiment of the present application, and reference may be made to corresponding parts in the counting method for the elements disclosed in the embodiment of the present application, which are not described herein again.
In the element counting device provided in the embodiments of the present application, the first determining unit 802 determines, for each element in a data stream, the counter corresponding to the element in the first array and sets the current count value of that counter as the minimum count value. The counting unit 803 then increments the current count value of the determined counter in the first array by one, and the second determining unit 804 determines the counter corresponding to the element in the next array of the first array. If the current count value of the counter in the next array of the first array is less than or equal to the minimum count value, the update count unit 805 updates the minimum count value to that current count value and increments the current count value of the counter in the next array of the first array by one. The returning unit 806 then takes the next array of the first array as the new first array and returns to the second determining unit 804, which again determines the counter corresponding to the element in the next array of the first array, until no next array of the first array exists. Unlike the prior art, in which the counter in every array is incremented by one, only counters whose value does not exceed the minimum of the currently known count values are incremented. Because a counter is incremented only when its current count value is less than or equal to the minimum count value, the error caused by adding the count of one element to counters shared with other elements is reduced, and the accuracy of element counting is improved.
The embodiments of the present application further disclose a computer-readable medium on which a computer program is stored, wherein the computer program, when executed by a processor, implements the element counting method of any one of the above embodiments.
The embodiments of the present application further disclose an apparatus, including: one or more processors; and a storage device having one or more programs stored thereon, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the element counting method of any one of the above embodiments.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present invention. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the invention. Thus, the present invention is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
It is further noted that, herein, relational terms such as first and second, and the like may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.

Claims (10)

1. A method of counting elements, comprising:
acquiring a data stream comprising a plurality of elements;
for each element in the data stream, determining a counter corresponding to the element in a first array, and setting the current count value of the counter in the first array as a minimum count value;
adding one to the current count value of the counter in the first array;
determining a counter corresponding to the element in a next array of the first array;
if the current count value of the counter in the next array of the first array is smaller than or equal to the minimum count value, updating the value of the minimum count value to the current count value of the counter in the next array of the first array, and adding one to the current count value of the counter in the next array of the first array;
taking the next array of the first array as a new first array, and returning to execute the step of determining the counter corresponding to the element in the next array of the first array until the next array of the first array does not exist; wherein each of the arrays includes a plurality of counters.
2. The method according to claim 1, wherein, before taking the next array of the first array as a new first array and returning to the step of determining the counter corresponding to the element in the next array of the first array until there is no next array of the first array, the method further comprises:
and if the determined current count value of the counter in the next array of the first array is greater than the minimum count value, not updating the value of the minimum count value, and not adding one to the determined current count value of the counter in the next array of the first array.
3. The method of claim 1, wherein determining the counter corresponding to the element in the first array comprises:
performing hash operation on the element by using a hash function corresponding to the first array to obtain a hash value corresponding to the element in the first array;
determining a counter corresponding to the element in the first array by using the hash value corresponding to the element in the first array;
the determining a counter corresponding to the element in a next array of the first array includes:
performing hash operation on the element by using a hash function corresponding to the next array of the first array to obtain a hash value corresponding to the element in the next array of the first array;
and determining a counter corresponding to the element in the next array of the first array by using the hash value corresponding to the element in the next array of the first array.
4. The method of claim 1, further comprising:
receiving a frequency query request of a target element;
determining a counter corresponding to the target element in each array;
reading a count value from a counter corresponding to the target element in each array;
selecting the minimum value from each read count value as the frequency value of the target element;
and outputting the frequency value of the target element.
5. The method of claim 4, wherein determining the counter corresponding to the target element in each array comprises:
for each array, carrying out hash operation on the target element by using a hash function corresponding to the array to obtain a hash value corresponding to the target element in the array;
and for each array, determining a counter corresponding to the target element in the array by using the hash value corresponding to the target element in the array.
6. An apparatus for counting elements, comprising:
an acquisition unit configured to acquire a data stream including a plurality of elements;
a first determining unit, configured to determine, for each element in the data stream, a counter corresponding to the element in a first array, and set a current count value of the counter in the first array as a minimum count value;
a counting unit, configured to add one to the current count value of the counter in the first array;
a second determining unit, configured to determine a counter corresponding to the element in a next array of the first array;
an update count unit, configured to update a value of the minimum count value to a current count value of a counter of a next array of the first array if the current count value of the counter of the next array of the first array is smaller than or equal to the minimum count value, and add one to the current count value of the counter of the next array of the first array;
a returning unit, configured to return the next array of the first array to the second determining unit as a new first array until there is no next array of the first array; wherein each of the arrays includes a plurality of counters.
7. The apparatus of claim 6, further comprising:
a third determining unit, configured to not update the value of the minimum count value and not increment the current count value of the counter in the next array of the first array if the current count value of the counter in the next array of the first array is greater than the minimum count value.
8. The apparatus of claim 6, wherein the first determining unit, when determining the counter corresponding to the element in the first array, is configured to:
performing hash operation on the element by using a hash function corresponding to the first array to obtain a hash value corresponding to the element in the first array; determining a counter corresponding to the element in the first array by using the hash value corresponding to the element in the first array;
the second determining unit, when determining the counter corresponding to the element in the next array of the first array, is configured to:
performing hash operation on the element by using a hash function corresponding to the next array of the first array to obtain a hash value corresponding to the element in the next array of the first array; and determining a counter corresponding to the element in the next array of the first array by using the hash value corresponding to the element in the next array of the first array.
9. A computer-readable medium, on which a computer program is stored, wherein the program, when executed by a processor, implements the method of any one of claims 1 to 5.
10. An apparatus, comprising:
one or more processors;
a storage device having one or more programs stored thereon;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-5.
CN202011561749.5A 2020-12-25 2020-12-25 Element counting method, device, readable medium and equipment Pending CN112597201A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011561749.5A CN112597201A (en) 2020-12-25 2020-12-25 Element counting method, device, readable medium and equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011561749.5A CN112597201A (en) 2020-12-25 2020-12-25 Element counting method, device, readable medium and equipment

Publications (1)

Publication Number Publication Date
CN112597201A true CN112597201A (en) 2021-04-02

Family

ID=75202059

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011561749.5A Pending CN112597201A (en) 2020-12-25 2020-12-25 Element counting method, device, readable medium and equipment

Country Status (1)

Country Link
CN (1) CN112597201A (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101364215A (en) * 2008-09-28 2009-02-11 炬力集成电路设计有限公司 Data processing apparatus and method for saving memory space
US20090100362A1 (en) * 2007-10-10 2009-04-16 Microsoft Corporation Template based method for creating video advertisements
CN106293510A (en) * 2016-07-21 2017-01-04 中国农业银行股份有限公司 A kind of data sharing method towards MVS and system
CN106598494A (en) * 2016-12-05 2017-04-26 东软集团股份有限公司 Data statistical method and apparatus
CN107566206A (en) * 2017-08-04 2018-01-09 华为技术有限公司 A kind of flow-measuring method, equipment and system
CN109684052A (en) * 2018-12-26 2019-04-26 华为技术有限公司 Transaction analysis method, apparatus, equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
李平: "POF-ICN架构中的边缘缓存研究", 中国优秀硕士学位论文全文数据库(信息科技辑), pages 139 - 29 *

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116881338A (en) * 2023-09-07 2023-10-13 北京傲星科技有限公司 Data mining method and related equipment for data stream based on large model
CN116881338B (en) * 2023-09-07 2024-01-26 北京傲星科技有限公司 Data mining method and related equipment for data stream based on large model

Similar Documents

Publication Publication Date Title
Oh et al. Finding near-optimal configurations in product lines by random sampling
CN109741060B (en) Information inquiry system, method, device, electronic equipment and storage medium
JP6307169B2 (en) System and method for rapid data analysis
US20080005106A1 (en) System and method for automatic weight generation for probabilistic matching
CN108804459B (en) Data query method and device
EP2364473A2 (en) Method and system for clustering data points
CN111737295B (en) Database cursor query method, device, equipment and storage medium
CN110909015A (en) Splitting method, device and equipment of microservice and storage medium
CN112765282A (en) Data online analysis processing method, device, equipment and storage medium
WO2015192798A1 (en) Topic mining method and device
CN112597201A (en) Element counting method, device, readable medium and equipment
CN104794130B (en) Relation query method and device between a kind of table
CN113553341A (en) Multidimensional data analysis method, multidimensional data analysis device, multidimensional data analysis equipment and computer readable storage medium
CN111090669A (en) Data query method and device based on space-time collision
CN114020790A (en) Data query method and device
CN105302827B (en) A kind of searching method and equipment of event
CN111797095B (en) Index construction method and JSON data query method
CN107330031B (en) Data storage method and device and electronic equipment
CN112765118B (en) Log query method, device, equipment and storage medium
CN111078671A (en) Method, device, equipment and medium for modifying data table field
Li et al. Cardinality estimation: Is machine learning a silver bullet
CN111143398B (en) Extra-large set query method and device based on extended SQL function
CN104809146B (en) System and method for determining index of the object in object sequence
CN108846103B (en) Data query method and device
CN111694891B (en) Data table processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination