CN105577455A - Method and system for performing real-time UV statistic of massive logs - Google Patents
Method and system for performing real-time UV statistic of massive logs
- Publication number
- CN105577455A (application CN201610126930.0A)
- Authority
- CN
- China
- Prior art keywords
- real
- time
- pvlog
- bitarray
- log
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/06—Management of faults, events, alarms or notifications
- H04L41/069—Management of faults, events, alarms or notifications using logs of notifications; Post-processing of notifications
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04L—TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
- H04L41/00—Arrangements for maintenance, administration or management of data switching networks, e.g. of packet switching networks
- H04L41/14—Network analysis or design
Landscapes
- Engineering & Computer Science (AREA)
- Computer Networks & Wireless Communication (AREA)
- Signal Processing (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention discloses a method and system for computing real-time UV (unique visitor) statistics over massive logs. The system comprises a Bloom filter creation and initialization module, a real-time log receiving module, a log processing module, and a result output module. In this Bloom-filter-based real-time UV statistics system, several hash functions are selected so that each real-time PV log is mapped, in constant time, to the corresponding number of specific bits in a bit array; the current UV value is then computed in real time by a simple check. The system is simple to implement, uses few system resources, runs efficiently, and operates in real time. The method occupies little memory (better space complexity) and little processor time (better time complexity), so UV can be computed dynamically in real time with little effort.
Description
Technical field
The present invention relates to the field of Internet big data, and in particular to a method and system for computing real-time UV statistics over massive logs.
Background
UV is short for unique visitor: a natural person who accesses and browses a web page or app over the Internet. UV reflects actual users: a unique user, as opposed to an IP address, corresponds more closely to one actual viewer. Used as a metric, UV shows more accurately how many visitors actually reached a page per unit of time, and it is an important indicator of how a website or app is used.
A related concept is PV. PV is short for page view: within a given statistics period, each time a user loads or refreshes a page counts as one PV. Like UV, PV is an important indicator of website or app traffic. Each time a user refreshes a page, the system records one access log entry; the access log, also called the PVLog, usually exists as a file. Each entry generally records at least who accessed which page and when, and other information may be recorded as required.
From these two definitions, UV is obtained by deduplicating the PVLog entries of the same user within a period of time. The period here is the UV statistics cycle, which can be a day or an hour, giving day-level UV or hour-level UV.
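The relationship between PV and UV can be illustrated with an exact, set-based baseline. This is a minimal sketch, not part of the patent: the record layout `(user_id, timestamp, page)` and the function name `exact_uv` are assumptions chosen for illustration.

```python
from collections import defaultdict
from datetime import datetime


def exact_uv(pv_logs, period="day"):
    """Count distinct users per period from (user_id, timestamp, page) records."""
    fmt = "%Y-%m-%d" if period == "day" else "%Y-%m-%d %H:00"
    seen = defaultdict(set)  # period key -> set of user ids seen in that period
    for user_id, ts, _page in pv_logs:
        seen[ts.strftime(fmt)].add(user_id)
    return {key: len(users) for key, users in seen.items()}


# Four page views (PV = 4) from two users across two days.
logs = [
    ("u1", datetime(2016, 3, 7, 9, 5), "/search"),
    ("u1", datetime(2016, 3, 7, 9, 40), "/item"),
    ("u2", datetime(2016, 3, 7, 10, 1), "/search"),
    ("u1", datetime(2016, 3, 8, 8, 0), "/search"),
]
print(len(logs), exact_uv(logs, "day"), exact_uv(logs, "hour"))
```

Keeping one set of user ids per period is exact but stores every identifier, which is the memory cost the remainder of the document sets out to avoid.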
UV can therefore be computed by deduplicating users in the PVLog (PV log) within a period of time. For large websites the PVLog is typically massive: for example, the search pages of a well-known domestic C2C e-commerce site reach billions of page views per day, and the logs are generated dynamically and continuously. UV statistics thus becomes a problem of efficient deduplication over a very large data set, and real-time UV statistics means real-time deduplication over an ever-growing very large data set.
The traditional way to compute UV is based on a hash table, and this approach has the following problems:
1. With 32-bit hash values, once the hash table holds on the order of 100,000 elements, the probability that an inserted element collides exceeds 50%. To bring the collision probability of a hash table holding on the order of 1 billion elements below 1%, the hash values must be at least 64 bits, at which point the hash table occupies more than 20 GB of memory; as the data set keeps growing, the memory cost may exceed the limit of a single machine.
2. Inserting on the order of 1 billion elements forces the hash table to resize dozens of times, and the cost of resizing grows exponentially; for some real-time computing requirements this approach becomes unusable.
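The collision figures in item 1 can be explored with the birthday-bound approximation. The sketch below assumes that "collision probability" means the probability of at least one collision among n uniformly distributed hash values; the patent does not state which definition it uses, so results may differ from its rounded figures. The function name is illustrative.

```python
import math


def collision_probability(n, hash_bits):
    """Approximate P(at least one collision) among n uniform hash values.

    Uses the birthday-bound approximation 1 - exp(-n(n-1) / (2 * 2**hash_bits)).
    """
    space = 2.0 ** hash_bits
    return -math.expm1(-n * (n - 1) / (2.0 * space))


for n, bits in [(100_000, 32), (1_000_000_000, 64)]:
    print(f"n={n:,}, {bits}-bit hash: p ~ {collision_probability(n, bits):.4f}")
```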
Summary of the invention
The technical problem addressed by the invention is to perform real-time deduplication over a very large data set using a Bloom filter, and thereby compute UV statistics quickly.
The real-time UV statistics system is built on a Bloom filter (BloomFilter). By selecting several hash functions, each real-time PVLog is mapped, in constant time, to the corresponding number of specific bits in a bit array; the current UV value is then computed in real time by a simple check. The system is simple to implement, uses few system resources, runs efficiently, and works in real time.
To solve the above technical problem, the invention provides a method for computing real-time UV statistics over massive logs, comprising:
Collecting PVLog (page view log) entries and distributing them to await processing; setting up a UV counter at the same time;
Creating a Bloom filter: creating a BitArray (bit array) in the heap memory of the current process and defining k different hash functions, where k is the number of hash functions in the Bloom filter and each hash function maps an element (a PVLog) to one position in the BitArray;
Initializing all positions in the BitArray to 0;
Receiving the pending PVLog entries and mapping each one to k bit positions of the BitArray through the k different hash functions;
Determining whether the k bit positions are all 1; if not, incrementing the UV counter by 1 and setting all k bit positions to 1;
Outputting the value of the UV counter to complete the UV statistics.
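The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the patent's implementation: the class name `BloomUVCounter` is hypothetical, the k "different hash functions" are simulated by salting BLAKE2b with the function index, and the visitor key is assumed to be a user identifier already extracted from each PVLog.

```python
import hashlib


class BloomUVCounter:
    """Approximate unique-visitor counter backed by a Bloom filter."""

    def __init__(self, m_bits, k):
        self.m = m_bits
        self.k = k
        self.bits = bytearray((m_bits + 7) // 8)  # BitArray, all positions 0
        self.uv = 0                               # UV counter

    def _positions(self, key):
        # Assumption: k hash functions are derived by salting one hash with i.
        for i in range(self.k):
            digest = hashlib.blake2b(
                key.encode("utf-8"), digest_size=8, salt=i.to_bytes(8, "little")
            ).digest()
            yield int.from_bytes(digest, "little") % self.m

    def process(self, visitor_key):
        """Handle one PVLog; return True if it was counted as a new visitor."""
        positions = list(self._positions(visitor_key))
        if all(self.bits[p >> 3] & (1 << (p & 7)) for p in positions):
            return False  # all k bits are already 1: skip, do not count
        for p in positions:
            self.bits[p >> 3] |= 1 << (p & 7)  # set the k bits to 1
        self.uv += 1
        return True


counter = BloomUVCounter(m_bits=10_000, k=7)
for visitor in ["alice", "bob", "alice", "carol", "bob"]:
    counter.process(visitor)
print(counter.uv)
```

Because a Bloom filter can report false positives, the counter can undercount slightly but never overcounts; the bit-array size and k control how often that happens.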
Further, the k different hash functions are defined as follows:
Each hash function hashes an element to a position according to a uniform random distribution, so the k different hash functions hash an element to k different positions.
Further, creating the Bloom filter comprises:
In the initial state, all positions of the BitArray of length m are set to 0;
For a set D = {d1, d2, ..., dn} of n elements, k mapping functions {f1, f2, ..., fk} map each element di (1 <= i <= n) of D to k values {y1, y2, ..., yk}, and the corresponding entries array[y1], array[y2], ..., array[yk] of the BitArray are set to 1.
Further, PVLog entries are collected by front-end page JS reporting, back-end server reporting, or mobile client SDK reporting.
Further, when determining whether the k bit positions are all 1: if so, the UV counter is skipped and not incremented, and the method continues to receive PVLog entries to be processed.
Based on the above method, the invention also provides a system for computing real-time UV statistics over massive logs, comprising:
A Bloom filter creation and initialization module, which creates the Bloom filter: it creates the BitArray in memory, defines k different hash functions, and initializes all positions in the BitArray to 0;
A real-time log receiving module, which receives the pending PVLog entries;
A log processing module, which maps each pending PVLog to k bit positions of the BitArray through the k different hash functions and determines whether the k bit positions are all 1; if not, the UV counter is incremented by 1 and all k bit positions are set to 1;
A result output module, which outputs the value of the UV counter to complete the UV statistics.
Further, the system also comprises a real-time website PV log collection unit,
which sends the PV logs collected through front-end page JS reporting, back-end server reporting, or mobile client SDK reporting to the log processing module in real time.
Further, the system also comprises a distribution subsystem,
which collects logs through scribe and distributes the logs collected in real time to the log processing module.
Further, the real-time log receiving module receives in real time the PVLog entries sent by the distribution subsystem and forwards them to the log processing module.
Further, the result output module outputs the value of the UV counter in real time to an external file, a database, shared memory, and a KV storage engine.
Beneficial effects of the invention:
1) Lower memory usage, i.e., better space complexity
According to the derivation of the Bloom filter false-positive rate, for the UV statistics requirement here, suppose the PVLog contains 1 billion records, i.e., n = 1 billion. If the acceptable error rate is 0.01, the BitArray size is m ≈ 1 billion × 9.585 bits, and the BitArray occupies less than 1.2G of storage. Even if the acceptable error rate is lowered to 0.0001, m ≈ 1 billion × 19.170 bits, and the BitArray still occupies less than 2.3G. For UV statistics, an error of one in ten thousand is generally acceptable.
2) Lower processor usage, i.e., better time complexity
The frequent collisions and repeated resizing that a hash table faces when inserting elements at scale do not occur. Mapping through the k hash functions and setting the k bit positions both take constant time, so the time complexity of the whole process is O(N), i.e., linear.
3) Dynamic real-time UV computation is very convenient
The PV logs of a typical website or app are produced dynamically in real time; the received PVLog entries only need to be fed to the Bloom filter processing module.
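The sizing figures quoted under benefit 1) match the standard Bloom filter result m = -n ln(p) / (ln 2)^2 with k = (m/n) ln 2 hash functions. The sketch below reproduces them; the function name is illustrative, and reading "G" as GiB (2^30 bytes) is an assumption.

```python
import math


def bloom_parameters(n, p):
    """Return (bits per element, total bits m, hash count k) for n items at error rate p."""
    bits_per_element = -math.log(p) / (math.log(2) ** 2)
    m = math.ceil(n * bits_per_element)
    k = max(1, round(bits_per_element * math.log(2)))  # optimal k = (m/n) * ln 2
    return bits_per_element, m, k


for p in (0.01, 0.0001):
    bpe, m, k = bloom_parameters(1_000_000_000, p)
    print(f"p={p}: {bpe:.3f} bits/element, m={m:,} bits, "
          f"{m / 8 / 2**30:.2f} GiB, k={k}")
```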
Brief description of the drawings
Fig. 1 is a flow diagram of a method for computing real-time UV statistics over massive logs in one embodiment of the invention.
Fig. 2 is a flow diagram of how the hash functions in Fig. 1 are defined.
Fig. 3 is a detailed flow diagram of creating the Bloom filter in Fig. 1.
Fig. 4 is a flow diagram of collecting PVLog entries in Fig. 1.
Fig. 5 is a flow diagram of a further operating step in Fig. 1.
Fig. 6 is a structural diagram of a system for computing real-time UV statistics over massive logs in one embodiment of the invention.
Fig. 7 is a diagram of a preferred implementation of Fig. 6.
Fig. 8 is a diagram of a preferred implementation of Fig. 6.
Fig. 9 is a diagram of the BitArray bit array in Fig. 3.
Embodiments
To make the objects, technical solutions, and advantages of the invention clearer, the invention is described in more detail below with reference to specific embodiments and the accompanying drawings.
Fig. 1 is a flow diagram of a method for computing real-time UV statistics over massive logs in one embodiment of the invention.
In this embodiment, the method comprises the following steps:
Step S101: collect PVLog entries and distribute them to await processing. PV is also an important indicator of website or app traffic. Each time a user refreshes a page, the system records one access log entry; the access log, also called the PVLog, usually exists as a file. Each entry generally records at least who accessed which page and when, and other information may be recorded as required.
Step S102: set up a UV counter at the same time.
Step S103: create a Bloom filter. The Bloom filter was proposed by Burton Howard Bloom in 1970. It consists of a very long binary vector and a series of random mapping functions, and it can be used to test whether an element is in a set. Its advantage is that its space efficiency and query time far exceed those of general algorithms. In everyday work, including software design, one often needs to decide whether an element is in a set: for example, Word checks whether an English word is spelled correctly (that is, whether it is in a known dictionary), and a web crawler checks whether a URL has already been visited. The most direct approach is to store every element of the set in the computer and compare each new element against the stored ones. Sets are generally stored in a hash table, which is fast and accurate but costly in storage. The basic idea of the Bloom filter is to map an element, through a hash function, to a point in a bit array (BitArray); checking whether that point is 1 tells whether the set may contain the element.
Step S104: create the BitArray in the heap memory of the current process.
Step S105: define k different hash functions.
Step S106: initialize all positions in the BitArray to 0.
Step S107: receive the pending PVLog entries and map each one to k bit positions of the BitArray through the k different hash functions.
Step S108: determine whether the k bit positions are all 1.
If not, proceed to step S109: increment the UV counter by 1 and set all k bit positions to 1.
Step S110: output the value of the UV counter to complete the UV statistics.
In this embodiment, the BitArray is created and initialized (all bits set to 0) in the heap memory of the process, and k different hash functions are defined (each hashes an element to one of m different positions with a uniform random distribution). Specifically:
A bit array of length m is allocated; in the initial state, all positions of this bit array are 0. For a set D = {d1, d2, ..., dn} of n elements, k mapping functions {f1, f2, ..., fk} map each element di (1 <= i <= n) of D to k values {y1, y2, ..., yk}, and the corresponding entries array[y1], array[y2], ..., array[yk] of the bit array are set to 1.
Fig. 2 is a flow diagram of how the hash functions in Fig. 1 are defined.
In this embodiment, defining the hash functions comprises:
Step S201: each hash function hashes an element to a position according to a uniform random distribution.
Step S202: the k different hash functions thus hash an element to k different positions.
In this embodiment, the BitArray is created and initialized (all bits set to 0) in the heap memory of the process, and k different hash functions are defined (each hashes an element to one of m different positions with a uniform random distribution).
Fig. 3 is a detailed flow diagram of creating the Bloom filter in Fig. 1.
Preferably in this embodiment, the Bloom filter is created as follows:
Step S301: in the initial state, set all positions of the BitArray of length m to 0;
Step S302: for a set D = {d1, d2, ..., dn} of n elements, use k mapping functions {f1, f2, ..., fk};
Step S303: map each element di (1 <= i <= n) of D to k values {y1, y2, ..., yk};
Step S304: set the corresponding entries array[y1], array[y2], ..., array[yk] of the BitArray to 1.
See Fig. 9, the diagram of the BitArray bit array in Fig. 3.
Following steps S301 to S304, x, y, z, and w are elements whose membership in the set is to be determined. The 3 positions to which the 3 hash functions map each of x, y, and z in the BitArray are all 1, so these elements are in the set; w has one position equal to 0, so it is not in the set.
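The membership check described for Fig. 9 can be sketched with a small bit array and three hash functions. The array length of 64 and the SHA-256-based mapping are assumptions for illustration; the figure itself does not specify them.

```python
import hashlib

M = 64  # length of the bit array (assumed for illustration)
K = 3   # number of hash functions, as in the Fig. 9 description


def positions(element):
    """Map an element to K positions using SHA-256 salted with the function index."""
    return [
        int.from_bytes(hashlib.sha256(f"{i}:{element}".encode()).digest()[:8], "big") % M
        for i in range(K)
    ]


def add(bit_array, element):
    for pos in positions(element):
        bit_array[pos] = 1


def might_contain(bit_array, element):
    # True means "possibly in the set"; False means "definitely not in the set".
    return all(bit_array[pos] for pos in positions(element))


bit_array = [0] * M
for element in ("x", "y", "z"):
    add(bit_array, element)

print([might_contain(bit_array, e) for e in ("x", "y", "z")])
print(might_contain(bit_array, "w"))
```

Inserted elements always test positive. An element that was never inserted, such as w, tests negative as soon as one of its positions is 0, but it can occasionally test positive when other elements happen to have set all of its positions; that is the false-positive case the sizing of the bit array is meant to bound.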
Fig. 4 is a flow diagram of collecting PVLog entries in Fig. 1.
Preferably in this embodiment, the PVLog entries in step S101 are collected by:
Step S401: front-end page JS reporting;
or step S402: back-end server reporting;
or step S403: mobile client SDK reporting.
These methods include, but are not limited to, sending log data to a central log by HTTP POST.
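Reporting a log entry by HTTP POST might look like the sketch below, written from the point of view of a back-end server. The endpoint URL and the JSON field names are hypothetical; the source does not specify a log schema or a transport format beyond HTTP POST. The request is built but not sent.

```python
import json
import urllib.request

# Hypothetical endpoint; the source does not name a central log address.
LOG_ENDPOINT = "http://logs.example.com/pv"


def build_pv_report(user_id, page, timestamp):
    """Build (but do not send) an HTTP POST request carrying one PVLog record."""
    body = json.dumps({"user": user_id, "page": page, "ts": timestamp}).encode("utf-8")
    return urllib.request.Request(
        LOG_ENDPOINT,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_pv_report("u1", "/search", 1457308800)
print(request.get_method(), request.full_url)
# To actually send it: urllib.request.urlopen(request, timeout=2)
```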
Fig. 5 is a flow diagram of a further operating step in Fig. 1.
Step S108: determine whether the k bit positions are all 1.
If so, proceed to step S111: skip the UV counter and do not count;
Step S112: continue to receive PVLog entries to be processed.
Fig. 6 is a structural diagram of a system for computing real-time UV statistics over massive logs in one embodiment of the invention.
This embodiment also provides the structure of a system 100 for computing real-time UV statistics over massive logs, comprising:
A Bloom filter creation and initialization module 1001, which creates the Bloom filter: it creates the BitArray in memory, defines k different hash functions, and initializes all positions in the BitArray to 0;
A real-time log receiving module 1002, which receives the pending PVLog entries;
A log processing module 1003, which maps each pending PVLog to k bit positions of the BitArray through the k different hash functions and determines whether the k bit positions are all 1; if not, the UV counter is incremented by 1 and all k bit positions are set to 1;
A result output module 1004, which outputs the value of the UV counter to complete the UV statistics.
The Bloom filter creation and initialization module 1001 creates and initializes the BitArray in memory (all bits set to 0) and defines k different hash functions (each hashes an element to one of m different positions with a uniform random distribution). Specifically:
A bit array of length m is allocated; in the initial state, all positions of this bit array are 0. For a set D = {d1, d2, ..., dn} of n elements, k mapping functions {f1, f2, ..., fk} map each element di (1 <= i <= n) of D to k values {y1, y2, ..., yk}, and the corresponding entries array[y1], array[y2], ..., array[yk] of the bit array are set to 1.
The real-time log receiving module 1002 receives in real time the PVLog entries sent by the PV log collection and distribution subsystem and forwards them to the log processing module.
The log processing module 1003 maps each PVLog to k bit positions of the BitArray through the k different hash functions and determines whether these k bit positions are all 1; if not, the UV counter is incremented by 1 and all k bit positions are set to 1; if so, the entry is skipped. This module is the core module of the system.
The result output module 1004 outputs the value of the UV counter through an interface or similar means.
Fig. 7 is a diagram of a preferred implementation of Fig. 6.
In this embodiment, the system for computing real-time UV statistics over massive logs comprises:
A Bloom filter creation and initialization module 1001, which creates the Bloom filter: it creates the BitArray in memory, defines k different hash functions, and initializes all positions in the BitArray to 0;
A real-time log receiving module 1002, which receives the pending PVLog entries;
A log processing module 1003, which maps each pending PVLog to k bit positions of the BitArray through the k different hash functions and determines whether the k bit positions are all 1; if not, the UV counter is incremented by 1 and all k bit positions are set to 1;
A result output module 1004, which outputs the value of the UV counter to complete the UV statistics.
Preferably in this embodiment, the system also comprises a real-time website PV log collection unit, which sends the PV logs collected through front-end page JS reporting, back-end server reporting, or mobile client SDK reporting to the log processing module in real time.
Fig. 8 is a diagram of a preferred implementation of Fig. 6.
A Bloom filter creation and initialization module 1001, which creates the Bloom filter: it creates the BitArray in memory, defines k different hash functions, and initializes all positions in the BitArray to 0;
A real-time log receiving module 1002, which receives the pending PVLog entries;
A log processing module 1003, which maps each pending PVLog to k bit positions of the BitArray through the k different hash functions and determines whether the k bit positions are all 1; if not, the UV counter is incremented by 1 and all k bit positions are set to 1;
A result output module 1004, which outputs the value of the UV counter to complete the UV statistics. Preferably in this embodiment, the result output module outputs the value of the UV counter in real time to an external file, a database, shared memory, and a KV storage engine.
Preferably in this embodiment, the system also comprises a distribution subsystem, which collects logs through scribe and distributes the logs collected in real time to the log processing module.
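The result output module's targets can be illustrated with two of the listed sinks, an external file and a KV store. This is a sketch under stated assumptions: the function names, the JSON layout, and the key format `uv:<period>` are all hypothetical, and a plain dictionary stands in for a KV storage engine.

```python
import json
import os
import tempfile


def write_uv_to_file(path, uv, period):
    """Atomically replace `path` with the latest UV value so readers never see a partial file."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory)
    with os.fdopen(fd, "w") as tmp:
        json.dump({"period": period, "uv": uv}, tmp)
    os.replace(tmp_path, path)  # rename within the same directory


def write_uv_to_kv(store, uv, period):
    """Stand-in for a KV storage engine: any dict-like object works here."""
    store[f"uv:{period}"] = uv


kv = {}
out = os.path.join(tempfile.mkdtemp(), "uv.json")
write_uv_to_file(out, 12345, "2016-03-07")
write_uv_to_kv(kv, 12345, "2016-03-07")
print(out, kv)
```

Writing to a temporary file and renaming it keeps the published value consistent for consumers that poll the file while the counter is being updated in real time.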
Those of ordinary skill in the art should understand that the above are only specific embodiments of the invention and do not limit the invention; any modification, equivalent replacement, or improvement made within the spirit and principles of the invention shall fall within the scope of protection of the invention.
Claims (10)
1. A method for computing real-time UV statistics over massive logs, characterized by comprising:
Collecting PVLog (page view log) entries and distributing them to await processing; setting up a UV counter at the same time;
Creating a Bloom filter, creating a BitArray bit array in the heap memory of the current process, and defining k different hash functions;
Initializing all positions in the BitArray to 0;
Receiving the pending PVLog entries and mapping each one to k bit positions of the BitArray through the k different hash functions;
Determining whether the k bit positions are all 1; if not, incrementing the UV counter by 1 and setting all k bit positions to 1;
Outputting the value of the UV counter to complete the UV statistics.
2. The method for computing real-time UV statistics over massive logs according to claim 1, characterized in that the k different hash functions are defined as follows:
Each hash function hashes an element to a position according to a uniform random distribution, so the k different hash functions hash an element to k different positions.
3. The method for computing real-time UV statistics over massive logs according to claim 1, characterized in that creating the Bloom filter comprises:
In the initial state, setting all positions of the BitArray of length m to 0;
For a set D = {d1, d2, ..., dn} of n elements, mapping each element di (1 <= i <= n) of D to k values {y1, y2, ..., yk} through k mapping functions {f1, f2, ..., fk}, and setting the corresponding entries array[y1], array[y2], ..., array[yk] of the BitArray to 1.
4. The method for computing real-time UV statistics over massive logs according to claim 1, characterized in that PVLog entries are collected by
front-end page JS reporting, back-end server reporting, or mobile client SDK reporting.
5. The method for computing real-time UV statistics over massive logs according to claim 1, characterized in that, when determining whether the k bit positions are all 1, if so, the UV counter is skipped and not incremented, and the method continues to receive PVLog entries to be processed.
6. A system for computing real-time UV statistics over massive logs, characterized by comprising:
A Bloom filter creation and initialization module, which creates the Bloom filter, creates the BitArray bit array in memory, defines k different hash functions, and initializes all positions in the BitArray to 0;
A real-time log receiving module, which receives the pending PVLog entries;
A log processing module, which maps each pending PVLog to k bit positions of the BitArray through the k different hash functions and determines whether the k bit positions are all 1; if not, the UV counter is incremented by 1 and all k bit positions are set to 1;
A result output module, which outputs the value of the UV counter to complete the UV statistics.
7. The system for computing real-time UV statistics over massive logs according to claim 6, characterized by further comprising a real-time website PV log collection unit,
which sends the PV logs collected through front-end page JS reporting, back-end server reporting, or mobile client SDK reporting to the log processing module in real time.
8. The system for computing real-time UV statistics over massive logs according to claim 6, characterized by further comprising a distribution subsystem,
which collects logs through scribe and distributes the logs collected in real time to the log processing module.
9. The system for computing real-time UV statistics over massive logs according to claim 7, characterized in that the real-time log receiving module receives in real time the PVLog entries sent by the distribution subsystem and forwards them to the log processing module.
10. The system for computing real-time UV statistics over massive logs according to claim 8, characterized in that the result output module outputs the value of the UV counter in real time to an external file, a database, shared memory, and a KV storage engine.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610126930.0A CN105577455A (en) | 2016-03-07 | 2016-03-07 | Method and system for performing real-time UV statistic of massive logs |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610126930.0A CN105577455A (en) | 2016-03-07 | 2016-03-07 | Method and system for performing real-time UV statistic of massive logs |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105577455A true CN105577455A (en) | 2016-05-11 |
Family
ID=55887152
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610126930.0A Pending CN105577455A (en) | 2016-03-07 | 2016-03-07 | Method and system for performing real-time UV statistic of massive logs |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN105577455A (en) |
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102253820A (en) * | 2011-06-16 | 2011-11-23 | 华中科技大学 | Stream type repetitive data detection method |
WO2015168262A2 (en) * | 2014-05-01 | 2015-11-05 | Coho Data, Inc. | Systems, devices and methods for generating locality-indicative data representations of data streams, and compressions thereof |
CN104252532A (en) * | 2014-09-11 | 2014-12-31 | 北京优特捷信息技术有限公司 | Website information statistic method and device |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106294090A (en) * | 2016-08-03 | 2017-01-04 | 五八同城信息技术有限公司 | A kind of data statistical approach and device |
CN108900619A (en) * | 2018-07-06 | 2018-11-27 | 阿里巴巴集团控股有限公司 | A kind of independent Statistics of accessing population method and device |
CN108900619B (en) * | 2018-07-06 | 2022-01-11 | 创新先进技术有限公司 | Independent visitor counting method and device |
WO2021082936A1 (en) * | 2019-10-30 | 2021-05-06 | 深圳前海微众银行股份有限公司 | Method and apparatus for counting number of webpage visitors |
CN114385922A (en) * | 2022-01-17 | 2022-04-22 | 上海阿法迪智能数字科技股份有限公司 | Library system knowledge recommendation method based on bloom filter |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20160511 |