CN104252532A - Website information statistic method and device - Google Patents


Info

Publication number
CN104252532A
Authority
CN
China
Prior art keywords
field
counted
item
statistics
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410461656.3A
Other languages
Chinese (zh)
Inventor
陈军
梁玫娟
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
BEIJING YOUTEJIE INFORMATION TECHNOLOGY Co Ltd
Original Assignee
BEIJING YOUTEJIE INFORMATION TECHNOLOGY Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by BEIJING YOUTEJIE INFORMATION TECHNOLOGY Co Ltd filed Critical BEIJING YOUTEJIE INFORMATION TECHNOLOGY Co Ltd
Priority to CN201410461656.3A priority Critical patent/CN104252532A/en
Publication of CN104252532A publication Critical patent/CN104252532A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/24Querying
    • G06F16/245Query processing
    • G06F16/2458Special types of queries, e.g. statistical queries, fuzzy queries or distributed queries
    • G06F16/2462Approximate or statistical queries
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/25Integrating or interfacing systems involving database management systems
    • G06F16/258Data format conversion from or to a database
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/958Organisation or management of web site content, e.g. publishing, maintaining pages or automatic linking

Abstract

The invention discloses a method and device for compiling website statistics, aimed at improving statistical efficiency. The method includes: acquiring website log data; converting the acquired log data into structured data according to a preset regular expression; determining the fields in the structured data that are related to a preset item to be counted; and counting the item according to the related fields to obtain a statistical result.

Description

A method and device for compiling website statistics
Technical field
The present invention relates to the field of computer and communication technology, and in particular to a method and device for compiling website statistics.
Background technology
With the development of Internet technology, people spend on average more than 3 hours online each day, and this figure is still growing. The websites that attract the most user attention tend to attract more advertisers and can charge higher advertising fees. In addition, user behavior is increasingly analyzed in order to provide personalized services. Both attracting advertisers and analyzing user behavior require statistics on website information.
The inventors found that, at present, website visits are counted by placing a counter in the web page. To collect statistics on user behavior, the browser records the pages the user clicks in a cookie stored locally on the user's machine, and the cookie is then uploaded to the operator for counting. However, the information in a cookie is easily modified or deleted, which makes the statistics inaccurate. Moreover, each kind of statistic is collected in a different way, so compiling many kinds of statistics is cumbersome and inconvenient.
Summary of the invention
The invention provides a method and device for compiling website statistics, in order to improve statistical efficiency.
The invention provides a method for compiling website statistics, comprising:
Obtaining the log data of a website;
Converting the obtained log data into structured data according to a preset regular expression;
Determining the fields in the structured data that are related to a preset item to be counted;
For the item to be counted, compiling statistics according to the related fields to obtain a statistical result.
This embodiment uses log data to compile various statistics. Because multiple statistics are derived from a single data source, the statistical process is fast and convenient.
Optionally, converting the obtained log data into structured data according to the preset regular expression comprises:
Matching multiple preset regular expressions against the obtained log data;
Converting the obtained log data into structured data according to the regular expression that matches successfully.
In this embodiment, the structure of the log data is identified by the regular expression that matches it, and the log data is converted into structured data. This simplifies subsequent data extraction and counting and improves the accuracy and efficiency of the statistics.
Optionally, compiling statistics for the item to be counted according to the related fields to obtain a statistical result comprises:
For the item to be counted, using a first field related to the item as an index;
Compiling statistics according to a second field related to the item to obtain the statistical result.
In this embodiment, the log data supplies not only the data to be counted but also fields that can serve as an index. The data of one field can be counted per value of another field, which makes the statistics more flexible and improves statistical efficiency.
Optionally, compiling statistics for the item to be counted according to the related fields to obtain a statistical result comprises:
For the item to be counted, filtering out duplicate data in the related field;
Compiling statistics according to the filtered result to obtain the statistical result.
In this embodiment, filtering out duplicate data improves the accuracy of the statistics.
Optionally, determining the fields in the structured data that are related to the preset item to be counted comprises:
Determining the meaning of each field in the structured data according to the preset regular expression;
Determining, according to the meaning of each field, the fields related to the preset item to be counted.
In this embodiment, the regular expression determines the meaning of each field in the structured data, which makes it easy to compile many kinds of statistics. The more items are counted, the greater the efficiency gain.
A device for compiling website statistics, comprising:
An acquisition module, configured to obtain the log data of a website;
A conversion module, configured to convert the obtained log data into structured data according to a preset regular expression;
A field module, configured to determine the fields in the structured data that are related to a preset item to be counted;
A statistics module, configured to compile statistics for the item to be counted according to the related fields and obtain a statistical result.
Optionally, the conversion module comprises:
A matching unit, configured to match multiple preset regular expressions against the obtained log data;
A converting unit, configured to convert the obtained log data into structured data according to the regular expression that matches successfully.
Optionally, the statistics module comprises:
An index unit, configured to use, for the item to be counted, a first field related to the item as an index;
A first statistics unit, configured to compile statistics according to a second field related to the item and obtain the statistical result.
Optionally, the statistics module comprises:
A filter unit, configured to filter out, for the item to be counted, duplicate data in the related field;
A second statistics unit, configured to compile statistics according to the filtered result and obtain the statistical result.
Optionally, the field module comprises:
A semantic unit, configured to determine the meaning of each field in the structured data according to the preset regular expression;
A field unit, configured to determine, according to the meaning of each field, the fields related to the preset item to be counted.
Other features and advantages of the invention are set forth in the description that follows; in part they will be apparent from the description, or may be learned by practicing the invention. The objects and other advantages of the invention may be realized and attained by the structure particularly pointed out in the written description, the claims, and the accompanying drawings.
The technical solution of the invention is described in further detail below with reference to the drawings and embodiments.
Brief description of the drawings
The accompanying drawings provide a further understanding of the invention and form part of the description. Together with the embodiments, they serve to explain the invention and do not limit it. In the drawings:
Fig. 1 is a flowchart of the method for compiling website statistics in an embodiment of the invention;
Fig. 2 is a flowchart of the method for compiling website statistics in an embodiment of the invention;
Fig. 3 is a flowchart of the method for compiling website statistics in an embodiment of the invention;
Fig. 4 is a structural diagram of the device for compiling website statistics in an embodiment of the invention;
Fig. 5 is a structural diagram of the conversion module in an embodiment of the invention;
Fig. 6 is a structural diagram of the statistics module in an embodiment of the invention;
Fig. 7 is a structural diagram of the statistics module in an embodiment of the invention;
Fig. 8 is a structural diagram of the field module in an embodiment of the invention.
Detailed description of the embodiments
Preferred embodiments of the invention are described below with reference to the accompanying drawings. It should be understood that the preferred embodiments described here serve only to illustrate and explain the invention and are not intended to limit it.
At present, network operators need to understand how their networks are accessed in order to attract advertising and analyze user behavior. To count network access, for example the number of clicks on a website, a counter is embedded in the website's home page. As another example, to measure how long users stay on a website, a timer is embedded in the web page. Every additional statistic requires another plug-in embedded in the page, which is complex and tedious to implement. The data obtainable through embedded plug-ins is also very limited, which restricts what can be counted. Another approach obtains data for statistics from the client's cookie. However, the cookie resides on the client, so the authenticity and security of its data cannot be guaranteed, which makes the statistics inaccurate.
To solve this problem, this embodiment compiles statistics from network log data. Log data covers all of a user's online behavior, so the data is comprehensive. Network logs come from the network operator, so the data is reliable and not easily modified. With network log data as the single data source, many kinds of statistics can be compiled without embedding plug-ins, and statistical efficiency is higher when many kinds of information are counted.
Referring to Fig. 1, the method for compiling website statistics in this embodiment comprises:
Step 101: obtain the log data of the website.
Step 102: convert the obtained log data into structured data according to a preset regular expression.
Step 103: determine the fields in the structured data that are related to a preset item to be counted.
Step 104: for the item to be counted, compile statistics according to the related fields to obtain a statistical result.
This embodiment obtains the website's log data, a reliable data source whose rich and comprehensive information can support many statistic items. Converting the log data into structured data identifies the structure of the log and simplifies subsequent data extraction and statistical analysis. With log data as the single data source, many kinds of statistics can be compiled without embedding plug-ins, and statistical efficiency is higher when many kinds of information are counted.
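Steps 101 to 104 can be sketched end to end as follows. This is a minimal illustration rather than the implementation described in the patent: the simplified log line format, the pattern `LOG_PATTERN`, the item names in `ITEM_FIELDS`, and all function names are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical preset regular expression for a simplified access-log line.
LOG_PATTERN = re.compile(
    r'^(?P<clientip>\S+) \[(?P<timestamp>[^\]]+)\] "(?P<method>\S+) (?P<url>\S+)'
)

def acquire_logs():
    """Step 101: obtain log data (hard-coded sample lines in this sketch)."""
    return [
        '10.0.0.1 [07/Aug/2014:17:50:42 +0800] "GET /index.html HTTP/1.1"',
        '10.0.0.2 [07/Aug/2014:17:50:43 +0800] "GET /index.html HTTP/1.1"',
        '10.0.0.1 [07/Aug/2014:17:51:00 +0800] "GET /about.html HTTP/1.1"',
    ]

def to_structured(lines, pattern=LOG_PATTERN):
    """Step 102: convert each matching line into a dict of named fields."""
    records = []
    for line in lines:
        match = pattern.match(line)
        if match:
            records.append(match.groupdict())
    return records

# Step 103: hypothetical mapping from an item to be counted to its related field.
ITEM_FIELDS = {"visits_per_url": "url", "visits_per_client": "clientip"}

def count_item(records, item):
    """Step 104: count the values of the field related to the item."""
    field = ITEM_FIELDS[item]
    return Counter(record[field] for record in records)

records = to_structured(acquire_logs())
result = count_item(records, "visits_per_url")
print(result)
```

Because every statistic reads the same list of records, adding a new item only requires a new entry in the item-to-field mapping, which is the single-data-source benefit the text describes.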
In one embodiment, step 102 comprises step A1 and step A2.
Step A1: match multiple preset regular expressions against the obtained log data.
Step A2: convert the obtained log data into structured data according to the regular expression that matches successfully.
Log structures vary widely, and each website may define its own log structure. Multiple regular expressions can be set in advance and matched one by one against the obtained log data. The structure of the log data is then determined from the regular expression that matches successfully.
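Matching preset expressions one by one might look like the sketch below. The two patterns and their names (`access_log`, `key_value`) are hypothetical formats chosen for illustration; the source does not specify which expressions are preset.

```python
import re

# Hypothetical preset expressions, one per known log structure.
PRESET_PATTERNS = {
    "access_log": re.compile(
        r'^(?P<clientip>\d+\.\d+\.\d+\.\d+) .*"(?P<method>[A-Z]+) (?P<url>\S+)'
    ),
    "key_value": re.compile(
        r'^ts=(?P<timestamp>\S+) ip=(?P<clientip>\S+) path=(?P<url>\S+)$'
    ),
}

def match_structure(line, patterns=PRESET_PATTERNS):
    """Try each preset expression in turn; return the first that matches."""
    for name, pattern in patterns.items():
        match = pattern.match(line)
        if match:
            return name, match.groupdict()
    return None, None  # no preset expression matched this line

name, fields = match_structure("ts=2014-08-07T17:50:42 ip=10.0.0.1 path=/index.html")
print(name, fields)
```

Returning `(None, None)` for an unmatched line leaves room for the fallback the description mentions later, where a new expression is generated when no preset one fits.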
For example, consider a Tomcat multi-line log:
03 Jul 2014 10:21:39,940 ERROR[SinkRunner-PollingRunner-DefaultSinkProcessor]
(org.apache.flume.SinkRunner$PollingRunner.run:SinkRunner.java:160)-Unable to deliver event.Exception follows.
org.apache.flume.EventDeliveryException:Failed to open file./flume-ng/1404354094868-1 while delivering event
at org.apache.flume.sink.RollingFileSink.process(RollingFileSink.java:179)
at org.apache.flume.sink.DefaultSinkProcessor.process(DefaultSinkProcessor.java:68)
at org.apache.flume.SinkRunner$PollingRunner.run(SinkRunner.java:148)
at java.lang.Thread.run(Thread.java:679)
Caused by:java.io.FileNotFoundException:./flume-ng/1404354094868-1(No such file or directory)
at java.io.FileOutputStream.open(Native Method)
at java.io.FileOutputStream.<init>(FileOutputStream.java:209)
at java.io.FileOutputStream.<init>(FileOutputStream.java:160)
at org.apache.flume.sink.RollingFileSink.process(RollingFileSink.java:171)
...3 more
The regular expression (^\d+\s error)|(^.+Exception:.+)|(^\s+at.+)|(^\s+...\d+more)|(^\s*Cause by:.+) matches the above log successfully, so the structure of the log is taken to be (^\d+\s error)|(^.+Exception:.+)|(^\s+at.+)|(^\s+...\d+more)|(^\s*Cause by:.+). The relevant fields, such as the IP (Internet Protocol) address, are extracted according to the statistic item. Each extracted field is given a name (field_name), and a pair field_name=field_value is created, where field_value is the data of the extracted field. In this way unstructured data is converted into structured data.
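One way to apply an alternation of this kind to a multi-line log is to classify each line by the alternative it matches and then group the lines into events. The sketch below is an assumption-laden illustration: `LINE_TYPES` is a variant written for this example with named groups and conventional spacing, not the expression quoted in the text, and the sample lines are re-spaced for readability.

```python
import re

# Illustrative alternation: each named alternative recognizes one kind of line.
LINE_TYPES = re.compile(
    r'(?P<header>^\d{2} \w{3} \d{4} [\d:,]+ ERROR)'
    r'|(?P<cause>^\s*Caused by:.+)'
    r'|(?P<exception>^\S+Exception:.+)'
    r'|(?P<frame>^\s*at .+)'
    r'|(?P<more>^\s*\.\.\. ?\d+ more)'
)

def classify(line):
    """Return the name of the alternative that matched, or None."""
    match = LINE_TYPES.match(line)
    return match.lastgroup if match else None

def group_events(lines):
    """Start a new event at every header line; attach other lines to it."""
    events = []
    for line in lines:
        kind = classify(line)
        if kind == "header" or not events:
            events.append([])
        events[-1].append((kind, line))
    return events

sample = [
    "03 Jul 2014 10:21:39,940 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor]",
    "org.apache.flume.EventDeliveryException: Failed to open file",
    "    at org.apache.flume.sink.RollingFileSink.process(RollingFileSink.java:179)",
    "Caused by: java.io.FileNotFoundException: (No such file or directory)",
    "    ... 3 more",
    "03 Jul 2014 10:21:40,001 ERROR [SinkRunner-PollingRunner-DefaultSinkProcessor]",
]
events = group_events(sample)
print([[kind for kind, _ in event] for event in events])
```

Grouping by header line turns one stack trace into one record, so later counting steps see an error event once rather than once per trace line.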
If none of the preset regular expressions matches, a regular expression is generated from the log data by semantic recognition. The generated regular expression can then be adjusted to obtain one that matches the log data.
This embodiment can also determine the meaning of each field in the structured data according to the preset regular expression,
and determine, according to the meaning of each field, the fields related to the preset item to be counted.
For example, the website's web server program, Apache or Nginx, records every user access in its log. Each access corresponds to one Apache access log entry:
182.147.162.5--[07/Aug/2014:17:50:42+0800]"POST/api/emotion/update?fr=37573233 HTTP/1.1"
2006382"-""MomoChat/5.1 Android/216(GT-I9300;Android 4.0.4;Gapps 0;zh_CN;1)"
"-"0.0140.01410.83.68.111:900020019971360
The relevant fields extracted:
clientip=182.147.162.5
timestamp=07/Aug/2014:17:50:42+0800
url=/api/emotion/update?fr=37573233
......
Through regular expression matching, 182.147.162.5 is identified as the IP address,
07/Aug/2014:17:50:42+0800 as the access time, /api/emotion/update?fr=37573233 as the accessed address (i.e. the URL, Uniform Resource Locator), and so on. Each field is named according to its meaning, and the corresponding data is then extracted to generate structured data. The structured data takes the following form:
clientip=182.147.162.5
timestamp=07/Aug/2014:17:50:42+0800
url=/api/emotion/update?fr=37573233
......
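Named groups are a natural way to produce field_name=field_value pairs of this form. The sketch below assumes the access log line uses the conventional spacing of the Apache common log format; the sample line is a re-spaced reconstruction of the request shown above (including the split of the status and byte count), and the pattern `ACCESS_LOG` and the extra field names are assumptions for illustration.

```python
import re

# Hypothetical expression for a conventionally spaced access-log line.
ACCESS_LOG = re.compile(
    r'^(?P<clientip>\S+) \S+ \S+ '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<url>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-)'
)

# Assumed re-spaced form of the sample request; not copied verbatim from the log.
line = ('182.147.162.5 - - [07/Aug/2014:17:50:42 +0800] '
        '"POST /api/emotion/update?fr=37573233 HTTP/1.1" 200 6382')

def parse(line):
    """Return the named fields of a matching line, or None if it does not match."""
    match = ACCESS_LOG.match(line)
    return match.groupdict() if match else None

fields = parse(line)
for field_name, field_value in fields.items():
    print(f"{field_name}={field_value}")
```

The group names double as the field meanings, so the same expression both identifies the structure of the line and names the extracted values.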
To count the page views (PV) of the whole website, the url values belonging to the site are determined from the url field; the total number of occurrences of these values is the page view count of the whole website.
To count the page views of each url, the url values are determined from the url field; the number of times a given url value occurs is the page view count of that url.
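Both page view counts follow directly from the url field of the structured records. A minimal sketch, with made-up sample records and a hypothetical function name:

```python
from collections import Counter

# Made-up structured records of the form produced by the conversion step.
records = [
    {"clientip": "10.0.0.1", "url": "/index.html"},
    {"clientip": "10.0.0.2", "url": "/index.html"},
    {"clientip": "10.0.0.1", "url": "/about.html"},
    {"clientip": "10.0.0.3", "url": "/index.html"},
]

def page_views(records):
    """Return (site-wide PV, PV per url) computed from the url field."""
    per_url = Counter(record["url"] for record in records)
    total = sum(per_url.values())
    return total, per_url

total_pv, pv_per_url = page_views(records)
print(total_pv, dict(pv_per_url))
```

The site-wide figure is just the sum of the per-url counts, so one pass over the records serves both statistics.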
In one embodiment, step 104 comprises step B1 and step B2.
Step B1: for the item to be counted, use a first field related to the item as an index.
Step B2: compile statistics according to a second field related to the item to obtain the statistical result.
For example, to measure how long users stay on each url of the website, the cookie field is used as the index, and accesses carrying the same cookie are treated as accesses by the same user. Counting over timestamp and url then gives the time each user spends on each url of the website.
As another example, with timestamp as the index, the above statistics can be computed per timestamp to obtain statistical values for different time periods.
Depending on the statistics required, this embodiment can use a related field as the index and thereby support a richer set of statistics.
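The dwell-time example can be sketched by grouping records on the cookie field and measuring the gap between consecutive requests. The source does not say how dwell time is derived from timestamp and url; treating it as the time until the same visitor's next request is an assumption of this sketch, as are the sample records and the function name.

```python
from collections import defaultdict
from datetime import datetime

TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # assumes English month abbreviations

def dwell_times(records):
    """Total seconds spent per url, using cookie as the index field."""
    by_cookie = defaultdict(list)
    for record in records:
        ts = datetime.strptime(record["timestamp"], TIME_FORMAT)
        by_cookie[record["cookie"]].append((ts, record["url"]))

    totals = defaultdict(float)
    for visits in by_cookie.values():
        visits.sort()
        # Assumed rule: time on a page lasts until the visitor's next request.
        # The last page of each visitor has no following request and is skipped.
        for (start, url), (end, _) in zip(visits, visits[1:]):
            totals[url] += (end - start).total_seconds()
    return dict(totals)

records = [
    {"cookie": "a", "timestamp": "07/Aug/2014:17:50:00 +0800", "url": "/index.html"},
    {"cookie": "b", "timestamp": "07/Aug/2014:17:50:10 +0800", "url": "/index.html"},
    {"cookie": "a", "timestamp": "07/Aug/2014:17:50:30 +0800", "url": "/news.html"},
    {"cookie": "b", "timestamp": "07/Aug/2014:17:50:50 +0800", "url": "/news.html"},
    {"cookie": "a", "timestamp": "07/Aug/2014:17:52:30 +0800", "url": "/index.html"},
]
totals = dwell_times(records)
print(totals)
```

Swapping the index field, for instance grouping on clientip instead of cookie, changes who counts as "the same user" without touching the rest of the calculation.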
In one embodiment, step 104 comprises step C1 and step C2.
Step C1: for the item to be counted, filter out duplicate data in the related field.
Step C2: compile statistics according to the filtered result to obtain the statistical result.
For example, to count unique visitors (UV), the number of distinct clientip values in the access log, after duplicates are removed, gives the UV.
In this embodiment, the log data supplies the most basic data for the statistics. To serve different statistical purposes, the basic data can be further processed, for example by filtering out duplicates, so that the statistics are more accurate and a richer set of statistics can be compiled.
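Steps C1 and C2 amount to deduplicating one field and counting what remains. A minimal sketch with made-up records; the function name is hypothetical:

```python
def unique_visitors(records, field="clientip"):
    """Step C1: drop duplicate values of the field; step C2: count the rest."""
    distinct_values = {record[field] for record in records}
    return len(distinct_values)

# Made-up structured records.
records = [
    {"clientip": "10.0.0.1", "url": "/index.html"},
    {"clientip": "10.0.0.2", "url": "/index.html"},
    {"clientip": "10.0.0.1", "url": "/about.html"},
]
uv = unique_visitors(records)
print(uv)
```

A set keeps one copy of each value, so repeated requests from the same address contribute only once to the result.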
On the basis of the above statistics, quantities counted over different time periods can be used for year-over-year comparison (against the previous year) and period-over-period comparison (against the previous quarter, month, or week). The statistical information can also be presented to the user on a page as a statistical report or a visual chart.
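Year-over-year and period-over-period comparisons reduce to the same relative-change formula applied to different reference periods. The sketch below uses made-up monthly page view totals; the source does not define the formula, so the conventional (current - previous) / previous is assumed.

```python
def growth_rate(current, previous):
    """Relative change from previous to current; None if previous is zero."""
    if previous == 0:
        return None
    return (current - previous) / previous

# Made-up monthly page view totals keyed by year-month.
monthly_pv = {"2013-08": 8000, "2014-07": 10000, "2014-08": 12000}

# Year-over-year: same month of the previous year.
year_over_year = growth_rate(monthly_pv["2014-08"], monthly_pv["2013-08"])
# Period-over-period: the immediately preceding month.
month_over_month = growth_rate(monthly_pv["2014-08"], monthly_pv["2014-07"])
print(f"{year_over_year:.1%} {month_over_month:.1%}")
```

Only the choice of reference period differs between the two comparisons, which is why both can reuse counts already compiled per time period.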
The process of compiling website statistics is described in detail below through several embodiments.
Referring to Fig. 2, the detailed process of the method for compiling website statistics in this embodiment comprises:
Step 201: obtain the log data of the website.
Step 202: match multiple preset regular expressions against the obtained log data.
Step 203: according to the regular expression that matches successfully, name the fields and extract the data, converting unstructured data into structured data.
Step 204: determine the fields in the structured data that are related to a preset item to be counted.
Step 205: for the item to be counted, compile statistics according to the related fields to obtain a statistical result.
Step 206: output the statistical result in chart form.
Through the regular expression, this embodiment learns the meaning of each field of the log data and names each field precisely according to that meaning, thereby converting unstructured log data into structured data. The fields of all log data are given unified, standardized names, so log data in various formats can be converted into data with a unified structure. This provides the data foundation for accurate and efficient statistics.
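Unified naming across log formats can be sketched by pairing each expression with a mapping from its own group names to the standard names. The two formats, their group names, and the standard names `clientip`, `timestamp`, and `url` applied here are assumptions for illustration.

```python
import re

# Each hypothetical format: (expression, mapping from its group names to unified names).
FORMATS = [
    (re.compile(r'^(?P<remote_addr>\S+) \[(?P<time_local>[^\]]+)\] "(?P<request_uri>\S+)"$'),
     {"remote_addr": "clientip", "time_local": "timestamp", "request_uri": "url"}),
    (re.compile(r'^ts=(?P<ts>\S+) ip=(?P<ip>\S+) path=(?P<path>\S+)$'),
     {"ts": "timestamp", "ip": "clientip", "path": "url"}),
]

def normalize(line):
    """Parse a line with the first matching format and rename its fields."""
    for pattern, rename in FORMATS:
        match = pattern.match(line)
        if match:
            return {rename[key]: value for key, value in match.groupdict().items()}
    return None

first = normalize('10.0.0.1 [07/Aug/2014:17:50:42 +0800] "/index.html"')
second = normalize('ts=2014-08-07T17:50:42 ip=10.0.0.2 path=/index.html')
print(first)
print(second)
```

After normalization, records from both formats expose the same keys, so a single counting routine can process logs from differently configured servers.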
Referring to Fig. 3, the detailed process of the method for compiling website statistics in this embodiment comprises:
Step 301: obtain the log data of the website.
Step 302: match multiple preset regular expressions against the obtained log data.
Step 303: according to the regular expression that matches successfully, name the fields and extract the data, converting unstructured data into structured data.
Step 304: for the item to be counted, use a first field related to the item as an index.
Step 305: compile statistics according to a second field related to the item to obtain a statistical result.
Step 306: output the statistical result in chart form.
In this embodiment, a field does more than supply the data to be counted; it can also serve as the index of the statistics, which supports a richer set of statistics.
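As a sketch of a field acting as the index, the example below uses timestamp to bucket page views by hour, giving counts for different time periods. The hourly granularity, the timestamp format, and the sample records are assumptions; the source leaves the period length open.

```python
from collections import Counter
from datetime import datetime

TIME_FORMAT = "%d/%b/%Y:%H:%M:%S %z"  # assumes English month abbreviations

def views_per_hour(records):
    """Count records per hour, using the timestamp field as the index."""
    buckets = Counter()
    for record in records:
        ts = datetime.strptime(record["timestamp"], TIME_FORMAT)
        buckets[ts.strftime("%Y-%m-%d %H:00")] += 1
    return buckets

# Made-up structured records.
records = [
    {"timestamp": "07/Aug/2014:17:50:42 +0800", "url": "/index.html"},
    {"timestamp": "07/Aug/2014:17:59:01 +0800", "url": "/about.html"},
    {"timestamp": "07/Aug/2014:18:03:10 +0800", "url": "/index.html"},
]
hourly = views_per_hour(records)
print(dict(hourly))
```

The resulting mapping from period to count is already in the shape a charting step would need for step 306.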
The above description explains how website statistics are compiled. This process can be implemented by a device, whose internal structure and functions are introduced below.
Referring to Fig. 4, the device for compiling website statistics in this embodiment comprises: an acquisition module 401, a conversion module 402, a field module 403, and a statistics module 404.
The acquisition module 401 is configured to obtain the log data of the website.
The conversion module 402 is configured to convert the obtained log data into structured data according to a preset regular expression.
The field module 403 is configured to determine the fields in the structured data that are related to a preset item to be counted.
The statistics module 404 is configured to compile statistics for the item to be counted according to the related fields and obtain a statistical result.
Optionally, as shown in Fig. 5, the conversion module 402 comprises: a matching unit 4021 and a converting unit 4022.
The matching unit 4021 is configured to match multiple preset regular expressions against the obtained log data.
The converting unit 4022 is configured to convert the obtained log data into structured data according to the regular expression that matches successfully.
Optionally, as shown in Fig. 6, the statistics module 404 comprises: an index unit 4041 and a first statistics unit 4042.
The index unit 4041 is configured to use, for the item to be counted, a first field related to the item as an index.
The first statistics unit 4042 is configured to compile statistics according to a second field related to the item and obtain the statistical result.
Optionally, as shown in Fig. 7, the statistics module 404 comprises: a filter unit 4043 and a second statistics unit 4044.
The filter unit 4043 is configured to filter out, for the item to be counted, duplicate data in the related field.
The second statistics unit 4044 is configured to compile statistics according to the filtered result and obtain the statistical result.
Optionally, as shown in Fig. 8, the field module 403 comprises: a semantic unit 4031 and a field unit 4032.
The semantic unit 4031 is configured to determine the meaning of each field in the structured data according to the preset regular expression.
The field unit 4032 is configured to determine, according to the meaning of each field, the fields related to the preset item to be counted.
With log data as the single data source, this embodiment can compile statistics for many items and significantly improves statistical efficiency. No plug-ins need to be embedded in web pages, which simplifies the page structure. In addition, log data comes from the network and is not easily modified or deleted by users, so the data is more reliable, which helps improve the accuracy of the statistics.
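The module structure of Fig. 4 could be mirrored in software roughly as follows. This is a sketch under stated assumptions, not the device itself: the class names, the in-memory log source, the simplified two-field log format, and the item names `pv` and `uv` are all invented for the example.

```python
import re
from collections import Counter

class AcquisitionModule:
    """Module 401: obtains the log data (from memory in this sketch)."""
    def __init__(self, lines):
        self.lines = lines

    def acquire(self):
        return list(self.lines)

class ConversionModule:
    """Module 402: converts log lines into structured records."""
    def __init__(self, patterns):
        self.patterns = patterns

    def convert(self, lines):
        records = []
        for line in lines:
            for pattern in self.patterns:      # role of matching unit 4021
                match = pattern.match(line)
                if match:                      # role of converting unit 4022
                    records.append(match.groupdict())
                    break
        return records

class FieldModule:
    """Module 403: maps an item to be counted to its related field."""
    def __init__(self, item_fields):
        self.item_fields = item_fields

    def related_field(self, item):
        return self.item_fields[item]

class StatisticsModule:
    """Module 404: counts the values of the related field."""
    def count(self, records, field, distinct=False):
        values = [record[field] for record in records if field in record]
        return len(set(values)) if distinct else Counter(values)

class StatisticsDevice:
    """Wires the four modules together in the order of steps 101 to 104."""
    def __init__(self, acquisition, conversion, fields, statistics):
        self.acquisition = acquisition
        self.conversion = conversion
        self.fields = fields
        self.statistics = statistics

    def run(self, item, distinct=False):
        records = self.conversion.convert(self.acquisition.acquire())
        field = self.fields.related_field(item)
        return self.statistics.count(records, field, distinct)

device = StatisticsDevice(
    AcquisitionModule(["10.0.0.1 /index.html", "10.0.0.2 /index.html", "10.0.0.1 /about.html"]),
    ConversionModule([re.compile(r"^(?P<clientip>\S+) (?P<url>\S+)$")]),
    FieldModule({"pv": "url", "uv": "clientip"}),
    StatisticsModule(),
)
pv = device.run("pv")
uv = device.run("uv", distinct=True)
print(dict(pv), uv)
```

Keeping the modules separate means a new log format only touches the conversion module and a new statistic only touches the field mapping.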
Those skilled in the art should understand that embodiments of the invention may be provided as a method, a system, or a computer program product. Accordingly, the invention may take the form of an entirely hardware embodiment, an entirely software embodiment, or an embodiment combining software and hardware. Furthermore, the invention may take the form of a computer program product implemented on one or more computer-usable storage media (including but not limited to magnetic disk storage and optical storage) containing computer-usable program code.
The invention is described with reference to flowcharts and/or block diagrams of methods, devices (systems), and computer program products according to embodiments of the invention. It should be understood that each flow and/or block in the flowcharts and/or block diagrams, and combinations of flows and/or blocks, can be implemented by computer program instructions. These computer program instructions may be provided to the processor of a general-purpose computer, special-purpose computer, embedded processor, or other programmable data processing device to produce a machine, such that the instructions executed by the processor produce means for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be stored in a computer-readable memory capable of directing a computer or other programmable data processing device to operate in a particular manner, such that the instructions stored in the memory produce an article of manufacture including instruction means that implement the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
These computer program instructions may also be loaded onto a computer or other programmable data processing device, causing a series of operational steps to be performed on it to produce a computer-implemented process, such that the instructions executed on the computer or other programmable device provide steps for implementing the functions specified in one or more flows of the flowcharts and/or one or more blocks of the block diagrams.
Obviously, those skilled in the art can make various changes and modifications to the invention without departing from its spirit and scope. If such changes and modifications fall within the scope of the claims of the invention and their technical equivalents, the invention is intended to cover them as well.

Claims (10)

1. A method for compiling website statistics, characterized by comprising:
Obtaining the log data of a website;
Converting the obtained log data into structured data according to a preset regular expression;
Determining the fields in the structured data that are related to a preset item to be counted;
For the item to be counted, compiling statistics according to the related fields to obtain a statistical result.
2. The method for compiling website statistics according to claim 1, characterized in that converting the obtained log data into structured data according to the preset regular expression comprises:
Matching multiple preset regular expressions against the obtained log data;
Converting the obtained log data into structured data according to the regular expression that matches successfully.
3. The method for compiling website statistics according to claim 1, characterized in that compiling statistics for the item to be counted according to the related fields to obtain a statistical result comprises:
For the item to be counted, using a first field related to the item as an index;
Compiling statistics according to a second field related to the item to obtain the statistical result.
4. The method for compiling website statistics according to claim 1, characterized in that compiling statistics for the item to be counted according to the related fields to obtain a statistical result comprises:
For the item to be counted, filtering out duplicate data in the related field;
Compiling statistics according to the filtered result to obtain the statistical result.
5. The method for compiling website statistics according to claim 1, characterized in that determining the fields in the structured data that are related to the preset item to be counted comprises:
Determining the meaning of each field in the structured data according to the preset regular expression;
Determining, according to the meaning of each field, the fields related to the preset item to be counted.
6. A device for compiling website statistics, characterized by comprising:
An acquisition module, configured to obtain the log data of a website;
A conversion module, configured to convert the obtained log data into structured data according to a preset regular expression;
A field module, configured to determine the fields in the structured data that are related to a preset item to be counted;
A statistics module, configured to compile statistics for the item to be counted according to the related fields and obtain a statistical result.
7. The device for compiling website statistics according to claim 6, characterized in that the conversion module comprises:
A matching unit, configured to match multiple preset regular expressions against the obtained log data;
A converting unit, configured to convert the obtained log data into structured data according to the regular expression that matches successfully.
8. The device for compiling website statistics according to claim 6, characterized in that the statistics module comprises:
An index unit, configured to use, for the item to be counted, a first field related to the item as an index;
A first statistics unit, configured to compile statistics according to a second field related to the item and obtain the statistical result.
9. The device for compiling website statistics according to claim 6, characterized in that the statistics module comprises:
A filter unit, configured to filter out, for the item to be counted, duplicate data in the related field;
A second statistics unit, configured to compile statistics according to the filtered result and obtain the statistical result.
10. The device for compiling website statistics according to claim 6, characterized in that the field module comprises:
A semantic unit, configured to determine the meaning of each field in the structured data according to the preset regular expression;
A field unit, configured to determine, according to the meaning of each field, the fields related to the preset item to be counted.
CN201410461656.3A 2014-09-11 2014-09-11 Website information statistic method and device Pending CN104252532A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410461656.3A CN104252532A (en) 2014-09-11 2014-09-11 Website information statistic method and device

Publications (1)

Publication Number Publication Date
CN104252532A true CN104252532A (en) 2014-12-31

Family

ID=52187422

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410461656.3A Pending CN104252532A (en) 2014-09-11 2014-09-11 Website information statistic method and device

Country Status (1)

Country Link
CN (1) CN104252532A (en)

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105577455A (en) * 2016-03-07 2016-05-11 达而观信息科技(上海)有限公司 Method and system for performing real-time UV statistic of massive logs
CN105630658A (en) * 2015-12-22 2016-06-01 北京奇虎科技有限公司 Data processing method and data processing device
CN106021554A (en) * 2016-05-30 2016-10-12 北京奇艺世纪科技有限公司 Log analysis method and device
CN106294427A (en) * 2015-05-26 2017-01-04 北大方正集团有限公司 Contribution statistical method and contribution statistical system
CN106528619A (en) * 2016-09-30 2017-03-22 国家电网公司 A key field-based switch log rapid aggregation method
CN106547686A (en) * 2016-10-10 2017-03-29 北京百度网讯科技有限公司 Product test method and device
CN106547470A (en) * 2015-09-16 2017-03-29 伊姆西公司 Log storage optimization method and equipment
CN107590169A (en) * 2017-04-14 2018-01-16 南方科技大学 Preprocessing method and system for carrier gateway data
CN108509444A (en) * 2017-02-24 2018-09-07 深圳市优朋普乐传媒发展有限公司 Data processing method and device
CN110109812A (en) * 2019-05-10 2019-08-09 广州英睿科技有限公司 Statistical method, device, computer equipment and storage medium for access log data
CN112749223A (en) * 2021-01-28 2021-05-04 道和云科技(天津)有限公司 Interface log configuration and structured storage method and system

Citations (4)

Publication number Priority date Publication date Assignee Title
CN101561802A (en) * 2008-04-18 2009-10-21 上海复旦光华信息科技股份有限公司 Web page structural data extraction method and system
CN103001796A (en) * 2012-11-13 2013-03-27 北界创想(北京)软件有限公司 Method and device for processing weblog data by server
CN103377260A (en) * 2012-04-28 2013-10-30 阿里巴巴集团控股有限公司 Analysis method and device of URLs (Uniform Resource Locator) of weblog
CN103605738A (en) * 2013-11-19 2014-02-26 北京国双科技有限公司 Webpage access data statistical method and webpage access data statistical device

Patent Citations (4)

Publication number Priority date Publication date Assignee Title
CN101561802A (en) * 2008-04-18 2009-10-21 上海复旦光华信息科技股份有限公司 Web page structural data extraction method and system
CN103377260A (en) * 2012-04-28 2013-10-30 阿里巴巴集团控股有限公司 Analysis method and device of URLs (Uniform Resource Locator) of weblog
CN103001796A (en) * 2012-11-13 2013-03-27 北界创想(北京)软件有限公司 Method and device for processing weblog data by server
CN103605738A (en) * 2013-11-19 2014-02-26 北京国双科技有限公司 Webpage access data statistical method and webpage access data statistical device

Non-Patent Citations (1)

Title
GEERT JAN BEX ET AL.: ""Learning Deterministic Regular Expressions for the Inference of Schemas from XML Data"", 《ACM TRANSACTIONS ON THE WEB》 *

Cited By (14)

Publication number Priority date Publication date Assignee Title
CN106294427A (en) * 2015-05-26 2017-01-04 北大方正集团有限公司 Contribution statistical method and contribution statistical system
CN106547470A (en) * 2015-09-16 2017-03-29 伊姆西公司 Log storage optimization method and equipment
CN106547470B (en) * 2015-09-16 2020-01-03 伊姆西公司 Log storage optimization method and device
CN105630658A (en) * 2015-12-22 2016-06-01 北京奇虎科技有限公司 Data processing method and data processing device
CN105630658B (en) * 2015-12-22 2018-10-09 北京奇虎科技有限公司 Data processing method and device
CN105577455A (en) * 2016-03-07 2016-05-11 达而观信息科技(上海)有限公司 Method and system for performing real-time UV statistic of massive logs
CN106021554A (en) * 2016-05-30 2016-10-12 北京奇艺世纪科技有限公司 Log analysis method and device
CN106528619A (en) * 2016-09-30 2017-03-22 国家电网公司 A key field-based switch log rapid aggregation method
CN106547686A (en) * 2016-10-10 2017-03-29 北京百度网讯科技有限公司 Product test method and device
CN108509444A (en) * 2017-02-24 2018-09-07 深圳市优朋普乐传媒发展有限公司 Data processing method and device
CN107590169A (en) * 2017-04-14 2018-01-16 南方科技大学 Preprocessing method and system for carrier gateway data
CN107590169B (en) * 2017-04-14 2020-03-06 南方科技大学 Operator gateway data preprocessing method and system
CN110109812A (en) * 2019-05-10 2019-08-09 广州英睿科技有限公司 Statistical method, device, computer equipment and storage medium for access log data
CN112749223A (en) * 2021-01-28 2021-05-04 道和云科技(天津)有限公司 Interface log configuration and structured storage method and system

Similar Documents

Publication Publication Date Title
CN104252532A (en) Website information statistic method and device
CN107895009A (en) Distributed internet data acquisition method and system
CN110704411B (en) Knowledge graph building method and device suitable for art field and electronic equipment
CN102930059B (en) Method for designing focused crawler
CN103218431B (en) System capable of identifying and automatically collecting web page information
CN105447184B (en) Information extraction method and device
CN105608134B (en) Multithreading-based web crawler system and web page crawling method thereof
WO2015196907A1 (en) Search pushing method and device which mine user requirements
CN106126648B (en) Distributed commodity information crawler method based on redo logs
CN103605738A (en) Webpage access data statistical method and webpage access data statistical device
CN104462501A (en) Knowledge graph construction method and device based on structural data
CN104281622A (en) Information recommending method and information recommending device in social media
CN103530429B (en) Webpage content extracting method
CN106383887A (en) Environment-friendly news data acquisition and recommendation display method and system
CN104182506A (en) Log management method
CN102708174A (en) Method and device for displaying rich media information in browser
CN102710795A (en) Hotspot collecting method and device
CN104134108A (en) Sales data analysis method of electronic commerce website
CN104391978A (en) Method and device for storing and processing web pages of browsers
CN103729479A (en) Web page content statistical method and system based on distributed file storage
CN104850549A (en) Method for monitoring public opinions on Internet
CN104391706A (en) Reverse engineering based model base structuring method
CN103810283A (en) Microblog data acquisition method based on user correlation
CN104166545B (en) Method and device for sniffing web page resources
US20160179901A1 (en) Computer-Implemented System And Method For Providing Selective Contextual Exposure Within Social Network Situations

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20141231