CN103200269A - Internet information statistical method and Internet information statistical system - Google Patents


Info

Publication number
CN103200269A
Authority
CN
China
Prior art keywords
data
domain name
statistics
internet information
professional
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN2013101274926A
Other languages
Chinese (zh)
Inventor
余效伟
罗峰
黄苏支
李娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
IZP (BEIJING) TECHNOLOGIES Co Ltd
Original Assignee
IZP (BEIJING) TECHNOLOGIES Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by IZP (BEIJING) TECHNOLOGIES Co Ltd filed Critical IZP (BEIJING) TECHNOLOGIES Co Ltd
Priority to CN2013101274926A priority Critical patent/CN103200269A/en
Publication of CN103200269A publication Critical patent/CN103200269A/en
Pending legal-status Critical Current

Abstract

The invention discloses an Internet information statistics method and system. The method includes the following steps. First, user network access data are divided by business theme into multiple business-theme data sets using MapReduce. Second, the data in each business-theme data set are aggregated according to different indicators, and the resulting statistics are stored. Third, when a query request for statistical information is received, the statistics corresponding to the business theme named in the request are retrieved and returned. With this method and system, users can gain detailed insight into the access volume, number of visits, visiting users, search keywords, and traffic brought by each search keyword for a given industry, a given website, or a set of competing websites. Rich statistics can be presented accurately, quickly, and at different granularities to different business systems and users, so that the internal relationships within complex network access data can be discovered and displayed, and detailed, objective data support can be provided to decision-making departments.

Description

Internet information statistical method and system
Technical field
The present invention relates to the technical field of computer networks, and in particular to an Internet information statistics method and system.
Background art
Alexa is a leading company providing free website traffic information on the Internet. Founded in 1996, it has focused on developing tools for web crawling and website traffic measurement. The Alexa rank is a frequently cited indicator for assessing a website's traffic.
Alexa's worldwide website rankings are mainly of two kinds: the overall rank and the category rank.
The overall rank, also called the absolute rank, is a website's specific position among all websites. Alexa publishes a new overall ranking every three months. The ranking is based on the geometric mean of user reach (Users Reach) and page views (Page Views) accumulated over three months.
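As a worked form of the geometric mean mentioned above, let $R$ and $P$ stand for the user reach and page views accumulated over the three-month window. These symbols are labels introduced here for illustration, not Alexa's own notation, and the text does not say how the two quantities are normalised. Under that assumption the ranking basis would read roughly as:

```latex
\[
  \text{score} = \sqrt{R \cdot P}
\]
```

Websites would then be ordered by descending score. A geometric mean penalises a site that is strong on only one of the two measures more than an arithmetic mean would.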
The category rank comes in two forms. The first is by subject, such as news, entertainment, or shopping: Alexa gives the position of a specific website among websites of the same category. The second is by language, currently divided into 20 languages, such as English websites and Chinese websites [Chinese (simpl) and Chinese (trad)]: Alexa gives the position of a specific website among all websites in that language.
Within the overall ranking information, Alexa also rates websites with "stars" based on users' reviews of them, up to a maximum of "5 stars". Baidu, the highest-ranked website in the domestic ranking, received three and a half stars. The Alexa website itself does not take part in the ranking, but Alexa rates itself at 4 stars.
However, Alexa can only provide rough ranking information and cannot offer further options or interfaces.
Summary of the invention
The object of the present invention is to provide an Internet information statistics method and system that can supply rich network access data accurately, at high speed, and at different granularities to different business systems and users, thereby giving decision-making departments detailed and objective data support.
To achieve this object, the present invention adopts the following technical solutions:
An Internet information statistics method, characterized by comprising:
S1, dividing user network access data by business theme into a plurality of business-theme data sets using MapReduce, wherein the user network access data are the users' outbound access data;
S2, aggregating the data in each business-theme data set according to different indicators, and saving the statistics of each business theme;
S3, upon receiving a statistical information query request, retrieving and returning the statistics corresponding to the business theme to be queried according to the query request.
Further, before step S1, the method also comprises:
removing invalid data from the network access data in advance.
Further, the invalid data comprise: domain name data with an incorrect suffix, and domain name data whose access volume is below a preset threshold.
Further, when identifying invalid data, it is first determined whether the domain name data belong to a preset blacklist or whitelist: domain name data in the whitelist, together with their subdomain data, are always treated as valid data, and domain name data in the blacklist, together with their subdomain data, are always treated as invalid data.
Further, an access interface is provided, in the form of a service, for the statistics of each business theme obtained in step S2.
Further, at least one business sub-module is created using said service, the sub-module encapsulating predefined operations on the statistics, and an access interface is likewise provided for the business sub-module in the form of a service.
According to the same inventive concept, the present invention also provides an Internet information statistics system, characterized by comprising:
a data splitting unit, which divides user network access data by business theme into a plurality of business-theme data sets using MapReduce, wherein the user network access data are the users' outbound access data;
a data summarization unit, which aggregates the data in each business-theme data set according to different indicators and saves the statistics of each business theme;
a data query unit, which, upon receiving a statistical information query request, retrieves and returns the statistics corresponding to the business theme to be queried according to the query request.
Further, the system also comprises:
a data pre-processing unit, which removes invalid data from the network access data in advance.
Further, the invalid data comprise: domain name data with an incorrect suffix, and domain name data whose access volume is below a preset threshold.
Further, when identifying invalid data, it is first determined whether the domain name data belong to a preset blacklist or whitelist: domain name data in the whitelist, together with their subdomain data, are always treated as valid data, and domain name data in the blacklist, together with their subdomain data, are always treated as invalid data.
Further, the system also comprises:
a data mart, which stores the data obtained by the data summarization unit.
Further, the system also comprises:
a first service unit, which provides an access interface, in the form of a service, for the statistics of each business theme obtained by the data summarization unit.
Further, the system also comprises:
a second service unit, which uses the service provided by the first service unit to create at least one business sub-module encapsulating predefined operations on the statistics, and provides an access interface for the business sub-module in the form of a service.
The present invention enables users to gain detailed insight into the access volume, number of visits, visiting users, search keywords, traffic brought by each search keyword, and other information for a given industry, a given website, or a set of competing websites. It can present rich statistics accurately, at high speed, and at different granularities to different business systems and users, so as to uncover and display the internal relationships within complex network access data and provide decision-making departments with detailed and objective data support.
Brief description of the drawings
Fig. 1 is a flow chart of the Internet information statistics method according to Embodiment One of the present invention;
Fig. 2 is a structural block diagram of the Internet information statistics system according to Embodiment Two of the present invention;
Fig. 3 is a structural diagram of the Internet information statistics system according to Embodiment Three of the present invention, implemented on the distributed data processing framework Hadoop.
Detailed description of the embodiments
The technical solutions of the present invention are further described below through embodiments and with reference to the accompanying drawings.
Embodiment one
Fig. 1 is a flow chart of the Internet information statistics method of this embodiment. As shown in Fig. 1, the method comprises:
S101, dividing the network access data into a plurality of business-theme data sets.
In this step, the network access data are divided by business theme into a plurality of business-theme data sets using MapReduce. The network access data comprise the network-wide traffic IMOS log data used for data analysis; this massive data is stored in the large-scale distributed storage system ODS. High-speed partitioning of massive data is exactly what the MapReduce data-processing mechanism is good at: it can divide large amounts of data into different data sets in a very short time through distributed parallel computation. The present invention therefore uses MapReduce to partition the network access data. In addition, to match the data needs of upper-layer business systems, the present invention divides the network access data according to a plurality of business themes and stores the data under each theme in a separate sub-database, forming data sets for the different business themes and supplying the upper layers with preliminarily partitioned data. The business themes may include, but are not limited to: access volume, access time, website analysis information, and so on. The MapReduce processing not only removes a large amount of duplicate data but also gives the data a basic classification, greatly reducing the total data volume and making data access by the data summarization unit (DWA) more efficient and convenient.
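The partitioning in step S101 can be sketched in a few lines. The following is a single-process imitation of the map, shuffle, and reduce phases, not the Hadoop API; the record fields (`user`, `domain`, `ts`) and the theme names are assumptions made for illustration, since the patent does not specify the IMOS log layout.

```python
from collections import defaultdict

# Hypothetical access-log records; the field names are assumptions.
RECORDS = [
    {"user": "u1", "domain": "example.com", "ts": 100},
    {"user": "u1", "domain": "example.com", "ts": 100},  # duplicated log line
    {"user": "u2", "domain": "example.org", "ts": 200},
]


def map_record(record):
    """Map phase: emit one (theme, value) pair per business theme."""
    yield "access_volume", (record["domain"], record["user"], record["ts"])
    yield "access_time", (record["domain"], record["ts"])


def reduce_theme(values):
    """Reduce phase: drop repeated values and return the theme's data set."""
    return sorted(set(values))


def split_by_theme(records):
    grouped = defaultdict(list)  # shuffle step: group emitted values by theme
    for record in records:
        for theme, value in map_record(record):
            grouped[theme].append(value)
    return {theme: reduce_theme(values) for theme, values in grouped.items()}


THEME_SETS = split_by_theme(RECORDS)
```

Each key of `THEME_SETS` plays the role of one business-theme data set. In a real deployment the theme key would decide which sub-database the reducer writes to.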
In addition, massive raw network access data usually contain a large amount of invalid data. To confine the target data to the range that users actually need and care about, invalid data may be removed from the raw network access data before step S101. The criteria for invalid data can be defined according to actual business needs, for example domain name data with an incorrect suffix, or domain name data whose access volume is below a preset threshold. An exception mechanism may also be provided: some data that meet the above criteria should nevertheless not be filtered out, and can be retained by means of a whitelist. That is, when identifying invalid data, it is first determined whether the domain name data belong to a preset blacklist or whitelist: domain name data in the whitelist, together with their subdomain data, are always treated as valid data, and domain name data in the blacklist, together with their subdomain data, are always treated as invalid data. This pre-processing significantly reduces the amount of data that actually needs to be processed, further improving data-processing efficiency.
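The filtering rule just described can be illustrated with a minimal sketch. The suffix set, the threshold value, and the list contents below are invented for the example, and checking the whitelist before the blacklist is an assumption: the text does not say which list wins if a domain appears in both.

```python
# All constants below are illustrative assumptions, not values from the patent.
VALID_SUFFIXES = {"com", "org", "net", "cn"}
MIN_VISITS = 10
WHITELIST = {"partner.xyz"}
BLACKLIST = {"spam.com"}


def in_list(domain, names):
    """True if domain equals a listed name or is a subdomain of one."""
    return any(domain == name or domain.endswith("." + name) for name in names)


def is_valid(domain, visits):
    """Apply the list check first, then the suffix and threshold criteria."""
    if in_list(domain, WHITELIST):
        return True   # always valid, even with an unknown suffix or few visits
    if in_list(domain, BLACKLIST):
        return False  # always invalid, even with a good suffix and many visits
    suffix = domain.rsplit(".", 1)[-1]
    return suffix in VALID_SUFFIXES and visits >= MIN_VISITS
```

Matching on `"." + name` rather than a bare suffix keeps an unrelated domain such as `notspam.com` from being caught by the `spam.com` blacklist entry.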
S102, aggregating and saving the data in each business-theme data set.
In this step, the data in each business-theme data set are aggregated according to different indicators, and the statistics of each business theme are then saved for use by upper-layer business systems. To further improve data-processing efficiency and to supply the complex computations of upper-layer business promptly with data as close as possible to what they need, the present invention aggregates each business-theme data set produced by step S101 according to different indicators chosen from actual business needs, and saves the results. With this advance processing, upper-layer business systems can access the data directly when they need it, without computing it on the spot. When choosing statistical indicators, those needed in common by several business systems are preferred wherever possible, which greatly reduces repeated computation and improves data utilisation and overall processing efficiency.
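A small sketch may make the pre-aggregation in step S102 concrete. The two indicators shown (visit count and unique users per domain) and the shape of the input rows are assumptions chosen for illustration; the patent leaves the indicator set to actual business needs.

```python
from collections import Counter

# Hypothetical "access volume" data set: (domain, user) pairs, one per visit.
ACCESS_SET = [
    ("example.com", "u1"),
    ("example.com", "u2"),
    ("example.org", "u1"),
    ("example.com", "u1"),
]


def compute_statistics(rows):
    """Aggregate one business-theme data set by two example indicators."""
    visits = Counter(domain for domain, _ in rows)
    users = {}
    for domain, user in rows:
        users.setdefault(domain, set()).add(user)
    return {
        domain: {"visits": visits[domain], "unique_users": len(users[domain])}
        for domain in visits
    }


# Computed once and saved, so later queries read it instead of recomputing.
STATS_STORE = {"access_volume": compute_statistics(ACCESS_SET)}
```

Because `STATS_STORE` is filled ahead of time, a business system that needs either indicator performs a lookup rather than a scan of the raw data set.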
In addition, the statistics can be saved in a data mart. A data mart (Data Mart), abbreviated DM, is a specialised form of data warehouse (DW). Whereas a data warehouse combines databases across the whole enterprise, a data mart is usually smaller and focused on a specific department. A data mart contains snapshots of the underlying data that have been pre-processed for a particular business, and is therefore closer to the needs of upper-layer business. A data mart lets a relational database emulate the analytical capabilities of a multi-dimensional database, giving easy access to the relevant information.
Further, to make it convenient for external systems to access the statistics produced in step S102, an access interface can be provided for these data in the form of a service, giving different business systems a fine-grained means of reuse.
Further, to supply different business systems with data closer to their needs and to simplify how they use the statistics, the fine-grained service interface can also be used to create at least one business sub-module. The sub-module encapsulates predefined operations on the statistics that are closer to the needs of upper-layer business systems, and likewise exposes an access interface in the form of a service. Upper-layer business systems can then call the sub-module's service interface directly to apply certain predefined operations to the statistics, which achieves coarse-grained reuse of the data and simplifies its use.
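The relationship between the two service levels can be sketched as follows. The function names `get_indicator` and `visit_share`, and the sample statistics, are hypothetical; the point is only that the coarse-grained sub-module is built entirely from calls to the fine-grained interface.

```python
# Hypothetical pre-computed statistics, as they might sit in the data mart.
STATS = {
    "access_volume": {
        "example.com": {"visits": 300},
        "example.org": {"visits": 100},
    }
}


def get_indicator(theme, domain, indicator):
    """Fine-grained service: one indicator for one domain under one theme."""
    return STATS[theme][domain][indicator]


def visit_share(domains):
    """Coarse-grained sub-module: a predefined computation (share of visits
    among a set of competing sites) built only on the fine-grained service."""
    visits = {d: get_indicator("access_volume", d, "visits") for d in domains}
    total = sum(visits.values())
    return {d: count / total for d, count in visits.items()}
```

A business system interested in competitor comparison would call `visit_share(["example.com", "example.org"])` and never touch the storage layout, which is the simplification the coarse-grained layer is meant to provide.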
S103, responding to a query request by returning statistics.
When a statistical information query request is received, the statistics corresponding to the business theme to be queried are retrieved and returned according to the request. The statistics can be obtained through the service access interfaces of different granularities described above, which supplies business systems with rich reprocessed data and meets users' varied business needs for network access data.
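Step S103 amounts to a lookup keyed by business theme. A minimal sketch follows, assuming a request is a dictionary with a `theme` key and an optional `domain` key; this request shape and the response fields are assumptions, as the patent does not define a query format.

```python
# Hypothetical saved statistics, keyed by business theme.
STATS_BY_THEME = {
    "access_volume": {"example.com": {"visits": 3}, "example.org": {"visits": 1}},
    "access_time": {"example.com": {"peak_hour": 9}},
}


def handle_query(request):
    """Return the saved statistics for the theme named in the request."""
    theme = request.get("theme")
    if theme not in STATS_BY_THEME:
        return {"status": "unknown_theme", "data": {}}
    data = STATS_BY_THEME[theme]
    domain = request.get("domain")
    if domain is not None:  # optional narrowing to a single website
        data = {domain: data[domain]} if domain in data else {}
    return {"status": "ok", "data": data}
```

No aggregation happens at query time; the handler only selects from what step S102 already saved.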
Embodiment two
According to the same inventive concept, the present invention also provides an Internet information statistics system. Fig. 2 is a structural block diagram of the system of this embodiment. As shown in Fig. 2, the system comprises: a data splitting unit 201, a data summarization unit 202, and a data query unit 203.
The data splitting unit 201 divides the network access data by business theme into a plurality of business-theme data sets using MapReduce. The network access data comprise the network-wide traffic IMOS log data used for data analysis; this massive data is stored in the large-scale distributed storage system ODS. High-speed partitioning of massive data is exactly what the MapReduce data-processing mechanism is good at: it can divide large amounts of data into different data sets in a very short time through distributed parallel computation. The present invention therefore uses MapReduce to partition the network access data. In addition, to match the data needs of upper-layer business systems, the present invention divides the network access data according to a plurality of business themes and stores the data under each theme in a separate sub-database, forming data sets for the different business themes and supplying the upper layers with preliminarily partitioned data. The business themes may include, but are not limited to: access volume, access time, website analysis information, and so on. The MapReduce processing not only removes a large amount of duplicate data but also gives the data a basic classification, greatly reducing the total data volume and making data access by the data summarization unit more efficient and convenient.
In addition, massive raw network access data usually contain a large amount of invalid data. To confine the target data to the range that users actually need and care about, invalid data may be removed from the raw network access data before the partitioning operation. The criteria for invalid data can be defined according to actual business needs, for example domain name data with an incorrect suffix, or domain name data whose access volume is below a preset threshold. An exception mechanism may also be provided: some data that meet the above criteria should nevertheless not be filtered out, and can be retained by means of a whitelist. That is, when identifying invalid data, it is first determined whether the domain name data belong to a preset blacklist or whitelist: domain name data in the whitelist, together with their subdomain data, are always treated as valid data, and domain name data in the blacklist, together with their subdomain data, are always treated as invalid data. In a preferred implementation of this embodiment, both the blacklist and the whitelist store top-level domain names: domain name data in the whitelist and their subdomain data are all treated as valid data, and domain name data in the blacklist and their subdomain data are all treated as invalid data. This pre-processing significantly reduces the amount of data that actually needs to be processed, further improving data-processing efficiency.
The data summarization unit 202 aggregates the data in each business-theme data set according to different indicators, and the statistics of each business theme are then saved for use by upper-layer business systems. To further improve data-processing efficiency and to supply the complex computations of upper-layer business promptly with data as close as possible to what they need, the present invention aggregates each business-theme data set produced by the data splitting unit according to different indicators chosen from actual business needs, and saves the results. With this advance processing, upper-layer business systems can access the data directly when they need it, without computing it on the spot. When choosing statistical indicators, those needed in common by several business systems are preferred wherever possible, which greatly reduces repeated computation and improves data utilisation and overall processing efficiency.
In addition, the system may further comprise a data mart, in which the statistics are saved. A data mart (Data Mart), abbreviated DM, is a specialised form of data warehouse (DW). Whereas a data warehouse combines databases across the whole enterprise, a data mart is usually smaller and focused on a specific department. A data mart contains snapshots of the underlying data that have been pre-processed for a particular business, and is therefore closer to the needs of upper-layer business. A data mart lets a relational database emulate the analytical capabilities of a multi-dimensional database, giving easy access to the relevant information.
Further, to make it convenient for external systems to access the data produced by the data summarization unit, the system may also comprise a first service unit, which provides an access interface for these data in the form of a service, giving different business systems a fine-grained means of reuse.
Further, to supply different business systems with data closer to their needs and to simplify how they use the statistics, the system may also comprise a second service unit, which uses the fine-grained service interface to create at least one business sub-module. The sub-module encapsulates predefined operations on the statistics that are closer to the needs of upper-layer business systems, and likewise exposes an access interface in the form of a service. Upper-layer business systems can then call the sub-module's service interface directly to apply certain predefined operations to the statistics, which achieves coarse-grained reuse of the data and simplifies its use.
The data query unit 203, upon receiving a statistical information query request, retrieves and returns the statistics corresponding to the business theme to be queried according to the request. The statistics can be obtained through the service access interfaces of different granularities described above, which supplies business systems with rich reprocessed data and meets users' varied business needs for network access data.
Embodiment three
The present invention also provides an Internet information statistics system implemented on the distributed data processing framework Hadoop. As shown in Fig. 3, the system mainly comprises an upper-layer business system 301, a service layer 302, a data mart (DM) 303, a data warehouse (DW) 304, and a distributed storage system (ODS) 305. The data mart DM is implemented on HBase, the data warehouse DW on Hive, and the distributed storage system ODS on HDFS.
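The layering just described can be written down as a small table for reference. The layer names and backing components (HDFS, Hive, HBase) come from the text above; the tuple layout and the `describe_flow` helper are assumptions introduced only to show the bottom-up order in which data moves.

```python
# (layer abbreviation, role, backing component), listed bottom-up.
LAYERS = [
    ("ODS", "distributed storage system", "HDFS"),
    ("DW", "data warehouse", "Hive"),
    ("DM", "data mart", "HBase"),
]


def describe_flow(layers):
    """Render the bottom-up data flow through the storage layers."""
    return " -> ".join(f"{name} ({backend})" for name, _, backend in layers)


FLOW = describe_flow(LAYERS)
```

The service layer 302 and business system 301 sit above the data mart and are omitted here because they serve data rather than store it.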
The data-processing flow is described next. First, the network access data IMOS are imported from outside into the storage system ODS; the data are then moved from the ODS into the data warehouse DW by means of ETL. ETL stands for Extraction-Transformation-Loading, that is, data extraction, transformation, and loading. Tools that can implement ETL include: OWB (Oracle Warehouse Builder), ODI (Oracle Data Integrator), Informatica PowerCenter, AICloudETL, DataStage, Repository Explorer, Beeload, Kettle, DataSpider, and others.
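The three ETL stages can be sketched without any of the tools listed above. The tab-separated log format, the field names, and the in-memory `WAREHOUSE` list are assumptions made for the example; a real pipeline would read from HDFS and write to Hive tables.

```python
# Hypothetical raw log lines as they might sit in the ODS layer.
RAW_LINES = [
    "u1\tExample.com\t100",
    "malformed line",
    "u2\texample.org\t200",
]


def extract(lines):
    """Extraction: read raw lines from the source store."""
    yield from lines


def transform(lines):
    """Transformation: parse fields, normalise case, skip malformed lines."""
    for line in lines:
        parts = line.split("\t")
        if len(parts) != 3:
            continue
        user, domain, ts = parts
        yield {"user": user, "domain": domain.lower(), "ts": int(ts)}


def load(rows, table):
    """Loading: append the cleaned rows to the warehouse table."""
    table.extend(rows)
    return len(table)


WAREHOUSE = []
LOADED = load(transform(extract(RAW_LINES)), WAREHOUSE)
```

Chaining generators keeps only one record in memory at a time, which mirrors how a streaming ETL job avoids materialising the whole data set between stages.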
The data warehouse DW also comprises two data-processing units: a data splitting unit (DWD) 3041 and a data summarization unit (DWA) 3042. To match the data needs of upper-layer business systems, the data splitting unit DWD divides the network access data according to a plurality of business themes and stores the data under each theme in a separate sub-database, forming data sets for the different business themes and supplying the data summarization unit DWA with preliminarily partitioned data. The partitioning is implemented with MapReduce, which can divide large amounts of data into different data sets in a very short time through distributed parallel computation; the massive data processing required by the present invention is exactly what MapReduce is good at. The business themes may include, but are not limited to: access volume, access time, website analysis information, and so on. The MapReduce processing not only removes a large amount of duplicate data but also gives the data a basic classification, greatly reducing the total data volume and making data access by the data summarization unit DWA more efficient and convenient.
One further point cannot be ignored in practice: massive raw network access data usually contain a large amount of invalid data. To confine the target data to the range that users actually need and care about, invalid data may be removed from the raw network access data before the partitioning operation. The criteria for invalid data can be defined according to actual business needs, for example domain name data with an incorrect suffix, or domain name data whose access volume is below a preset threshold. An exception mechanism may also be provided. Some data that meet the above criteria should nevertheless not be filtered out, and can be retained by means of a whitelist: domain name data in the whitelist are always treated as valid data. Conversely, some data that are otherwise valid should still be filtered out, and can be forcibly removed by means of a blacklist: domain name data in the blacklist are always treated as invalid data. In a preferred implementation of this embodiment, both the blacklist and the whitelist store top-level domain names: domain name data in the whitelist and their subdomain data are all treated as valid data, and domain name data in the blacklist and their subdomain data are all treated as invalid data.
This pre-processing significantly reduces the amount of data that actually needs to be processed, further improving data-processing efficiency.
In addition, the data summarization unit (DWA) 3042 aggregates the data in each business-theme data set according to different indicators, and the statistics of each business theme are then saved for use by upper-layer business systems. To further improve data-processing efficiency and to supply the complex computations of upper-layer business promptly with data as close as possible to what they need, the present invention aggregates each business-theme data set produced by the data splitting unit (DWD) 3041 according to different indicators chosen from actual business needs, and saves the results. With this advance processing, upper-layer business systems can access the data directly when they need it, without computing it on the spot. When choosing statistical indicators, those needed in common by several business systems are preferred wherever possible, which greatly reduces repeated computation and improves data utilisation and overall processing efficiency.
After the data summarization unit DWA has finished aggregating, the statistics can be saved in the data mart DM. A data mart (Data Mart), abbreviated DM, is a specialised form of data warehouse (DW). Whereas a data warehouse combines databases across the whole enterprise, a data mart is usually smaller and focused on a specific department. A data mart contains snapshots of the underlying data that have been pre-processed for a particular business, and is therefore closer to the needs of upper-layer business. A data mart lets a relational database emulate the analytical capabilities of a multi-dimensional database, giving easy access to the relevant information.
Further, to make it convenient for external systems to access the data produced by the data summarization unit DWA, a fine-grained service layer 3021 can be added on top of the DM, providing an access interface for these data in the form of a service and giving different business systems a fine-grained means of reuse.
Further, to supply different business systems with data closer to their needs and to simplify how they use the statistics, a coarse-grained service layer 3022 can be added on top of the fine-grained service layer. Within service layer 302, the fine-grained service interface is used to create at least one business sub-module. The sub-module encapsulates predefined operations on the statistics that are closer to the needs of upper-layer business systems, and likewise exposes an access interface in the form of a service. Upper-layer business systems can then call the sub-module's service interface directly to apply certain predefined operations to the statistics, which achieves coarse-grained reuse of the data and simplifies its use.
When a query request from the business system 301 is received, the data mart DM returns the required data according to the query. Through the fine-grained service layer 3021 and the coarse-grained service layer 3022 of service layer 302 described above, the data provided by the data mart can be accessed at different granularities, which supplies business systems with rich reprocessed data and meets users' varied business needs for network access data.
The present invention can make things convenient for the user understand in depth some industries, some websites or some competition website visit capacity, access times, calling party, search the various information such as flow that keyword, each searching key word bring, and can be with different granularities, at a high speed present abundant statistics to excavate and to show internal relation between numerous and diverse network access data, for decision-making section provides full and accurate objective data support for different operation systems and user exactly.
All or part of the technical solutions provided by the above embodiments can be implemented by software programming, with the software program stored in a readable storage medium, for example a hard disk in a computer, an optical disc or a floppy disk.
The above are only preferred embodiments of the present invention and are not intended to limit the present invention. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall be included within the protection scope of the present invention.

Claims (10)

1. An Internet information statistical method, characterized by comprising:
S1: dividing user network access data into a plurality of service theme data sets by service theme using MapReduce, wherein the user network access data are the users' outgoing access data;
S2: computing statistics on the data in each service theme data set according to different indicators, and saving the statistical data of each service theme;
S3: upon receiving a statistical information query request, obtaining and returning the corresponding statistical data according to the service theme that the query request asks for.
2. The Internet information statistical method as claimed in claim 1, characterized by further comprising, before step S1: removing invalid data from the network access data in advance; the invalid data comprise: domain name data with an incorrect suffix, and domain name data whose visit volume is less than a predetermined threshold.
3. The Internet information statistical method as claimed in claim 2, characterized in that, when identifying invalid data, it is first determined whether the data belong to the domain name data preset in a blacklist or a whitelist; domain name data in the whitelist, and the subordinate domain name data of those domain names, are always regarded as valid data; domain name data in the blacklist, and the subordinate domain name data of those domain names, are always regarded as invalid data.
4. The Internet information statistical method as claimed in claim 1, characterized in that saving the statistical data of each service theme in step S2 specifically comprises: saving the statistical data of each service theme in a data mart DM.
5. The Internet information statistical method as claimed in claim 1, characterized in that an access interface is provided, in the form of a service, for the statistical data of each service theme in step S2.
6. The Internet information statistical method as claimed in claim 5, characterized in that at least one business submodule is created using the service, the business submodule encapsulating predefined computations on the statistical data, and an access interface is provided for the business submodule in the form of a service.
7. An Internet information statistical system, characterized by comprising:
a data splitting unit, which divides user network access data into a plurality of service theme data sets by service theme using MapReduce, wherein the user network access data are the users' outgoing access data;
a data summarization unit, which computes statistics on the data in each service theme data set according to different indicators and saves the statistical data of each service theme;
a data query unit, which, upon receiving a statistical information query request, obtains and returns the corresponding statistical data according to the service theme that the query request asks for.
8. The Internet information statistical system as claimed in claim 7, characterized by further comprising:
a data preprocessing unit, which removes invalid data from the network access data in advance;
the invalid data comprise: domain name data with an incorrect suffix, and domain name data whose visit volume is less than a predetermined threshold.
9. The Internet information statistical system as claimed in claim 8, characterized in that, when identifying invalid data, it is first determined whether the data belong to the domain name data preset in a blacklist or a whitelist; domain name data in the whitelist, and the subordinate domain name data of those domain names, are always regarded as valid data; domain name data in the blacklist, and the subordinate domain name data of those domain names, are always regarded as invalid data.
10. The Internet information statistical system as claimed in claim 7, characterized by further comprising a data mart, used to save the data obtained by the statistics of the data summarization unit.
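The flow recited in claims 1 to 3 (filter invalid domain data, split by theme, aggregate, query) can be illustrated with a small single-process sketch. It is not the claimed MapReduce implementation: plain Python grouping stands in for the map and reduce phases, and the suffix list, threshold, list contents, record layout and function names are all assumptions made for the example.

```python
from collections import Counter, defaultdict

# All names and values below are illustrative assumptions, not from the claims.
VALID_SUFFIXES = (".com", ".cn", ".org")
MIN_VISITS = 2
WHITELIST = {"trusted.cn"}
BLACKLIST = {"spam.com"}


def in_list(domain, names):
    """True if domain is a listed name or a subordinate domain of one."""
    return any(domain == n or domain.endswith("." + n) for n in names)


def is_valid(domain, visits):
    # The lists are consulted first (claim 3). Checking the whitelist before
    # the blacklist is an assumption; the claims do not rank the two.
    if in_list(domain, WHITELIST):
        return True
    if in_list(domain, BLACKLIST):
        return False
    # Otherwise apply the two tests of claim 2: suffix and visit threshold.
    return domain.endswith(VALID_SUFFIXES) and visits >= MIN_VISITS


def build_stats(records):
    """records: iterable of (theme, user, domain) tuples."""
    records = list(records)
    per_domain = Counter(domain for _, _, domain in records)
    kept = [r for r in records if is_valid(r[2], per_domain[r[2]])]
    # S1: key each record by its theme and group (stand-in for map + shuffle).
    datasets = defaultdict(list)
    for theme, user, domain in kept:
        datasets[theme].append((user, domain))
    # S2: reduce each theme data set to one value per indicator.
    return {
        theme: {"visits": len(rows), "visitors": len({u for u, _ in rows})}
        for theme, rows in datasets.items()
    }


def query(stats, theme):
    # S3: return the saved statistics for the requested theme, if any.
    return stats.get(theme)


records = [
    ("news", "u1", "m.trusted.cn"),    # whitelisted subdomain, kept at 1 visit
    ("news", "u2", "daily.org"),
    ("news", "u1", "daily.org"),
    ("retail", "u3", "shop.com"),      # below the visit threshold
    ("retail", "u3", "ads.spam.com"),  # blacklisted subdomain
    ("retail", "u4", "ads.spam.com"),
    ("retail", "u5", "typo.cmo"),      # incorrect suffix
    ("retail", "u5", "typo.cmo"),
]
stats = build_stats(records)
news = query(stats, "news")
```

In this sample every retail record is filtered out, so a query for that theme returns nothing, while the news theme keeps the whitelisted subdomain even though it was visited only once.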
CN2013101274926A 2013-04-12 2013-04-12 Internet information statistical method and Internet information statistical system Pending CN103200269A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2013101274926A CN103200269A (en) 2013-04-12 2013-04-12 Internet information statistical method and Internet information statistical system

Publications (1)

Publication Number Publication Date
CN103200269A true CN103200269A (en) 2013-07-10

Family

ID=48722624

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2013101274926A Pending CN103200269A (en) 2013-04-12 2013-04-12 Internet information statistical method and Internet information statistical system

Country Status (1)

Country Link
CN (1) CN103200269A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101159592A (en) * 2007-08-10 2008-04-09 北大方正集团有限公司 Statistical method and device of internet data information clicking rates
CN102111453A (en) * 2011-03-04 2011-06-29 创博亚太科技(山东)有限公司 Method and system for extracting Internet user network behaviors
CN102289447A (en) * 2011-06-16 2011-12-21 北京亿赞普网络技术有限公司 Website webpage evaluation system based on communication network message
CN102354315A (en) * 2011-09-22 2012-02-15 奇智软件(北京)有限公司 Generation method of site navigation page and device thereof

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104301182A (en) * 2014-10-22 2015-01-21 赛尔网络有限公司 Method and device for inquiring slow website access abnormal information
CN104301182B (en) * 2014-10-22 2018-09-11 赛尔网络有限公司 A kind of querying method and device of the exception information of website visiting at a slow speed
CN105897695A (en) * 2016-03-25 2016-08-24 努比亚技术有限公司 Website white list selection method, terminal, and server
CN106021486A (en) * 2016-05-18 2016-10-12 广东源恒软件科技有限公司 Big data-based data multidimensional analyzing and processing method
CN106897362A (en) * 2017-01-11 2017-06-27 中国建设银行股份有限公司 For data storage, the method and system of inquiry
CN108446301A (en) * 2018-01-26 2018-08-24 阿里巴巴集团控股有限公司 Service scripts splits method of summary, device and equipment
CN108446301B (en) * 2018-01-26 2021-10-29 创新先进技术有限公司 Business file splitting and summarizing method, device and equipment
CN110109955A (en) * 2019-03-15 2019-08-09 平安科技(深圳)有限公司 Data call amount statistical method, system, computer installation and readable storage medium storing program for executing
CN110427438A (en) * 2019-07-30 2019-11-08 中国工商银行股份有限公司 Data processing method and its device, electronic equipment and medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20130710