CN104615765A - Data processing method and data processing device for browsing internet records of mobile subscribers - Google Patents



Publication number
CN104615765A
Authority
CN
China
Legal status
Pending
Application number
CN201510080977.3A
Other languages
Chinese (zh)
Inventor
尹为强
罗云彬
赵锡成
王伟华
Current Assignee
China United Network Communications Group Co Ltd
Original Assignee
China United Network Communications Group Co Ltd
Application filed by China United Network Communications Group Co Ltd
Priority to CN201510080977.3A
Publication of CN104615765A


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/21 Design, administration or maintenance of databases
    • G06F16/215 Improving data quality; Data cleansing, e.g. de-duplication, removing invalid entries or correcting typographical errors
    • G06F16/10 File systems; File servers
    • G06F16/18 File system types
    • G06F16/182 Distributed file systems

Abstract

The invention provides a data processing method and a data processing device for mobile subscribers' internet browsing records. The method comprises: (S110) constructing N data items from each internet browsing record within a preset time period; and (S120) accumulating the statistical information of data items that have the same identifier, to obtain the accumulated value of the statistical information for each identifier. Each data item comprises an identifier and statistical information. Each identifier comprises the mobile phone number of the corresponding browsing record, a data type and a record type. There are N data types, corresponding one-to-one to N preset fields in the browsing records, and the N data items constructed from one browsing record all have different data types. The record type of each data item is the value of the preset field, in the browsing record, that corresponds to the item's data type. N is a positive integer greater than 1. The data processing method and device can increase the processing speed of mobile internet data.

Description

Data processing method and device for mobile subscribers' internet records
Technical field
The present invention relates to the field of communications, and in particular to a data processing method and device for mobile subscribers' internet records.
Background art
Mobile terminals access the internet without being tethered to a cable and generate hundreds of millions of records every day, occupying terabytes of storage. A month of data reaches trillions of records at the petabyte scale. Various kinds of useful information can be mined from such a large database. For example, taking the subscriber's phone number as the base dimension, traffic statistics can be computed separately along three classification dimensions: network type, service type and mobile base station. The distributed computing framework MapReduce can extract the required data quickly from this mass of data.
At present, per-subscriber traffic statistics along the three classification dimensions of network type, service type and mobile base station are mainly computed with the MapReduce distributed computing framework. Separate MapReduce task code must be written for each statistical dimension, and the task Jobs are then executed one after another. The existing scheme is implemented as follows:
(1) According to the requirements of the three statistical dimensions, write three independent Job programs based on the MapReduce framework;
(2) According to the requirements of each task, define the key-value pairs (Key-Value) of its Mapper and Reducer;
(3) Because the big-data cluster holding the internet records is heavily loaded, only one Job can run at a time, so the Job that computes statistics by network type runs first;
(4) After that Job completes, the Job that computes statistics by service type runs;
(5) After that Job completes, the Job that computes statistics by mobile base station runs last;
(6) When all Jobs have finished, the output is copied from the HDFS file system to the local machine for subsequent data analysis.
Because the performance limits of the big-data cluster allow only one Job to run at a time, the three Jobs must run sequentially. This takes a great deal of time, and running multiple tasks produces a large amount of intermediate data.
Summary of the invention
The technical problem to be solved by the present invention is how to speed up the processing of mobile internet access data.
To solve this problem, the invention provides a data processing method for mobile subscribers' internet records, comprising:
S110: for each internet record within a predetermined time period, construct N data items, each comprising an identifier and statistical information. The identifier comprises the mobile phone number in the internet record, a data type and a record type. There are N data types, corresponding one-to-one to N predetermined fields in the internet record, and the N data items constructed from one internet record all have different data types. The record type of a data item is the value, in the internet record, of the predetermined field corresponding to that item's data type. N is a positive integer greater than 1;
S120: accumulate the statistical information of data items that have the same identifier, to obtain the accumulated value of the statistical information for each identifier.
Optionally, N equals 3, and the N predetermined fields corresponding to the N data types are VPN, BUSI TYPE, and LAC combined with CELL ID.
Optionally, the statistical information comprises traffic, duration and number of clicks. The traffic is given by the values of the UP and DOWN fields in the internet record, the duration is the value of the time field in the internet record, and the number of clicks is 1.
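Steps S110 and S120, using the optional N = 3 fields, can be sketched in memory as follows. This is a minimal illustration, not the patented implementation. The record field names (MSISDN, VPN, BUSI_TYPE, LAC, CELL_ID, UP, DOWN, DURATION), the choice to compute traffic as UP + DOWN, the underscore used to join LAC and CELL ID, and the sample values are all assumptions.

```python
from collections import defaultdict

# N = 3 selectors, one per data type, following the optional embodiment:
# 1 -> VPN, 2 -> BUSI TYPE, 3 -> LAC combined with CELL ID.
# The dict keys used below (MSISDN, VPN, ..., DURATION) are illustrative field names.
SELECTORS = {
    1: lambda r: r["VPN"],
    2: lambda r: r["BUSI_TYPE"],
    3: lambda r: f'{r["LAC"]}_{r["CELL_ID"]}',
}


def construct(record):
    """S110: build N data items (identifier, statistics) from one record."""
    # Assumption: traffic = UP + DOWN; DURATION is a hypothetical numeric field.
    stats = (record["UP"] + record["DOWN"], record["DURATION"], 1)
    return [((record["MSISDN"], data_type, select(record)), stats)
            for data_type, select in SELECTORS.items()]


def accumulate(items):
    """S120: add up the statistics of items that share the same identifier."""
    totals = defaultdict(lambda: [0, 0, 0])
    for ident, stats in items:
        for i, value in enumerate(stats):
            totals[ident][i] += value
    return {ident: tuple(t) for ident, t in totals.items()}


# Made-up sample records for one subscriber.
records = [
    {"MSISDN": "1860110xxxx", "VPN": "2", "BUSI_TYPE": "912", "LAC": "4310",
     "CELL_ID": "50036", "UP": 120, "DOWN": 300, "DURATION": 60},
    {"MSISDN": "1860110xxxx", "VPN": "2", "BUSI_TYPE": "203", "LAC": "4310",
     "CELL_ID": "50036", "UP": 10, "DOWN": 20, "DURATION": 5},
]
items = [item for r in records for item in construct(r)]
totals = accumulate(items)
print(totals[("1860110xxxx", 1, "2")])  # (450, 65, 2)
```

Each record yields one item per data type, so the two records produce six items. The items fall under four distinct identifiers, because the two records share a VPN value and a cell but differ in service code.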
Optionally, step S110 comprises:
Processing each internet record within the predetermined time period, read from the Hadoop file system HDFS, into N key-value pairs. The key (Key) comprises the phone number, the data type and the record type. The value (Value) comprises the DOWN, UP and time values of the internet record and a number of clicks of 1;
Step S120 comprises:
Merging all key-value pairs that have the same Key into a single key-value pair. The Key of the merged pair is unchanged, and its Value holds the separate sums of the UP, DOWN, time and number-of-clicks values of the merged pairs.
Optionally, the method further comprises, after step S120:
S130: among the key-value pairs obtained in step S120, merging the pairs that have the same phone number and the same data type into a single key-value pair. The Key of the merged pair is the phone number, and its Value is the union of the Values of the merged pairs.
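Below is a minimal sketch of S130. It assumes the S120 result is a dictionary keyed by (phone, data type, record type). Because a single dictionary here holds every dimension, the sketch keys the merged result by (phone, data type); in the described method, the data types are instead kept apart in the REDUCE stage. The function name and the sample values are hypothetical.

```python
from collections import defaultdict


def merge_by_phone_and_type(accumulated):
    """S130: merge accumulated items that share phone number and data type.

    The merged value is the union of the participating values, kept here as a
    mapping record_type -> (traffic, duration, clicks).
    """
    merged = defaultdict(dict)
    for (phone, data_type, record_type), stats in accumulated.items():
        # Each (phone, data_type, record_type) identifier is unique after S120,
        # so no statistics are overwritten here.
        merged[(phone, data_type)][record_type] = stats
    return dict(merged)


# Hypothetical S120 output for one subscriber.
accumulated = {
    ("1860110xxxx", 1, "1"): (500, 40, 3),
    ("1860110xxxx", 1, "2"): (120, 10, 1),
    ("1860110xxxx", 2, "203"): (620, 50, 4),
}
merged = merge_by_phone_and_type(accumulated)
print(merged[("1860110xxxx", 1)])  # {'1': (500, 40, 3), '2': (120, 10, 1)}
```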
The present invention also provides a data processing device for mobile subscribers' internet records, comprising:
a data construction module, configured to construct N data items for each internet record within a predetermined time period, each comprising an identifier and statistical information. The identifier comprises the mobile phone number in the internet record, a data type and a record type. There are N data types, corresponding one-to-one to N predetermined fields in the internet record, and the N data items constructed from one internet record all have different data types. The record type of a data item is the value, in the internet record, of the predetermined field corresponding to that item's data type. N is a positive integer greater than 1;
an accumulation module, configured to accumulate the statistical information of data items that have the same identifier, to obtain the accumulated value of the statistical information for each identifier.
Optionally, N equals 3, and the N predetermined fields corresponding to the N data types are VPN, BUSI TYPE, and LAC combined with CELL ID.
Optionally, the statistical information comprises traffic, duration and number of clicks. The traffic is given by the values of the UP and DOWN fields in the internet record, the duration is the value of the time field in the internet record, and the number of clicks is 1.
Optionally, the data construction module constructs N data items for each internet record within the predetermined time period as follows:
the data construction module processes each internet record within the predetermined time period, read from the Hadoop file system HDFS, into N key-value pairs. The Key comprises the phone number, the data type and the record type. The Value comprises the DOWN, UP and time values of the internet record and a number of clicks of 1;
the accumulation module accumulates the statistical information of data items that have the same identifier as follows:
the accumulation module merges all key-value pairs that have the same Key into a single key-value pair. The Key of the merged pair is unchanged, and its Value holds the separate sums of the UP, DOWN, time and number-of-clicks values of the merged pairs.
Optionally, the device further comprises:
a merging module, configured to merge, among the key-value pairs obtained by the accumulation module, the pairs that have the same phone number and the same data type into a single key-value pair. The Key of the merged pair is the phone number, and its Value is the union of the Values of the merged pairs.
Through this design, the present invention can execute the requirements of multiple statistical tasks together and process statistical tasks for multiple dimensions in one pass. The data only needs to be loaded and traversed once to complete the statistics for all dimensions, instead of being loaded and traversed separately for each dimension's task. This greatly reduces the total data processing time. The scheme avoids long execution times and speeds up task analysis and processing.
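The saving comes from traversing the data once instead of once per dimension. The sketch below compares the two approaches on the same made-up records and, for brevity, counts only clicks. CountingSource, DIMENSIONS and the CELL field are hypothetical stand-ins; each full iteration over the source stands in for loading and traversing the data from storage.

```python
from collections import Counter


class CountingSource:
    """Wraps a list of records and counts how many full scans are made.

    Each scan stands in for loading and traversing the data once (e.g. from HDFS).
    """

    def __init__(self, records):
        self.records = records
        self.scans = 0

    def __iter__(self):
        self.scans += 1
        return iter(self.records)


# Data type -> field holding the record type (CELL is a hypothetical LAC_CELLID field).
DIMENSIONS = {1: "VPN", 2: "BUSI_TYPE", 3: "CELL"}


def per_dimension_jobs(source):
    """Prior approach: one job, and therefore one full scan, per dimension."""
    result = {}
    for data_type, field in DIMENSIONS.items():
        clicks = Counter((r["MSISDN"], data_type, r[field]) for r in source)
        result.update(clicks)
    return result


def combined_job(source):
    """Combined approach: a single scan emits one item per dimension per record."""
    clicks = Counter()
    for r in source:
        for data_type, field in DIMENSIONS.items():
            clicks[(r["MSISDN"], data_type, r[field])] += 1
    return dict(clicks)


records = [
    {"MSISDN": "p1", "VPN": "2", "BUSI_TYPE": "203", "CELL": "4310_50036"},
    {"MSISDN": "p1", "VPN": "1", "BUSI_TYPE": "204", "CELL": "4310_50036"},
    {"MSISDN": "p2", "VPN": "6", "BUSI_TYPE": "912", "CELL": "4310_50037"},
]
sequential, single = CountingSource(records), CountingSource(records)
same = per_dimension_jobs(sequential) == combined_job(single)
print(same, sequential.scans, single.scans)  # True 3 1
```

Both functions produce the same per-identifier counts. Only the number of scans differs.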
Brief description of the drawings
Fig. 1 is a flowchart of the data processing method of Embodiment 1;
Fig. 2 is a schematic diagram of processing the example of Embodiment 1 with the existing method;
Fig. 3 is a schematic diagram of processing the example of Embodiment 1 with the method of Embodiment 1;
Fig. 4 is a flowchart of the data processing method in the example of Embodiment 1.
Detailed description of the embodiments
The technical solution of the present invention is described in detail below with reference to the drawings and embodiments.
It should be noted that, where they do not conflict, the embodiments of the present invention and the features in them may be combined with each other, and all such combinations fall within the protection scope of the present invention. In addition, although a logical order is shown in the flowcharts, in some cases the steps shown or described may be executed in a different order.
Embodiment 1: a data processing method for mobile subscribers' internet records, as shown in Fig. 1, comprising:
S110: for each internet record within a predetermined time period, construct N data items, each comprising an identifier and statistical information. The identifier comprises the mobile phone number in the internet record, a data type and a record type. There are N data types, corresponding one-to-one to N predetermined fields in the internet record, and the N data items constructed from one internet record all have different data types. The record type of a data item is the value, in the internet record, of the predetermined field corresponding to that item's data type. N is a positive integer greater than 1;
S120: accumulate the statistical information of data items that have the same identifier, to obtain the accumulated value of the statistical information for each identifier.
In this embodiment, the N data items constructed from one internet record carry identical statistical information. Their different data types represent different statistical dimensions; that is, the N data items are used to accumulate the statistical information under N different dimensions.
The content of a mobile subscriber's internet record is shown in Table 1:
Table one, content of mobile subscribers' internet records
As shown in Table 1, each record has more than 10 fields: MSISDN|time DDHHMMSS|LAC|CELL ID|BUSI TYPE|UP|DOWN|VPN|URL|.... Each internet record contains the following information:
- the phone number (MSISDN field);
- the access time (time DDHHMMSS field);
- the location area code LAC of the base station cell (LAC field);
- the base station cell identifier (CELL ID field);
- the networked service type code (BUSI TYPE field);
- the uplink traffic in bytes (UP field);
- the downlink traffic in bytes (DOWN field);
- the network type, which may be a virtual private network identifier (VPN field);
- the website uniform resource locator (URL field);
- other information.
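The parser below assumes that each record is stored as one pipe-delimited line, with the columns in the order listed above. The Python field names (including TIME for the DDHHMMSS column), the handling of extra columns and the sample line are assumptions. A URL that itself contains "|" would break this simple split.

```python
# Column order as listed for Table 1; the Python names are our own labels.
FIELDS = ["MSISDN", "TIME", "LAC", "CELL_ID", "BUSI_TYPE", "UP", "DOWN", "VPN", "URL"]


def parse_record(line):
    """Split one pipe-delimited internet record into named fields.

    Columns after URL are kept, unnamed, in "EXTRA". UP and DOWN are
    converted to integers (bytes); everything else stays a string.
    """
    parts = line.rstrip("\n").split("|")
    if len(parts) < len(FIELDS):
        raise ValueError(f"expected at least {len(FIELDS)} fields, got {len(parts)}")
    record = dict(zip(FIELDS, parts))
    record["EXTRA"] = parts[len(FIELDS):]
    record["UP"] = int(record["UP"])
    record["DOWN"] = int(record["DOWN"])
    return record


rec = parse_record("1860110xxxx|01093015|4310|50036|912|120|300|2|http://example.com/|x\n")
print(rec["LAC"], rec["CELL_ID"], rec["UP"] + rec["DOWN"])  # 4310 50036 420
```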
In one implementation of this embodiment, N equals 3. The N predetermined fields corresponding to the N data types can be, but are not limited to, VPN, BUSI TYPE, and LAC combined with CELL ID. In this implementation, the statistical dimensions corresponding to the 3 data types are network type, service type and base station cell:
- data type "1" corresponds to the VPN field and the dimension "network type";
- data type "2" corresponds to the BUSI TYPE field and the dimension "service type";
- data type "3" corresponds to LAC combined with CELL ID and the dimension "base station cell".
Accordingly, when the data type is "1", the record type is the value of the VPN field in the internet record. When the data type is "2", the record type is the value of the BUSI TYPE field. When the data type is "3", the record type is the combination of the LAC and CELL ID values in the internet record.
In one implementation of this embodiment, the statistical information includes, but is not limited to, traffic, duration and number of clicks. The traffic is given by the values of the UP and DOWN fields in the internet record, the duration is the value of the time field in the internet record, and the number of clicks is 1.
In one implementation of this embodiment, the method is carried out with MapReduce:
Step S110 corresponds to the MAP stage. The data items are key-value pairs, the identifier is the Key and the statistical information is the value. Step S110 specifically comprises:
Processing each internet record within the predetermined time period, read from the Hadoop file system HDFS, into N key-value pairs. The Key comprises the phone number (the value of the MSISDN field in the internet record), the data type DATA_TYPE and the record type RECORD_TYPE. The VALUE comprises the DOWN, UP and time values of the internet record and a number of clicks of 1.
Taking N = 3, the three key-value pairs are as follows:
Key-value pair 1 (statistical dimension: network type):
Key:MSISDN|1|RECORD_TYPE
VALUE:UP|DOWN|…
For the first internet record in Table 1, the first key-value pair obtained is:
Key:1860110xxxx|1|2
VALUE:2|311|651|1
Key-value pair 2 (statistical dimension: service type):
Key:MSISDN|2|RECORD_TYPE
VALUE:UP|DOWN|…
For the first internet record in Table 1, the second key-value pair obtained is:
Key:1860110xxxx|2|912
VALUE:912|311|651|1
Key-value pair 3 (statistical dimension: base station cell):
Key:MSISDN|3|RECORD_TYPE
VALUE:UP|DOWN|…
For the first internet record in Table 1, the third key-value pair obtained is:
Key:1860110xxxx|3|4310_50036
VALUE:4310_50036|311|651|1
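The MAP-stage output shown above can be sketched as a function that emits pipe-delimited strings in the Key layout MSISDN|DATA_TYPE|RECORD_TYPE and the VALUE layout RECORD_TYPE|traffic|duration|clicks, matching the worked examples. The input dictionary, the hypothetical DURATION field and the assumption that traffic equals UP + DOWN are illustrative only. A real job would write these pairs through the MapReduce framework rather than return a list.

```python
def map_record(rec):
    """MAP stage: turn one record into three (Key, VALUE) strings.

    Key:   MSISDN|DATA_TYPE|RECORD_TYPE
    VALUE: RECORD_TYPE|traffic|duration|clicks
    """
    traffic = rec["UP"] + rec["DOWN"]  # assumption: traffic = UP + DOWN
    record_types = {
        1: rec["VPN"],                          # network type
        2: rec["BUSI_TYPE"],                    # service type
        3: f'{rec["LAC"]}_{rec["CELL_ID"]}',    # base station cell
    }
    pairs = []
    for data_type, record_type in record_types.items():
        key = f'{rec["MSISDN"]}|{data_type}|{record_type}'
        value = f'{record_type}|{traffic}|{rec["DURATION"]}|1'
        pairs.append((key, value))
    return pairs


sample = {"MSISDN": "1860110xxxx", "VPN": "2", "BUSI_TYPE": "912", "LAC": "4310",
          "CELL_ID": "50036", "UP": 120, "DOWN": 300, "DURATION": 60}
for key, value in map_record(sample):
    print(key, value)
# 1860110xxxx|1|2 2|420|60|1
# 1860110xxxx|2|912 912|420|60|1
# 1860110xxxx|3|4310_50036 4310_50036|420|60|1
```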
These key-value (Key-VALUE) records are output to the subsequent COMBINE stage.
Step S120 corresponds to the COMBINE stage and specifically comprises:
Merging all key-value pairs that have the same Key into a single key-value pair. The Key of the merged pair is unchanged, and its VALUE holds the separate sums of the UP, DOWN, time and number-of-clicks values of the merged pairs.
For example, for the key-value pairs whose Key is 1860110xxxx|1|2, summing the UP values of these pairs gives the merged UP. The merged DOWN, merged time and merged number of clicks are obtained in the same way. Taking the merged UP/DOWN/time/number of clicks as the VALUE, and keeping 1860110xxxx|1|2 as the Key, gives the merged key-value pair.
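Below is a minimal in-memory sketch of the COMBINE step. It assumes the VALUE layout RECORD_TYPE|traffic|duration|clicks used in the worked examples. Pairs with identical Keys are summed column by column, while the Key and RECORD_TYPE are kept. The sample pairs are made up.

```python
def combine(pairs):
    """COMBINE stage: sum traffic, duration and clicks of pairs with the same Key.

    The Key is unchanged; RECORD_TYPE (first VALUE column) is carried through.
    """
    merged = {}  # key -> [record_type, [traffic, duration, clicks]]
    for key, value in pairs:
        record_type, *numbers = value.split("|")
        numbers = [int(n) for n in numbers]
        if key in merged:
            merged[key][1] = [a + b for a, b in zip(merged[key][1], numbers)]
        else:
            merged[key] = [record_type, numbers]
    return [(key, "|".join([rt] + [str(n) for n in nums]))
            for key, (rt, nums) in merged.items()]


pairs = [
    ("1860110xxxx|1|2", "2|420|60|1"),
    ("1860110xxxx|1|2", "2|30|5|1"),
    ("1860110xxxx|1|1", "1|50|7|1"),
]
print(combine(pairs))
# [('1860110xxxx|1|2', '2|450|65|2'), ('1860110xxxx|1|1', '1|50|7|1')]
```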
The key-value pairs obtained after step S120 are the accumulated results for every case of each statistical dimension. For example, for DATA_TYPE "1" (network type as the statistical dimension), the cases VPN = 1, VPN = 2 and VPN = 6 are each merged into one key-value pair; that is, three merged key-value pairs are obtained for DATA_TYPE "1". Likewise, for DATA_TYPE "2" (service type as the statistical dimension), each distinct BUSI TYPE value is merged into one key-value pair. For DATA_TYPE "3" (base station cell as the statistical dimension), each distinct combination of LAC and CELL ID is merged into one key-value pair.
In this implementation, the method further comprises, after step S120:
S130: among the key-value pairs obtained in step S120, merging the pairs that have the same phone number and the same data type into a single key-value pair. The Key of the merged pair is the phone number, and its VALUE is the union of the VALUEs of the merged pairs.
Step S130 corresponds to the REDUCE stage. A Partitioner stage and a Grouping stage must be added before it:
- the Partitioner stage routes key-value pairs of different data types to different REDUCE tasks, with no overlap between them;
- the Grouping stage ensures that, among the key-value pairs output by the COMBINE stage, those with the same phone number and data type are delivered together to one REDUCE call.
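The routing and grouping described above can be simulated in plain Python. In Hadoop, these roles would be played by a custom Partitioner and a grouping comparator registered on the Job (for example via Job.setPartitionerClass and Job.setGroupingComparatorClass). The sketch below does not use Hadoop. NUM_REDUCERS = 3 (one REDUCE task per data type) and all function names are assumptions.

```python
from itertools import groupby

NUM_REDUCERS = 3  # assumption: one REDUCE task per data type


def partition(key):
    """Partitioner: route a Key MSISDN|DATA_TYPE|RECORD_TYPE by DATA_TYPE only."""
    data_type = int(key.split("|")[1])
    return (data_type - 1) % NUM_REDUCERS


def group_key(key):
    """Grouping: Keys with the same MSISDN and DATA_TYPE go to one reduce call."""
    phone, data_type, _record_type = key.split("|", 2)
    return phone, data_type


def shuffle(pairs):
    """Partition, sort and group COMBINE output as each REDUCE task would see it."""
    buckets = [[] for _ in range(NUM_REDUCERS)]
    for key, value in pairs:
        buckets[partition(key)].append((key, value))
    per_reducer = []
    for bucket in buckets:
        # Sorting full Keys keeps each MSISDN|DATA_TYPE| prefix contiguous.
        bucket.sort(key=lambda kv: kv[0])
        per_reducer.append([
            (group, [value for _, value in members])
            for group, members in groupby(bucket, key=lambda kv: group_key(kv[0]))
        ])
    return per_reducer


combined = [
    ("p1|1|2", "2|450|65|2"),
    ("p1|2|203", "203|30|5|1"),
    ("p1|1|1", "1|50|7|1"),
    ("p2|1|6", "6|9|1|1"),
]
for reducer_id, groups in enumerate(shuffle(combined)):
    print(reducer_id, groups)
# 0 [(('p1', '1'), ['1|50|7|1', '2|450|65|2']), (('p2', '1'), ['6|9|1|1'])]
# 1 [(('p1', '2'), ['203|30|5|1'])]
# 2 []
```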
Taking N = 3, step S130 yields three key-value pairs for each phone number, one for each statistical dimension, with the phone number as the Key. The VALUEs of the three pairs are as follows:
Pair 1 (statistical dimension: network type):
Value: 2G traffic | 2G duration | 2G clicks | 3G traffic | 3G duration | 3G clicks | 4G traffic | 4G duration | 4G clicks;
Pair 2 (statistical dimension: service type):
Value: web browsing traffic | web browsing duration | web browsing clicks | video traffic | video duration | video clicks | email traffic | email duration | email clicks |
Pair 3 (statistical dimension: base station cell):
Value: LAC | CELL_ID | traffic | duration | clicks |
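As one example of the REDUCE output, the network-type row above could be assembled as follows. The sketch uses the VPN codes given later for the network-type task (1 = 3G, 2 = 2G, 6 = 4G). The function name, the zero filling for networks without records and the sample values are assumptions. The service-type and base-station rows would follow the same pattern.

```python
# VPN code -> network, as stated for the network-type task: 1 = 3G, 2 = 2G, 6 = 4G.
VPN_TO_NETWORK = {"1": "3G", "2": "2G", "6": "4G"}
COLUMN_ORDER = ["2G", "3G", "4G"]


def reduce_network_type(phone, values):
    """REDUCE for DATA_TYPE 1: one output row per phone number.

    values are VALUE strings RECORD_TYPE|traffic|duration|clicks for this phone.
    Combiners run per map task, so the same VPN code may still appear more than
    once here; those entries are summed. Networks with no records get zeros.
    An unknown VPN code raises KeyError.
    """
    totals = {net: [0, 0, 0] for net in COLUMN_ORDER}
    for value in values:
        vpn, *numbers = value.split("|")
        net = VPN_TO_NETWORK[vpn]
        totals[net] = [a + int(b) for a, b in zip(totals[net], numbers)]
    row = "|".join(str(x) for net in COLUMN_ORDER for x in totals[net])
    return phone, row


print(reduce_network_type("p1", ["1|50|7|1", "2|450|65|2", "1|10|3|1"]))
# ('p1', '450|65|2|60|10|2|0|0|0')
```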
The following example describes in detail how the present embodiment and the prior art each process mobile subscribers' internet records, and compares the two methods to show the advantages of the present embodiment.
This example computes internet traffic statistics along three dimensions:
(1) By network type
Traffic statistics in this dimension compute, for each mobile subscriber, the traffic, duration, number of clicks and similar information over one month under different networks such as 2G, 3G and 4G.
(2) By service type
Traffic statistics in this dimension compute, for each mobile subscriber, the traffic, duration, number of clicks and similar information over one month under different services such as web browsing, video watching, email and P2P.
(3) By mobile base station
Traffic statistics in this dimension compute, for each mobile subscriber, the traffic, duration, number of clicks and similar information over one month in different base station cells.
All three statistical dimensions compute one month of data for each mobile subscriber, and each computes traffic, duration, number of clicks and similar information. They differ only in how the data is grouped.
To make the advantages of this embodiment clear, the existing data processing method is first described in detail with reference to the internet records of Table 1. The existing method splits the processing into three statistical tasks, one per dimension, and executes them one after another. The three tasks are described below:
(1) Task computing statistics by network type:
Mobile internet access uses 2G, 3G and 4G networks. In the internet records of Table 1, the VPN field represents the network type: VPN values 1, 2 and 6 denote 3G, 2G and 4G respectively. This task computes, for each phone number, the traffic, online duration, number of clicks and similar information over one month under the three network types. A MapReduce-based data analysis model requires implementing a Mapper class and a Reducer class: key-value pairs are constructed in the Mapper class, and data is merged and finally output in the Reducer class. A Combiner class that merges data records is added between Map and Reduce. It reduces data transfer while the task runs and greatly reduces the number of records output to the Reducer class. The task execution is illustrated in Fig. 2.
The input and output data of this task are shown in Table 2.
Table two, input and output data of the traffic statistics task by network type
As shown in Table 2, every internet record is converted into a key-value pair whose key is the phone number and whose value is network type | traffic | duration | number of clicks. The network type is the VPN value from Table 1, and the number of clicks is 1. In the Combine stage, multiple records with the same phone number and the same network type are merged and output. This reduces data transfer over the cluster network and greatly reduces the number of records passed to the Reduce stage. In the final Reduce stage, the records of each phone number are accumulated and summarized by network type (2G, 3G and 4G), and the final data is output.
(2) Task computing statistics by service type:
Mobile internet access records can be divided, according to the service accessed, into broad classes such as web browsing, email, video watching and P2P. This classification is based on the service code in the BUSI TYPE field of Table 1. Service codes are three-digit codes, and the broad class is given by the hundreds digit of the code. For example, codes 203 and 204 both have the hundreds digit 2, so they belong to the same broad service class.
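The broad-class rule can be expressed as a small helper. The sketch assumes BUSI TYPE arrives as a three-character digit string; the function name and the error handling are hypothetical.

```python
def service_category(busi_type):
    """Return the broad service class of a three-digit BUSI TYPE code.

    The broad class is the hundreds digit, so 203 and 204 fall in the same class.
    """
    if len(busi_type) != 3 or not busi_type.isdigit():
        raise ValueError(f"expected a three-digit service code, got {busi_type!r}")
    return int(busi_type[0])


print(service_category("203"), service_category("204"), service_category("912"))  # 2 2 9
```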
The Map, Combine and Reduce implementation of this task is the same as in Fig. 2, and its input and output data are shown in Table 3 below.
Table three, input and output data of the traffic statistics task by service type
As shown in Table 3, every internet record is converted into a key-value pair whose key is the phone number and whose value is service type | traffic | duration | number of clicks. The service type is the hundreds digit of BUSI TYPE in Table 1, and the number of clicks is 1. The subsequent Combine and Reduce stages are similar to those of the traffic statistics task by network type.
(3) Task computing statistics by base station cell:
Mobile internet access records contain base station cell information. The LAC and CELL ID of an internet record in Table 1 together form the cell identifier of the mobile base station, which uniquely identifies a base station cell. This task computes each phone number's monthly traffic in each base station cell. It then sorts all cells of that phone number by traffic in descending order and assigns serial numbers.
The Map, Combine and Reduce implementation of this task is as shown in Fig. 2, and its input and output data are shown in Table 4:
Table four, input and output data of the traffic statistics task by base station cell
As shown in Table 4, every internet record is converted into a key-value pair whose key is the phone number and whose value is LAC | CELL_ID | traffic | duration | number of clicks. LAC and CELL_ID uniquely identify the base station cell, and the number of clicks is 1. The subsequent Combine and Reduce processing is similar to the two tasks above. The difference is in the final output of the Reduce stage: the traffic of all base station cells of each phone number must be sorted in descending order to obtain serial numbers, and the final output includes these serial numbers.
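The final ranking step of this task can be sketched as follows. The sketch assumes one phone number's per-cell totals are already available as (traffic, duration, clicks) tuples keyed by a LAC_CELLID string. Breaking ties by cell identifier is an assumption; the text does not specify tie handling.

```python
def rank_cells(cell_stats):
    """Sort one phone number's cells by traffic (descending) and number them from 1.

    cell_stats maps a "LAC_CELLID" string to (traffic, duration, clicks).
    Equal traffic is broken by cell identifier so the order is deterministic.
    """
    ordered = sorted(cell_stats.items(), key=lambda kv: (-kv[1][0], kv[0]))
    return [(serial, cell, *stats) for serial, (cell, stats) in enumerate(ordered, start=1)]


for row in rank_cells({"4310_50036": (450, 65, 2),
                       "4310_50037": (900, 10, 3),
                       "4311_1": (450, 1, 1)}):
    print(row)
# (1, '4310_50037', 900, 10, 3)
# (2, '4310_50036', 450, 65, 2)
# (3, '4311_1', 450, 1, 1)
```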
In the method for the present embodiment, then the task of three of statistic flow information dimensions is synthesized a task, so only need to write a task Job source code.Partitioner and the Grouping class in MapReduce distributed computing framework is utilized to carry out route and the classified statistics analysis of data record.Due to the statistics of disposable execution three dimensions in a task, so need to carry out class indication to each dimension data.Namely need a specific data field to identify data and belong to which dimension.DATA_TYPE can be used herein to carry out the data type mark of three different dimensions.
Owing to the addition of Partitioner and Grouping class in basic MapReduce tasks carrying process, so the implementation of the present embodiment data processing and the implementation of Fig. 2 are different, the implementation of the present embodiment as shown in Figure 3.
As above shown in Fig. 3, multiple task is placed on to the Job performed in a task, must with Partitioner and Grouping, to carry out data route and grouping, avoid multiple task output data to break up to mix, for follow-up data operation makes troubles.Wherein Partitioner class major function is that every bar data record is routed to specific Reducer, and then task Output rusults outputs to specific file " part-r-***** ", this part-r-***** is the result output file that MapReduce tasks carrying terminates, a corresponding part-r-***** file of Reducer.Grouping class major function carries out the grouping of data record, and for same Reducer, the data record being positioned at a grouping can import Reducer class into as same batch data, and is positioned at the data record of different grouping, can import Reducer class in batches.
First Multitask Data classification is carried out.
Because this task needs the statistics task of disposable operation three dimensions, so need to classify to the data of three dimensions, use DATA_TYPE carry out Data classification, Data classification as shown in Table 5:
Table five, data classification table
Data dimension | DATA_TYPE
By network type | 1
By service type | 2
By base station cell | 3
The meaning of each record's data must also be classified, to serve as the basis for subsequent merging and summarizing of data records, as shown in Table 6:
Table six, data record classification
As shown in Table 6, the meaning of RECORD_TYPE differs between dimensions.
Because one task has to compute statistics for three dimensions, the data must be formatted into a common pattern. The keys and values output by the Map stage are formatted as follows:
Key: phone number | DATA_TYPE | RECORD_TYPE
Value: RECORD_TYPE | traffic | duration | number of clicks
The meanings of DATA_TYPE and RECORD_TYPE are given in Table 5 and Table 6.
The input and output data of this embodiment are shown in the following table.
Table seven, input and output data of this embodiment
As shown in Table 7, the Map stage converts each original internet record into three records, one per statistical dimension. DATA_TYPE distinguishes which dimension each record belongs to, and RECORD_TYPE distinguishes the record subclass within each dimension. The Partitioner and Reduce stages use DATA_TYPE to determine which analysis dimension a record belongs to, and then route the data or merge and output it. The core of this embodiment is the routing of data records in the Partitioner. Based on the dimension label DATA_TYPE of each record, records belonging to different dimensions are routed to their corresponding sets of Reduce tasks. This ensures that the data is neither mixed nor scrambled, and that the data output by Reduce is correct and already partitioned.
The execution flow of this embodiment is as follows:
Figure 4 shows the flow for computing monthly traffic statistics for each phone number along the three dimensions.
As shown in Figure 4, the key technique of this embodiment is the precise routing of records in the Partition class, combined with aggregation, processing, merging and output in the Map and Reduce classes according to the data dimension. Results that previously required several jobs to be run one after another can now be obtained from a single job, which can substantially reduce total running time.
Test results for the two MapReduce multi-task execution methods are shown in Table 8.
Table 8. Comparison of test results
As Table 8 shows, comparing the two MapReduce multi-task execution methods indicates that the method of this embodiment outperforms the existing method: the total task execution time is shorter, and the load that running multiple tasks places on a big-data cluster can be greatly reduced.
Embodiment 2: a data processing device for mobile subscribers' Internet access records, comprising:
a data construction module, configured to construct N data items from each Internet access record within a predetermined time period, each data item comprising an identifier and statistical information. The identifier comprises the mobile phone number in the Internet access record, a data type, and a record type. There are N data types, in one-to-one correspondence with N predetermined fields in the Internet access record, and the N data items constructed from one Internet access record have different data types. The record type of a data item is the value, in the Internet access record, of the predetermined field corresponding to that item's data type. N is a positive integer greater than 1;
an accumulation module, configured to accumulate the statistical information of data items having the same identifier, obtaining the accumulated statistical information for each identifier.
In one implementation of this embodiment, N equals 3, and the N predetermined fields corresponding to the N data types are VPN, BUSI_TYPE, and LAC plus CELL_ID.
In one implementation of this embodiment, the statistical information may comprise traffic, duration, and number of clicks: the traffic is the value of the UP and DOWN fields in the Internet access record, the duration is the value of the time field in the Internet access record, and the number of clicks is 1.
In one implementation of this embodiment, the data construction module constructing N data items from each Internet access record within the predetermined time period may specifically mean:
the data construction module processes each Internet access record within the predetermined time period, read from the Hadoop Distributed File System (HDFS), into N key-value pairs, where the key (Key) comprises the phone number, the data type, and the record type, and the value (Value) comprises the DOWN, UP, and time values in the Internet access record and a click count of 1;
and the accumulation module accumulating the statistical information of data items having the same identifier may specifically mean:
the accumulation module merges key-value pairs having the same Key into a single key-value pair, whose Key is unchanged and whose Value is obtained by separately summing the UP, DOWN, time, and number-of-clicks values of the key-value pairs being merged.
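The accumulation described above corresponds to the Reduce step. A minimal sketch, assuming the Map values have already been parsed into integer tuples (UP, DOWN, time, clicks) and that keys use the phone|DATA_TYPE|RECORD_TYPE form; the function name accumulate and the sample keys are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

Stats = Tuple[int, int, int, int]  # (UP, DOWN, time, clicks)


def accumulate(pairs: Iterable[Tuple[str, Stats]]) -> Dict[str, Stats]:
    """Merge key-value pairs with the same Key; the Key is kept, each statistic is summed."""
    totals: Dict[str, List[int]] = defaultdict(lambda: [0, 0, 0, 0])
    for key, stats in pairs:
        acc = totals[key]
        for i, v in enumerate(stats):
            acc[i] += v
    return {key: tuple(acc) for key, acc in totals.items()}


sample = [
    ("138|1|4G", (100, 900, 30, 1)),
    ("138|1|4G", (50, 150, 10, 1)),
    ("138|2|video", (100, 900, 30, 1)),
]
print(accumulate(sample))
```

Because every statistic is a plain sum, the same function could also serve as a combiner on the Map side before the shuffle.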
In this implementation, the device may further comprise:
a merging module, configured to merge, among the key-value pairs obtained by the accumulation module, those having the same phone number and the same data type into a single key-value pair, whose Key is the phone number and whose Value is the union of the Values of the key-value pairs being merged.
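The merging module can be sketched as follows. Because the merged Key is only the phone number, this sketch keeps DATA_TYPE as an outer grouping so each dimension's output stays separate, consistent with the Partition routing; that layout, the function name merge_by_phone, and the string form of each merged entry are assumptions. The union is represented as a sorted list of per-RECORD_TYPE entries.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def merge_by_phone(
    accumulated: Dict[str, Tuple[int, int, int, int]]
) -> Dict[str, Dict[str, List[str]]]:
    """Group accumulated pairs by (DATA_TYPE, phone); the merged Key is the phone number."""
    merged: Dict[str, Dict[str, List[str]]] = defaultdict(lambda: defaultdict(list))
    for key, (up, down, duration, clicks) in accumulated.items():
        phone, data_type, record_type = key.split("|", 2)
        # Keep RECORD_TYPE in each entry so per-category totals stay distinguishable.
        merged[data_type][phone].append(f"{record_type}|{up}|{down}|{duration}|{clicks}")
    # Sort each union for deterministic output.
    return {dt: {phone: sorted(vals) for phone, vals in by_phone.items()}
            for dt, by_phone in merged.items()}


acc = {"138|1|4G": (150, 1050, 40, 2), "138|1|3G": (10, 20, 5, 1),
       "138|2|video": (100, 900, 30, 1)}
print(merge_by_phone(acc))
```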
Those of ordinary skill in the art will understand that all or some of the steps of the above method may be carried out by a program instructing the relevant hardware, and the program may be stored in a computer-readable storage medium such as read-only memory, a magnetic disk, or an optical disc. Alternatively, all or some of the steps of the above embodiments may be implemented with one or more integrated circuits. Correspondingly, each module or unit in the above embodiments may be implemented in hardware or as a software functional module. The present invention is not limited to any particular combination of hardware and software.
Of course, the present invention may have various other embodiments. Without departing from the spirit and essence of the present invention, those of ordinary skill in the art may make various corresponding changes and variations according to the present invention, and all such changes and variations shall fall within the scope of protection of the claims of the present invention.

Claims (10)

1. A data processing method for mobile subscribers' Internet access records, comprising:
S110: constructing N data items from each Internet access record within a predetermined time period, each data item comprising an identifier and statistical information, wherein the identifier comprises the mobile phone number in the Internet access record, a data type, and a record type; there are N data types, in one-to-one correspondence with N predetermined fields in the Internet access record, and the N data items constructed from one Internet access record have different data types; the record type of a data item is the value, in the Internet access record, of the predetermined field corresponding to the data type of that data item; and N is a positive integer greater than 1;
S120: accumulating the statistical information of data items having the same identifier to obtain the accumulated statistical information for each identifier.
2. the method for claim 1, is characterized in that:
Described N equals 3; N number of predetermined field described in N kind corresponding to data type is that VPN, BUSITYPE and LAC add CELL ID.
3. the method for claim 1, is characterized in that:
Described statistical information comprises flow, duration and number of clicks; Described flow is the value of UP field and DOWN field in described internet records, and described duration is the value of time field in described internet records, and described number of clicks is 1.
4. The method according to any one of claims 1 to 3, wherein step S110 comprises:
processing each Internet access record within the predetermined time period, read from the Hadoop Distributed File System (HDFS), into N key-value pairs, wherein the key (Key) comprises the phone number, the data type, and the record type, and the value (Value) comprises the DOWN, UP, and time values in the Internet access record and a click count of 1;
and step S120 comprises:
merging key-value pairs having the same Key into a single key-value pair, wherein the Key of the merged pair is unchanged and its Value is obtained by separately summing the UP, DOWN, time, and number-of-clicks values of the key-value pairs being merged.
5. The method of claim 4, further comprising, after step S120:
S130: among the key-value pairs obtained in step S120, merging those having the same phone number and the same data type into a single key-value pair, wherein the Key of the merged pair is the phone number and its Value is the union of the Values of the key-value pairs being merged.
6. A data processing device for mobile subscribers' Internet access records, comprising:
a data construction module, configured to construct N data items from each Internet access record within a predetermined time period, each data item comprising an identifier and statistical information, wherein the identifier comprises the mobile phone number in the Internet access record, a data type, and a record type; there are N data types, in one-to-one correspondence with N predetermined fields in the Internet access record, and the N data items constructed from one Internet access record have different data types; the record type of a data item is the value, in the Internet access record, of the predetermined field corresponding to the data type of that data item; and N is a positive integer greater than 1;
an accumulation module, configured to accumulate the statistical information of data items having the same identifier to obtain the accumulated statistical information for each identifier.
7. The device of claim 6, wherein:
N equals 3, and the N predetermined fields corresponding to the N data types are VPN, BUSI_TYPE, and LAC plus CELL_ID.
8. The device of claim 6, wherein:
the statistical information comprises traffic, duration, and number of clicks; the traffic is the value of the UP and DOWN fields in the Internet access record, the duration is the value of the time field in the Internet access record, and the number of clicks is 1.
9. The device according to any one of claims 6 to 8, wherein the data construction module constructing N data items from each Internet access record within the predetermined time period means:
the data construction module processes each Internet access record within the predetermined time period, read from the Hadoop Distributed File System (HDFS), into N key-value pairs, wherein the key (Key) comprises the phone number, the data type, and the record type, and the value (Value) comprises the DOWN, UP, and time values in the Internet access record and a click count of 1;
and the accumulation module accumulating the statistical information of data items having the same identifier means:
the accumulation module merges key-value pairs having the same Key into a single key-value pair, wherein the Key of the merged pair is unchanged and its Value is obtained by separately summing the UP, DOWN, time, and number-of-clicks values of the key-value pairs being merged.
10. The device of claim 9, further comprising:
a merging module, configured to merge, among the key-value pairs obtained by the accumulation module, those having the same phone number and the same data type into a single key-value pair, wherein the Key of the merged pair is the phone number and its Value is the union of the Values of the key-value pairs being merged.
CN201510080977.3A 2015-02-13 2015-02-13 Data processing method and data processing device for browsing internet records of mobile subscribers Pending CN104615765A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510080977.3A CN104615765A (en) 2015-02-13 2015-02-13 Data processing method and data processing device for browsing internet records of mobile subscribers

Publications (1)

Publication Number Publication Date
CN104615765A 2015-05-13

Family

ID=53150207

Country Status (1)

Country Link
CN (1) CN104615765A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101231649A (en) * 2007-12-27 2008-07-30 腾讯科技(深圳)有限公司 Data distribution statistical method
US20090198724A1 (en) * 2008-02-05 2009-08-06 Mikko Valimaki System and method for conducting network analytics
CN102298623A (en) * 2011-08-15 2011-12-28 北京神州泰岳软件股份有限公司 Method for acquiring dialog list data
CN102999506A (en) * 2011-09-13 2013-03-27 阿里巴巴集团控股有限公司 Method and device for obtaining unique visitor (UV)

Cited By (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105591842B (en) * 2016-01-29 2018-12-21 中国联合网络通信集团有限公司 A kind of method and apparatus obtaining mobile terminal operating system version
CN105591842A (en) * 2016-01-29 2016-05-18 中国联合网络通信集团有限公司 Method and device for obtaining version of mobile terminal operating system
CN106911523A (en) * 2017-04-25 2017-06-30 杭州东方通信软件技术有限公司 The method and system that mobile interchange network users are positioned by LTE indulging in the internet
CN106911523B (en) * 2017-04-25 2019-10-01 杭州东方通信软件技术有限公司 The method and system that mobile interchange network users are positioned by LTE indulging in the internet
CN107578148A (en) * 2017-08-16 2018-01-12 深信服科技股份有限公司 The method, apparatus and storage medium that estimation online duration influences on operating efficiency
CN107704575A (en) * 2017-09-30 2018-02-16 郑州轻工业学院 User behavior analysis method and user behavior analysis device based on data mining
CN108093428A (en) * 2017-11-06 2018-05-29 浙江每日互动网络科技股份有限公司 For differentiating the server of real traffic
CN108093428B (en) * 2017-11-06 2021-02-19 每日互动股份有限公司 Server for authenticating real traffic
CN110413670A (en) * 2019-06-28 2019-11-05 阿里巴巴集团控股有限公司 Data export method, device and equipment based on MapReduce
CN110413670B (en) * 2019-06-28 2023-07-14 创新先进技术有限公司 Data export method, device and equipment based on MapReduce
CN110427438A (en) * 2019-07-30 2019-11-08 中国工商银行股份有限公司 Data processing method and its device, electronic equipment and medium
CN111241139A (en) * 2020-01-15 2020-06-05 平安医疗健康管理股份有限公司 Data statistical method, device, computer equipment and storage medium
CN111241139B (en) * 2020-01-15 2022-09-30 深圳平安医疗健康科技服务有限公司 Data statistical method, device, computer equipment and storage medium
CN114915427A (en) * 2022-06-06 2022-08-16 中国联合网络通信集团有限公司 Access control method, device, equipment and storage medium
CN114915427B (en) * 2022-06-06 2023-10-13 中国联合网络通信集团有限公司 Access control method, device, equipment and storage medium

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20150513)