Summary of the invention
The main purpose of the present invention is to provide a data processing method and device based on the Internet behavior of mobile phone users, so as to speed up the processing of mobile users' Internet data and improve system performance.
The present invention proposes a data processing method based on the Internet behavior of mobile phone users, the method comprising:
Generating, from the user's Internet data, a first call detail record (CDR) that contains the URLs of the web pages accessed by the user;
Pre-processing the data in the first CDR according to predefined rules to generate a second CDR;
Performing statistical analysis on the data in the second CDR.
Preferably, the step of pre-processing the data in the first CDR according to the predefined rules comprises:
Performing access-type URL analysis and/or specified-website traffic analysis and/or advertisement access traffic analysis on the data in the first CDR.
Preferably, the step of performing access-type URL analysis on the data in the first CDR comprises:
Adding to the first CDR a URL-type field for storing the category to which the URL belongs;
Parsing the original URL in the first CDR;
Looking up the category corresponding to the original URL in a preset URL classification mapping table, and writing it into the URL-type field corresponding to the original URL in the second CDR.
Preferably, the step of performing specified-website traffic analysis on the data in the first CDR comprises:
Adding to the first CDR a new-URL field for storing the converted URL;
Converting the original URL in the first CDR according to predetermined conversion rules;
Writing the converted URL into the new-URL field corresponding to the original URL in the second CDR.
Preferably, the step of performing advertisement access traffic analysis on the data in the first CDR comprises:
Adding to the first CDR an advertisement-URL field for storing the advertisement URL;
Separating out the advertisement URL according to a predefined identifier carried by the original URL in the first CDR;
Writing the advertisement URL into the advertisement-URL field corresponding to the original URL in the second CDR.
The present invention also proposes a data processing device based on the Internet behavior of mobile phone users, comprising:
An original CDR generation module, configured to generate, from the user's Internet data, a first CDR containing the URLs accessed by the user;
A new CDR generation module, configured to pre-process the data in the first CDR according to predefined rules to generate a second CDR;
A new CDR data processing module, configured to perform statistical analysis on the data in the second CDR.
Preferably, the new CDR generation module is further configured to perform access-type URL analysis and/or specified-website traffic analysis and/or advertisement access traffic analysis on the data in the first CDR.
Preferably, the new CDR generation module comprises:
A field adding unit, configured to add to the first CDR a URL-type field for storing the category to which the URL belongs;
A resolution unit, configured to parse the original URL in the first CDR;
A writing unit, configured to look up the category corresponding to the original URL in the preset URL classification mapping table and write it into the URL-type field corresponding to the original URL in the second CDR.
Preferably, the field adding unit is further configured to add to the first CDR a new-URL field for storing the converted URL;
The resolution unit is further configured to convert the original URL in the first CDR according to the predetermined conversion rules;
The writing unit is further configured to write the converted URL into the new-URL field corresponding to the original URL in the second CDR; or
The field adding unit is further configured to add to the first CDR an advertisement-URL field for storing the advertisement URL;
The resolution unit is further configured to separate out the advertisement URL according to the predefined identifier carried by the original URL in the first CDR;
The writing unit is further configured to write the advertisement URL into the advertisement-URL field corresponding to the original URL in the second CDR.
With the data processing method and device based on the Internet behavior of mobile phone users proposed by the present invention, the CDR data is first pre-processed by a pre-processing device, such as an interface message processor (IMP), before it is loaded into the database. The pre-processing includes classifying and aggregating the URLs generated by users' Internet access, converting URLs according to certain rules, and so on; after this series of pre-processing steps, new CDR data is generated and loaded into the database. The system database then performs statistical analysis on the second CDR data. URL parsing is thus offloaded to the IMP and the parsed results are written into the new CDR, so the system database performs statistical analysis directly on those results and no longer has to analyze large volumes of URL data. This substantially improves the efficiency of CDR data processing and removes the performance bottleneck in analyzing mobile users' Internet behavior.
Embodiment
The solution of the embodiments of the present invention is mainly as follows: before the CDR data is loaded into the database, it is first pre-processed. The pre-processing includes classifying and aggregating the URLs generated by users' Internet access, converting URLs according to certain rules, and so on, and the new CDR data produced by this series of pre-processing steps is loaded into the database. The system database then performs statistical analysis on the second CDR data, which improves the efficiency of CDR data processing and removes the performance bottleneck in analyzing mobile users' Internet behavior.
As shown in Figure 1, an embodiment of the present invention proposes a data processing method based on the Internet behavior of mobile phone users, comprising:
Step S101: generate, from the user's Internet data, a first CDR containing the URLs accessed by the user.
In this embodiment, the user can access various websites through a mobile phone to obtain network information. When the user accesses the Internet through the mobile phone, the mobile service system captures network data according to the addresses the user accesses and produces an original CDR, i.e., the first CDR referred to in this embodiment. The more the user accesses the Internet, the more CDRs the mobile service system produces.
The first CDR contains the URLs accessed by the user. A URL (Uniform Resource Locator) is a notation that fully describes the address of a web page or other resource on the Internet. Each web page on the Internet has a unique name, usually called its URL address; the address may point to a local disk, to a computer on a local area network (LAN), or, most commonly, to a website on the Internet. Simply put, a URL is a web address.
After the mobile service system obtains the first CDR, the first CDR data must be analyzed statistically so that the Internet behavior of mobile users can be understood from the results, for example which types of websites users tend to visit, the access traffic of certain specified websites, and the advertisement access traffic that merchants care about. Appropriate business measures can then be taken based on that behavior.
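For illustration only, the following Python sketch shows what a first CDR might look like and how it could be generated from raw session entries. The field names (`user_id`, `url`, `traffic_bytes`), the input dictionary keys and the sample subscriber numbers are hypothetical; the specification only requires that the first CDR contain the URL accessed by the user.

```python
from dataclasses import dataclass
from typing import Iterable, List, Mapping


@dataclass(frozen=True)
class FirstCdr:
    """Original (first) CDR: one record per URL accessed by a subscriber.

    Field names are hypothetical; only the accessed URL is required by the text.
    """
    user_id: str
    url: str
    traffic_bytes: int


def generate_first_cdrs(sessions: Iterable[Mapping[str, object]]) -> List[FirstCdr]:
    """Turn raw Internet session entries into first CDRs, skipping entries without a URL."""
    cdrs: List[FirstCdr] = []
    for entry in sessions:
        url = entry.get("url")
        if not url:  # no accessed URL, so no first CDR for this entry
            continue
        cdrs.append(
            FirstCdr(
                user_id=str(entry["user_id"]),
                url=str(url),
                traffic_bytes=int(entry.get("bytes", 0)),  # default to 0 if absent
            )
        )
    return cdrs


# Hypothetical sample input.
sessions = [
    {"user_id": "13800000001", "url": "http://www.sina.com/sport/1001.htm", "bytes": 2048},
    {"user_id": "13800000002", "url": "", "bytes": 10},
    {"user_id": "13800000001", "url": "http://www.sina.com/news/1004.htm"},
]
first_cdrs = generate_first_cdrs(sessions)
print(len(first_cdrs))  # 2
```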
Step S102: pre-process the data in the first CDR according to predefined rules to generate a second CDR.
In this embodiment, the predefined rules are formulated for the topics the operator is concerned with, such as the types of websites accessed, website access traffic, and advertisement access traffic. Pre-processing the data in the first CDR according to the predefined rules includes performing access-type URL analysis and/or specified-website traffic analysis and/or advertisement access traffic analysis on the data in the first CDR; for example, the URLs generated by users' Internet access may be classified and aggregated, or converted according to certain rules.
Depending on the information to be obtained, the predefined rules may also be other similar rules.
In this embodiment, the pre-processing of the data in the first CDR may be performed by an independent device, such as an interface message processor (IMP). The IMP first pre-processes the CDR data, for example by classifying and aggregating the URLs generated by users' Internet access or converting URLs according to certain rules. This series of pre-processing steps produces a new CDR, i.e., the second CDR in this embodiment, and the new CDR data is then loaded into the database so that the system database can perform statistical analysis on the second CDR data in subsequent processing. In this embodiment, the IMP may load the second CDR data by writing the pre-processed data into a database table designated by the system.
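A minimal sketch of the loading step, assuming a relational system database. Python's built-in `sqlite3` module stands in for the real database, and the table name `second_cdr` and its columns are hypothetical; the point shown is that the IMP writes already-derived fields (here the URL type) into a designated table in a single batch.

```python
import sqlite3
from typing import Iterable, Optional, Tuple

Row = Tuple[str, str, Optional[str]]  # (user_id, url, url_type); hypothetical layout


def load_second_cdrs(conn: sqlite3.Connection, rows: Iterable[Row]) -> int:
    """Write pre-processed (second) CDR rows into the designated table in one transaction."""
    batch = list(rows)
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "CREATE TABLE IF NOT EXISTS second_cdr ("
            " user_id TEXT NOT NULL,"
            " url TEXT NOT NULL,"
            " url_type TEXT)"
        )
        conn.executemany(
            "INSERT INTO second_cdr (user_id, url, url_type) VALUES (?, ?, ?)",
            batch,
        )
    return len(batch)


conn = sqlite3.connect(":memory:")
loaded = load_second_cdrs(conn, [
    ("13800000001", "http://www.sina.com/sport/1001.htm", "Sport category"),
    ("13800000001", "http://www.sina.com/news/1004.htm", "News category"),
])
total = conn.execute("SELECT COUNT(*) FROM second_cdr").fetchone()[0]
print(loaded, total)  # 2 2
```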
Step S103: perform statistical analysis on the data in the second CDR.
As described above, the newly generated CDR data is handed over to the system database for statistical analysis. For example, based on the category of each URL in the second CDR, aggregate statistics can be computed for whichever URL category is of interest. URL parsing for the first CDR is thus offloaded to the IMP and the parsed results are written into the new CDR, so the system database performs statistical analysis directly on those results without analyzing large volumes of URL data. This substantially improves the efficiency of CDR data processing and removes the performance bottleneck in analyzing mobile users' Internet behavior.
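To illustrate why the database no longer needs to parse URLs, the sketch below computes per-category access counts with a single `GROUP BY` on the pre-computed URL-type field. It again uses `sqlite3` as a stand-in for the system database, with a hypothetical `second_cdr` table filled with the example rows that appear later in Table 4.

```python
import sqlite3
from typing import Dict


def count_by_url_type(conn: sqlite3.Connection) -> Dict[str, int]:
    """Per-category access counts, computed only from the pre-computed url_type field."""
    rows = conn.execute(
        "SELECT url_type, COUNT(*) FROM second_cdr GROUP BY url_type"
    ).fetchall()
    return {url_type: count for url_type, count in rows}


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE second_cdr (url TEXT NOT NULL, url_type TEXT)")
conn.executemany(
    "INSERT INTO second_cdr (url, url_type) VALUES (?, ?)",
    [
        ("http://www.sina.com/sport/1001.htm", "Sport category"),
        ("http://www.sina.com/sport/1002.htm", "Sport category"),
        ("http://www.sina.com/sport/1003.htm", "Sport category"),
        ("http://www.sina.com/news/1004.htm", "News category"),
        ("http://www.sina.com/news/1005.htm", "News category"),
        ("http://www.sina.com/movie/1006.htm", "Film class"),
    ],
)
stats = count_by_url_type(conn)
print(stats["News category"])  # 2
```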
As shown in Figure 2, in step S102, the step of performing access-type URL analysis on the data in the first CDR comprises:
Step S1021: add to the first CDR a URL-type field for storing the category to which the URL belongs;
Step S1022: parse the original URL in the first CDR;
Step S1023: look up the category corresponding to the original URL in the preset URL classification mapping table, and write it into the URL-type field corresponding to the original URL in the second CDR.
The access-type URL analysis of the data in the first CDR is illustrated below with a specific example. Suppose the first CDR data is as shown in Table 2 below:
| Sequence number | URL |
|---|---|
| 1 | http://www.sina.com/sport/1001.htm |
| 2 | http://www.sina.com/sport/1002.htm |
| 3 | http://www.sina.com/sport/1003.htm |
| 4 | http://www.sina.com/news/1004.htm |
| 5 | http://www.sina.com/news/1005.htm |
| 6 | http://www.sina.com/movie/1006.htm |
Table 2
The criteria for classifying URLs, i.e., the preset URL classification mapping table, are shown in Table 3 below:
| Classification | URL |
|---|---|
| Sport category | http://www.sina.com/sport/* |
| News category | http://www.sina.com/news/* |
| Film class | http://www.sina.com/movie/ |
Table 3
The result of pre-processing by access-type URL analysis is shown in Table 4 below:
| Sequence number | URL | Classification |
|---|---|---|
| 1 | http://www.sina.com/sport/1001.htm | Sport category |
| 2 | http://www.sina.com/sport/1002.htm | Sport category |
| 3 | http://www.sina.com/sport/1003.htm | Sport category |
| 4 | http://www.sina.com/news/1004.htm | News category |
| 5 | http://www.sina.com/news/1005.htm | News category |
| 6 | http://www.sina.com/movie/1006.htm | Film class |
Table 4
It follows that, based on the category of each URL in the second CDR, aggregate statistics can be computed for URLs of a given category, such as the news category: Table 4 contains two news-category URLs, http://www.sina.com/news/1004.htm and http://www.sina.com/news/1005.htm.
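The classification of Tables 2 to 4 can be reproduced with the Python sketch below. It assumes that a trailing `*` in Table 3 means "any URL beginning with this prefix" and treats the film entry, which has no `*`, as a prefix as well; these matching semantics and the function names are assumptions, not part of the specification.

```python
from typing import List, Optional, Sequence, Tuple

# Preset URL classification mapping table (Table 3).
# Assumption: a trailing "*" means "any URL starting with this prefix";
# the film entry has no "*" and is also treated as a prefix.
URL_CLASSIFICATION: Sequence[Tuple[str, str]] = (
    ("Sport category", "http://www.sina.com/sport/*"),
    ("News category", "http://www.sina.com/news/*"),
    ("Film class", "http://www.sina.com/movie/"),
)


def classify(url: str, table: Sequence[Tuple[str, str]] = URL_CLASSIFICATION) -> Optional[str]:
    """Return the category of the first matching entry, or None if nothing matches."""
    for category, pattern in table:
        if url.startswith(pattern.rstrip("*")):
            return category
    return None


def build_second_cdr(urls: List[str]) -> List[Tuple[int, str, Optional[str]]]:
    """Rows of (sequence number, original URL, URL type), as in Table 4."""
    return [(seq, url, classify(url)) for seq, url in enumerate(urls, start=1)]


first_cdr_urls = [  # Table 2
    "http://www.sina.com/sport/1001.htm",
    "http://www.sina.com/sport/1002.htm",
    "http://www.sina.com/sport/1003.htm",
    "http://www.sina.com/news/1004.htm",
    "http://www.sina.com/news/1005.htm",
    "http://www.sina.com/movie/1006.htm",
]
second_cdr = build_second_cdr(first_cdr_urls)
news_urls = [url for _, url, url_type in second_cdr if url_type == "News category"]
print(news_urls)
```

Prefix matching with `str.startswith` keeps each lookup simple; a much larger classification table might call for a host-keyed index or a trie, a design choice the specification leaves open.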
As shown in Figure 3, in step S102, the step of performing specified-website traffic analysis on the data in the first CDR comprises:
Step S1024: add to the first CDR a new-URL field for storing the converted URL;
Step S1025: convert the original URL in the first CDR according to the predetermined conversion rules;
Step S1026: write the converted URL into the new-URL field corresponding to the original URL in the second CDR.
The predetermined conversion rules may be a conversion rule table formulated according to the configuration rules of the system HOST file. For example, certain HOSTs have the rules shown in Table 5, in which a "process extension name" option and an "ignore parameters" option are set for each URL.
Table 5
According to this conversion rule table, the original URL in the first CDR can be converted into a new URL and written into the corresponding new-URL field in the second CDR. The access traffic of a specified website, or of particular content within a specified website, can then be computed from the new-URL field in the second CDR.
It should be noted that the three pre-processing modes described in this embodiment, namely access-type URL analysis, specified-website traffic analysis, and advertisement access traffic analysis of the data in the first CDR, can be combined when the first CDR data is pre-processed. The types of websites users access, the access traffic of specified websites, the advertisement access traffic, and so on can then all be computed from the single second CDR that is finally generated.
Data tests comparing the solution of the embodiments of the present invention for analyzing mobile users' Internet behavior with the conventional solution gave the results shown in Table 6 below:
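Because the contents of Table 5 are not reproduced here, the rule semantics below are assumptions: the sketch reads "process extension name" as stripping the file extension and "ignore parameters" as dropping the query string, with rules keyed by host. The names `ConversionRule`, `RULES` and `convert_url` are hypothetical. Once URLs are converted, counting accesses to a specified page reduces to counting identical new-URL values.

```python
import posixpath
from collections import Counter
from dataclasses import dataclass
from typing import Dict
from urllib.parse import urlsplit, urlunsplit


@dataclass(frozen=True)
class ConversionRule:
    strip_extension: bool  # assumed meaning of "process extension name"
    ignore_params: bool    # assumed meaning of "ignore parameters"


# Hypothetical per-host rule table standing in for Table 5.
RULES: Dict[str, ConversionRule] = {
    "www.sina.com": ConversionRule(strip_extension=True, ignore_params=True),
}


def convert_url(url: str, rules: Dict[str, ConversionRule] = RULES) -> str:
    """Return the value for the new-URL field; URLs of hosts without a rule are kept as-is."""
    parts = urlsplit(url)
    rule = rules.get(parts.hostname or "")
    if rule is None:
        return url
    path = posixpath.splitext(parts.path)[0] if rule.strip_extension else parts.path
    query = "" if rule.ignore_params else parts.query
    return urlunsplit((parts.scheme, parts.netloc, path, query, parts.fragment))


urls = [
    "http://www.sina.com/news/1004.htm?from=wap",
    "http://www.sina.com/news/1004.htm?from=app",
    "http://www.sina.com/news/1004.htm",
    "http://www.example.org/a.htm?x=1",
]
traffic = Counter(convert_url(u) for u in urls)
print(traffic["http://www.sina.com/news/1004"])  # 3
```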
Table 6
As Table 6 shows, compared with the conventional technique, the solution of the embodiments of the present invention analyzes users' Internet access data more efficiently, substantially increases the CDR data processing speed, reduces the processing load on the system database, and removes the performance bottleneck in analyzing mobile users' Internet behavior.
In this embodiment, before the CDR data is loaded into the database, it is first pre-processed by a pre-processing device such as an interface message processor (IMP). The pre-processing includes classifying and aggregating the URLs generated by users' Internet access, converting URLs according to certain rules, and so on; after this series of pre-processing steps, new CDR data is generated and loaded into the database. The system database then performs statistical analysis on the second CDR data. URL parsing is thus offloaded to the IMP and the parsed results are written into the new CDR, so the system database performs statistical analysis directly on those results without analyzing large volumes of URL data. This substantially improves the efficiency of CDR data processing and removes the performance bottleneck in analyzing mobile users' Internet behavior.
As shown in Figure 4, in step S102, the step of performing advertisement access traffic analysis on the data in the first CDR comprises:
Step S1027: add to the first CDR an advertisement-URL field for storing the advertisement URL;
Step S1028: separate out the advertisement URL according to the predefined identifier carried by the original URL in the first CDR;
Step S1029: write the advertisement URL into the advertisement-URL field corresponding to the original URL in the second CDR.
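A minimal sketch of step S1028, assuming the predefined identifier is a query-string parameter that carries the percent-encoded advertisement URL. The parameter name `adurl` and the example hosts are hypothetical; an operator could equally use a path marker or another convention.

```python
from typing import Optional
from urllib.parse import parse_qs, urlsplit

AD_IDENTIFIER = "adurl"  # hypothetical predefined identifier


def extract_ad_url(origin_url: str, identifier: str = AD_IDENTIFIER) -> Optional[str]:
    """Return the advertisement URL carried by origin_url, or None if it carries none."""
    params = parse_qs(urlsplit(origin_url).query)  # values come back percent-decoded
    values = params.get(identifier)
    return values[0] if values else None


origin = "http://ad.example.com/click?id=42&adurl=http%3A%2F%2Fshop.example.com%2Fpromo.htm"
print(extract_ad_url(origin))  # http://shop.example.com/promo.htm
```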
As shown in Figure 5, an embodiment of the present invention proposes a data processing device based on the Internet behavior of mobile phone users, comprising an original CDR generation module 501, a new CDR generation module 502 and a new CDR data processing module 503, wherein:
The original CDR generation module 501 is configured to generate, from the user's Internet data, a first CDR containing the URLs accessed by the user.
In this embodiment, the user can access various websites through a mobile phone to obtain network information. When the user accesses the Internet through the mobile phone, the original CDR generation module 501 in the mobile service system captures network data according to the addresses the user accesses and produces an original CDR, i.e., the first CDR referred to in this embodiment. The more the user accesses the Internet, the more CDRs the mobile service system produces.
The new CDR generation module 502 is configured to pre-process the data in the first CDR according to predefined rules to generate a second CDR.
In this embodiment, pre-processing of the data in the first CDR by the new CDR generation module 502 according to the predefined rules specifically includes performing access-type URL analysis and/or specified-website traffic analysis and/or advertisement access traffic analysis on the data in the first CDR.
The predefined rules are formulated for the topics the operator is concerned with, such as the types of websites accessed, website access traffic, and advertisement access traffic; for example, the URLs generated by users' Internet access may be classified and aggregated, or converted according to certain rules.
In this embodiment, the pre-processing of the data in the first CDR may be performed by an independent device, such as an interface message processor (IMP). The IMP first pre-processes the CDR data, for example by classifying and aggregating the URLs generated by users' Internet access or converting URLs according to certain rules. After this series of pre-processing steps, the new CDR generation module 502 generates new CDR data, which is loaded into the database so that the system database can perform statistical analysis on the second CDR data in subsequent processing. In this embodiment, the IMP may load the second CDR data by writing the pre-processed data into a database table designated by the system.
The new CDR data processing module 503 is configured to perform statistical analysis on the data in the second CDR.
As described above, the newly generated CDR data is handed over to the new CDR data processing module 503 of the system database for statistical analysis; for example, based on the category of each URL in the second CDR, aggregate statistics can be computed for whichever URL category is of interest.
URL parsing for the first CDR is thus offloaded to the IMP and the parsed results are written into the new CDR, so the system database performs statistical analysis directly on those results without analyzing large volumes of URL data. This substantially improves the efficiency of CDR data processing and removes the performance bottleneck in analyzing mobile users' Internet behavior.
As shown in Figure 6, the new CDR generation module 502 comprises a field adding unit 5021, a resolution unit 5022 and a writing unit 5023, wherein:
The field adding unit 5021 is configured to add to the first CDR a URL-type field for storing the category to which the URL belongs;
The resolution unit 5022 is configured to parse the original URL in the first CDR;
The writing unit 5023 is configured to look up the category corresponding to the original URL in the preset URL classification mapping table and write it into the URL-type field corresponding to the original URL in the second CDR.
Further, the field adding unit 5021 is also configured to add to the first CDR a new-URL field for storing the converted URL;
The resolution unit 5022 is also configured to convert the original URL in the first CDR according to the predetermined conversion rules;
The writing unit 5023 is also configured to write the converted URL into the new-URL field corresponding to the original URL in the second CDR.
Further, the field adding unit 5021 is also configured to add to the first CDR an advertisement-URL field for storing the advertisement URL;
The resolution unit 5022 is also configured to separate out the advertisement URL according to the predefined identifier carried by the original URL in the first CDR;
The writing unit 5023 is also configured to write the advertisement URL into the advertisement-URL field corresponding to the original URL in the second CDR.
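The following Python sketch maps the three units of module 502 onto methods of one class and shows how the three pre-processing modes can be combined into a single second CDR, as noted for the method embodiment. The class name, field names and the simple lambda resolvers are hypothetical placeholders for the classification, conversion and advertisement-extraction logic.

```python
from typing import Callable, Dict, List, Optional, Tuple

Record = Dict[str, Optional[str]]
Resolver = Callable[[str], Optional[str]]


class NewCdrGenerationModule:
    """Sketch of module 502: field adding, resolution and writing units as methods."""

    def __init__(self) -> None:
        # One (field name, resolver) pair per enabled pre-processing mode.
        self.modes: List[Tuple[str, Resolver]] = []

    def register(self, field: str, resolver: Resolver) -> None:
        self.modes.append((field, resolver))

    def add_fields(self, first_cdr: Record) -> Record:
        """Field adding unit 5021: copy the record and reserve one field per mode."""
        record = dict(first_cdr)  # copy so the first CDR itself is left unchanged
        for field, _ in self.modes:
            record.setdefault(field, None)
        return record

    def resolve(self, origin_url: str) -> Record:
        """Resolution unit 5022: derive every new field value from the original URL."""
        return {field: resolver(origin_url) for field, resolver in self.modes}

    def generate(self, first_cdr: Record) -> Record:
        """Writing unit 5023: write the derived values into the second CDR."""
        second_cdr = self.add_fields(first_cdr)
        second_cdr.update(self.resolve(first_cdr["url"] or ""))
        return second_cdr


module = NewCdrGenerationModule()
# Hypothetical placeholder resolvers for the three modes.
module.register("url_type", lambda u: "News category" if u.startswith("http://www.sina.com/news/") else None)
module.register("new_url", lambda u: u.split("?", 1)[0])
module.register("ad_url", lambda u: None)

second = module.generate({"user_id": "13800000001", "url": "http://www.sina.com/news/1004.htm?from=wap"})
print(second["url_type"], second["new_url"])
```

Registering modes as (field, resolver) pairs lets an operator enable any subset of the three analyses without changing the unit structure.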
The data processing method and device based on the Internet behavior of mobile phone users of the embodiments of the present invention pre-process the CDR data with a pre-processing device, such as an interface message processor (IMP), before it is loaded into the database. The pre-processing includes classifying and aggregating the URLs generated by users' Internet access, converting URLs according to certain rules, and so on; after this series of pre-processing steps, new CDR data is generated and loaded into the database. The system database then performs statistical analysis on the second CDR data. URL parsing is thus offloaded to the IMP and the parsed results are written into the new CDR, so the system database performs statistical analysis directly on those results without analyzing large volumes of URL data. This substantially improves the efficiency of CDR data processing and removes the performance bottleneck in analyzing mobile users' Internet behavior.
The foregoing describes only preferred embodiments of the present invention and does not thereby limit the scope of its claims. Any equivalent structural or process transformation made using the contents of the description and accompanying drawings of the present invention, and any direct or indirect application thereof in other related technical fields, is likewise included within the scope of patent protection of the present invention.