Method and device for counting active users
Technical field
The present application relates to the field of communication technology, and in particular to a method and a device for counting active users.
Background
An active user is a user who uses an operator's services within a specified statistical period. The number of active users is an important service indicator for telecom operators.
As the number of telecom users keeps growing, so does the number of call tickets (call detail records) generated by user services. In some provinces, the daily volume of call tickets reaches the order of hundreds of millions or more. How to quickly count the active users of each city, each province and the whole country from this mass of data has therefore become a problem in urgent need of a solution.
Summary of the invention
To solve the above technical problem, the embodiments of the present application provide a method for counting active users, so that the number of active users can be counted quickly from massive data.
The technical solution is as follows:
A method for counting active users, comprising:
presetting the initial state of the subscriber phone numbers to be counted, comprising:
grouping the subscriber phone numbers to be counted by number segment, wherein each group of subscriber phone numbers to be counted corresponds to several bytes, and each bit in a byte uniquely corresponds to one subscriber phone number to be counted; the initial state of a subscriber phone number to be counted corresponds to the initial value of its bit;
reading the call tickets to be counted;
changing the initial state of each subscriber phone number that appears in the call tickets to be counted to the counted state;
counting the subscriber phone numbers to be counted whose state is the counted state.
In the above method, preferably, the initial value of the bit is 0.
In the above method, preferably, changing the initial state of each subscriber phone number that appears in the call tickets to be counted to the counted state comprises:
looking up, among the several bytes, the bit that corresponds to the subscriber phone number appearing in the call ticket, within the byte to which that number belongs;
determining a modification parameter, wherein the modification parameter is represented by one byte, the bit of the modification parameter at the same position as the bit corresponding to the subscriber phone number has the value 1, and all other bits have the value 0;
performing a bitwise OR of the modification parameter and the byte to which the subscriber phone number belongs, so as to change the initial state of the subscriber phone number appearing in the call ticket to the counted state.
In the above method, preferably, the initial value of the bit is 1.
In the above method, preferably, changing the initial state of each subscriber phone number that appears in the call tickets to be counted to the counted state comprises:
looking up, among the several bytes, the bit that corresponds to the subscriber phone number appearing in the call ticket, within the byte to which that number belongs;
determining a modification parameter, wherein the modification parameter is represented by one byte, the bit of the modification parameter at the same position as the bit corresponding to the subscriber phone number has the value 0, and all other bits have the value 1;
performing a bitwise AND of the modification parameter and the byte to which the subscriber phone number belongs, so as to change the initial state of the subscriber phone number appearing in the call ticket to the counted state.
In the above method, preferably, the initial state information of the subscriber phone numbers to be counted is stored in a Hash file, and the Hash file comprises a file header, description blocks and a data block, wherein
the file header comprises: a file type identifier, the latest update time, the total file size, and the number of groups into which the subscriber phone numbers to be counted are divided by number segment;
each description block comprises: a number segment, and the start number and end number of the part of the phone number other than the number segment;
the data block comprises: the statistics corresponding to each group of subscriber phone numbers.
The above method, preferably, further comprises:
when a first statistics file and a second statistics file need to be merged into a third statistics file, adding directly to the third statistics file the statistics of the subscriber phone numbers that are present only in the first statistics file or only in the second statistics file; and performing a bitwise OR on the statistics of the subscriber phone numbers that are present in both the first statistics file and the second statistics file, and adding the result to the third statistics file.
A device for counting active users, comprising:
a presetting module, configured to preset the initial state of the subscriber phone numbers to be counted, comprising: grouping the subscriber phone numbers to be counted by number segment, wherein each group of subscriber phone numbers to be counted corresponds to several bytes, and each bit in a byte uniquely corresponds to one subscriber phone number; the initial state of a subscriber phone number to be counted corresponds to the initial value of its bit;
a reading module, configured to read the call tickets to be counted;
a modifying module, configured to change the initial state of each subscriber phone number that appears in the call tickets to be counted to the counted state;
a counting module, configured to count the subscriber phone numbers to be counted whose state is the counted state.
In the above device, preferably, the initial value of the bit is 0.
In the above device, preferably, the modifying module comprises:
a search unit, configured to look up, among the several bytes, the bit that corresponds to the subscriber phone number appearing in the call ticket, within the byte to which that number belongs;
a determining unit, configured to determine a modification parameter, wherein the modification parameter is represented by one byte, the bit of the modification parameter at the same position as the bit corresponding to the subscriber phone number has the value 1, and all other bits have the value 0;
a modifying unit, configured to perform a bitwise OR of the modification parameter and the byte to which the subscriber phone number belongs, so as to change the initial state of the subscriber phone number appearing in the call ticket to the counted state.
In the above device, preferably, the initial state of the subscriber phone numbers to be counted is stored in a Hash file, and the Hash file comprises a file header, description blocks and a data block, wherein
the file header comprises: a file type identifier, the latest update time, the total file size, and the number of groups into which the subscriber phone numbers to be counted are divided by number segment;
each description block comprises: a number segment, and the start number and end number of the part of the phone number other than the number segment;
the data block comprises: the statistics corresponding to each group of subscriber phone numbers.
As can be seen from the technical solution provided by the above embodiments of the present application, the method for counting active users provided by the invention presets the initial state of the subscriber phone numbers to be counted by grouping them by number segment, wherein each group of subscriber phone numbers to be counted corresponds to several bytes, and each bit in a byte uniquely corresponds to one subscriber phone number; the initial state of a subscriber phone number to be counted corresponds to the initial value of its bit. The method then reads the call tickets to be counted, changes the initial state of each subscriber phone number that appears in those call tickets to the counted state, and counts the subscriber phone numbers to be counted whose state is the counted state.
It follows that, in the method for counting active users provided by the embodiments of the present application, the state of each subscriber phone number to be counted is represented by a single bit. During counting, one locating operation and one modifying operation per number are enough to count the active users, so the method both counts quickly and saves storage space.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below show only some of the embodiments recorded in the present application, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flow chart of a method for counting active users provided by an embodiment of the present application;
Fig. 2 is a schematic structural diagram of the Hash file provided by an embodiment of the present application;
Fig. 3 is a schematic structural diagram of a device for counting active users provided by an embodiment of the present application;
Fig. 4 is a schematic structural diagram of another device for counting active users provided by an embodiment of the present application.
For simplicity and clarity of illustration, the above drawings show the general form of the structures, and descriptions and details of well-known features and techniques may be omitted to avoid unnecessarily obscuring the invention. In addition, the elements in the drawings are not necessarily drawn to scale. For example, the size of some elements in the drawings may be enlarged relative to other elements to help improve understanding of the embodiments of the invention. The same reference numerals in different drawings denote the same elements.
The terms "first", "second", "third", "fourth" and the like (if present) in the description, the claims and the above drawings are used to distinguish between similar elements and are not necessarily used to describe a particular order or sequence. It should be understood that terms so used are interchangeable under appropriate circumstances, so that the embodiments of the invention described herein can, for example, be implemented in orders other than those illustrated or otherwise described herein. Furthermore, the terms "comprise" and "have" and any variations thereof are intended to cover a non-exclusive inclusion, so that a process, method, system, product or device that comprises a series of elements is not necessarily limited to those elements, but may include other elements that are not expressly listed or that are inherent to such a process, method, product or device.
Detailed description
To help those skilled in the art better understand the solution of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the drawings in those embodiments. Obviously, the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art on the basis of the embodiments in the present application without creative effort shall fall within the scope of protection of the present application.
In the prior art, the number of active users is obtained by counting the activity of each user: the number of times each subscriber phone number appears in the call tickets is counted, and finally the subscriber phone numbers whose occurrence count is greater than or equal to 1 are counted, which gives the number of active users. In the course of realizing the present invention, the inventor found that this counting method is slow and inefficient.
Fig. 1 shows the flow chart of a method for counting active users provided by an embodiment of the present application. The method comprises:
Step S101: presetting the initial state of the subscriber phone numbers to be counted, comprising: grouping the subscriber phone numbers to be counted by number segment, wherein each group of subscriber phone numbers to be counted corresponds to several bytes, and each bit in a byte uniquely corresponds to one subscriber phone number to be counted; the initial state of a subscriber phone number to be counted corresponds to the initial value of its bit;
The subscriber phone numbers to be counted are all the numbers that the operator can operate; the initial state is the initial counting state, that is, the not-yet-counted state;
The number segment is the prefix of a phone number. It may be the first 3 digits of the phone number, such as 135, 131 or 189, or the first 4 digits, such as 1351, 1352, 1313, 1315, 1891 or 1892; no specific limitation is imposed here.
In this embodiment, each group of subscriber phone numbers corresponds to several bytes. The byte may be a single byte, a double byte, or a double word (that is, four bytes); no specific limitation is imposed here. Each bit in a byte uniquely corresponds to one subscriber phone number to be counted, and the initial state of a subscriber phone number to be counted corresponds to the initial value of its bit.
Step S102: reading the call tickets to be counted;
Step S103: changing the initial state of each subscriber phone number that appears in the call tickets to the counted state;
After a number appearing in a call ticket is read, the state of that number is changed to the counted state. Because, in this embodiment, the initial state of a subscriber phone number to be counted corresponds to the initial value of its bit, the state of the number can be changed to the counted state by changing the value of the bit that corresponds to the number.
Step S104: counting the subscriber phone numbers to be counted whose state is the counted state.
The number of subscriber phone numbers to be counted whose state is the counted state is exactly the number of active users.
In this embodiment, each bit represents one subscriber phone number to be counted. During counting, one locating operation and one modification per number are enough to count the active users. The space occupied by the statistics, measured in bits, is equal to the number of subscriber phone numbers, so little storage is used. Moreover, because this solution does not count how many times each subscriber phone number is active, repeated modifications of the state of the same subscriber phone number can be applied in any order, which improves counting efficiency.
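As an illustration only, steps S101 to S104 might be sketched in Python as follows. The sketch assumes a single 3-digit number segment (135), 11-digit phone numbers, single bytes with an initial bit value of 0, and the leftmost bit of a byte taken as the first bit. The names `preset`, `mark`, `count_active` and `POPCOUNT`, as well as the sample numbers, are hypothetical and do not come from the application.

```python
# Illustrative sketch only; names and layout choices are assumptions.
SEGMENT = "135"            # assumed 3-digit number segment
SUFFIX_COUNT = 10 ** 8     # 8 remaining digits: 00000000 to 99999999

# Lookup table: POPCOUNT[b] is the number of 1 bits in byte value b.
POPCOUNT = bytes(bin(value).count("1") for value in range(256))


def preset():
    """Step S101: one bit per phone number, every bit starts at 0."""
    return bytearray(SUFFIX_COUNT // 8)


def mark(bitmap, phone_number):
    """Step S103: set the bit of a number that appears in a call ticket."""
    suffix = int(phone_number[len(SEGMENT):])
    # Locate the byte, then OR in a mask for the bit (leftmost bit first).
    bitmap[suffix // 8] |= 0x80 >> (suffix % 8)


def count_active(bitmap):
    """Step S104: count the bits that are in the counted state."""
    # translate() maps each byte to its popcount; sum() adds them up.
    return sum(bitmap.translate(POPCOUNT))


bitmap = preset()
# Step S102: numbers read from call tickets (made-up sample data).
for number in ["13500000003", "13500000003", "13512345678"]:
    mark(bitmap, number)
active_users = count_active(bitmap)
print(active_users)
```

In this sketch the number that appears twice sets the same bit twice, so it contributes only once to the result, which reflects why no per-number occurrence counter is needed.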
In the above solution, preferably, the initial value of the bit may be 0. In that case, changing the initial state of each subscriber phone number that appears in the call tickets to the counted state may comprise:
Looking up, among the several bytes, the bit that corresponds to the subscriber phone number appearing in the call ticket, within the byte to which that number belongs. The lookup may proceed bit by bit through the group according to the position of the number within its group, or it may first locate the byte to which the number belongs and then find the corresponding bit within that byte.
Determining a modification parameter, wherein the modification parameter is represented by one byte, the bit of the modification parameter at the same position as the bit corresponding to the subscriber phone number has the value 1, and all other bits are 0. If each group of subscriber phone numbers corresponds to several single bytes, then in this embodiment the modification parameter, matching the byte that corresponds to the subscriber phone number, is also a single byte. If each group of subscriber phone numbers corresponds to several double bytes, then the modification parameter is likewise a double byte.
Performing a bitwise OR of the modification parameter and the byte to which the subscriber phone number belongs, so as to change the initial state of the subscriber phone number appearing in the call ticket to the counted state. In other words, the initial value of the bit corresponding to the subscriber phone number appearing in the call ticket is modified by means of a bitwise OR.
For example, suppose the byte to which the subscriber phone number appearing in the call ticket belongs is 00100100 and the corresponding bit is the 4th from the left. The modification parameter is then 00010000, and after the bitwise OR of the two bytes, the byte to which the subscriber phone number belongs becomes 00110100. If instead the corresponding bit is the 3rd from the left, the modification parameter is 00100000, and after the bitwise OR of the two bytes, the byte to which the subscriber phone number belongs is still 00100100, because that bit was already set.
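The worked example above can be reproduced with a few lines of Python. This is only a sketch; the helper name `or_mask` is hypothetical, and bit positions are counted from the left starting at 1, as in the example.

```python
def or_mask(bit_from_left):
    """Modification parameter: 1 at the given position, 0 elsewhere."""
    return 0x80 >> (bit_from_left - 1)


byte = 0b00100100

# 4th bit from the left was 0, so the OR changes the byte.
after_4th = byte | or_mask(4)
# 3rd bit from the left was already 1, so the OR leaves the byte unchanged.
after_3rd = byte | or_mask(3)

print(format(or_mask(4), "08b"), format(after_4th, "08b"))
print(format(or_mask(3), "08b"), format(after_3rd, "08b"))
```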
Preferably, in the above solution, the initial value of the bit may also be 1. In that case, changing the initial state of each subscriber phone number that appears in the call tickets to the counted state may comprise:
Looking up, among the several bytes, the bit that corresponds to the subscriber phone number appearing in the call ticket, within the byte to which that number belongs;
Determining a modification parameter, wherein the modification parameter is represented by one byte, the bit of the modification parameter at the same position as the bit corresponding to the subscriber phone number has the value 0, and all other bits are 1;
Performing a bitwise AND of the modification parameter and the byte to which the subscriber phone number belongs, so as to change the initial state of the subscriber phone number appearing in the call ticket to the counted state. In other words, the initial value of the bit corresponding to the subscriber phone number appearing in the call ticket is modified by means of a bitwise AND.
For example, suppose the byte to which the subscriber phone number appearing in the call ticket belongs is 00100100 and the corresponding bit is the 4th from the left. The modification parameter is then 11101111, and after the bitwise AND of the two bytes, the byte to which the subscriber phone number belongs is still 00100100, because that bit was already 0. If instead the corresponding bit is the 3rd from the left, the modification parameter is 11011111, and after the bitwise AND of the two bytes, the byte to which the subscriber phone number belongs becomes 00000100.
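The same check can be made for the variant with initial value 1. Again this is only a sketch; the helper name `and_mask` is hypothetical, and bit positions are counted from the left starting at 1.

```python
def and_mask(bit_from_left):
    """Modification parameter: 0 at the given position, 1 elsewhere."""
    return 0xFF ^ (0x80 >> (bit_from_left - 1))


byte = 0b00100100

# 4th bit from the left was already 0, so the AND leaves the byte unchanged.
after_4th = byte & and_mask(4)
# 3rd bit from the left was 1, so the AND clears it.
after_3rd = byte & and_mask(3)

print(format(and_mask(4), "08b"), format(after_4th, "08b"))
print(format(and_mask(3), "08b"), format(after_3rd, "08b"))
```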
To locate quickly the bit corresponding to a subscriber phone number appearing in a call ticket, in the embodiments of the present application the initial state of the subscriber phone numbers to be counted is stored in a Hash file. Fig. 2 shows a schematic structural diagram of the Hash file provided by an embodiment of the present application;
The Hash file comprises a file header, description blocks and a data block, wherein
The file header may comprise: a type identifier, used to identify the file as a Hash file. For example, the type identifier may be "HS"; it may of course be some other identifier. In this embodiment, the type identifier may occupy 2 bytes; no specific limitation is imposed here.
The latest update time, which represents the system time at which the Hash file was last updated. It may occupy 4 bytes, and its data type may be an unsigned integer.
The total file size, which represents the size in bytes of the whole current Hash file. It may occupy 8 bytes, and its data type may be an unsigned integer.
The group count, that is, the number of groups into which the subscriber phone numbers to be counted are divided by number segment. It may occupy 2 bytes.
A description block describes the counting range of one group, and it may comprise the following three parts:
The number segment, which is the part of the numbers in the current group that does not need to be counted, namely their common prefix. For example, the first number segment is 135, the second number segment is 136, and so on. It may occupy 8 bytes, and its data type may be a character string.
The start number, which is the first value of the part of the number, other than the number segment, that needs to be counted in the current group, such as "00000000" (when the number segment has 3 digits) or "0000000" (when the number segment has 4 digits). It may occupy 4 bytes, and its data type may be an unsigned integer.
The end number, which is the last value of the part of the number, other than the number segment, that needs to be counted in the current group, such as "99999999" (when the number segment has 3 digits) or "9999999" (when the number segment has 4 digits). It may occupy 4 bytes, and its data type may be an unsigned integer.
The data block stores the data to be counted group by group. The statistics are stored in the order of the groups in the description blocks, and each byte in turn represents as many subscriber phone numbers to be counted as it has bits.
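The field sizes given above add up to 16 bytes for the file header and 16 bytes for each description block. A minimal sketch of packing these two structures with Python's standard `struct` module is shown below. The byte order (little-endian, no padding), the order of the fields, the use of a Unix timestamp for the update time, and the names `HEADER`, `DESCRIPTION`, `pack_header` and `pack_description` are all assumptions made for illustration; the application does not specify them.

```python
import struct

# Assumed layouts: "<" means little-endian with no padding bytes.
# Header: 2-byte type id, 4-byte update time, 8-byte total size, 2-byte group count.
HEADER = struct.Struct("<2sIQH")
# Description block: 8-byte number segment string, 4-byte start, 4-byte end.
DESCRIPTION = struct.Struct("<8sII")


def pack_header(updated_at, total_size, group_count):
    """Build the file header with the example type identifier "HS"."""
    return HEADER.pack(b"HS", updated_at, total_size, group_count)


def pack_description(segment, start, end):
    """Build one description block; short segments are padded with zero bytes."""
    return DESCRIPTION.pack(segment.encode("ascii"), start, end)


# Sample values: an arbitrary timestamp and the 135 number segment.
header = pack_header(1700000000, 62500096, 5)
description = pack_description("135", 0, 99999999)
print(len(header), len(description))
```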
Based on the Hash file, the number of storage units occupied by the statistics of the current group is:
N = (p - q) / m + 1, where the division is integer division (the quotient is rounded down), and
N is the number of storage units (single bytes, double bytes or double words) occupied by the statistics of the current group; p is the end number of the current group; q is the start number of the current group; m is the number of bits in the selected storage unit: m = 8 for a single byte, m = 16 for a double byte, and m = 32 for a double word (that is, four bytes).
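The formula can be checked numerically. In the short sketch below, the function name `unit_count` is hypothetical, and Python's `//` operator provides the integer division.

```python
def unit_count(start, end, bits_per_unit=8):
    """N = (p - q) / m + 1 with integer division; p = end, q = start, m = bits_per_unit."""
    return (end - start) // bits_per_unit + 1


# A 3-digit number segment leaves suffixes 00000000 to 99999999.
for m in (8, 16, 32):
    print(m, unit_count(0, 99999999, m))
```

For a range of 100,000,000 numbers this yields 12,500,000 single bytes, 6,250,000 double bytes or 3,125,000 double words, which in each case is 12,500,000 bytes of storage.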
When the value of a bit in the data block differs from the initial value, the subscriber phone number corresponding to that bit has appeared in call tickets one or more times, so that user is an active user. If a bit keeps its initial value, the subscriber phone number corresponding to that bit has not appeared in any call ticket, so that user is an inactive user. It follows that, in the method for counting active users provided by the embodiments of the present application, the ratio between the growth of the statistics file, in storage units, and the growth of the number of subscriber phone numbers counted is 1:m. That is, for every m additional subscriber phone numbers counted, the file grows by only one storage unit (a single byte, a double byte or four bytes). This further shows that the method provided by the embodiments of the present application occupies little space and saves storage.
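Counting the bits that differ from the initial value can be sketched as follows for both initial values. The function name `count_active` and the sample data are hypothetical, and the sketch assumes that any unused padding bits at the end of the data keep the initial value.

```python
def count_active(data, initial_bit):
    """Count the bits in data whose value differs from initial_bit (0 or 1)."""
    ones = sum(bin(byte).count("1") for byte in data)
    if initial_bit == 0:
        return ones                 # active users are the bits set to 1
    return len(data) * 8 - ones     # active users are the bits cleared to 0


# Initial value 0: three bits have been set.
print(count_active(bytes([0b00110100, 0b00000000]), 0))
# Initial value 1: one bit has been cleared.
print(count_active(bytes([0b11011111, 0b11111111]), 1))
```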
Take counting the active users of the five number segments 135 to 139 as an example. The size of the statistics file is 62500096 bytes, made up as follows:
File header size = 16 bytes;
Description block size = description block size of segment 135 + description block size of segment 136 + description block size of segment 137 + description block size of segment 138 + description block size of segment 139 = 16 + 16 + 16 + 16 + 16 = 80 bytes;
Data block size = data block size of segment 135 + data block size of segment 136 + data block size of segment 137 + data block size of segment 138 + data block size of segment 139 = 5 * ((99999999 - 0) / 8 + 1) = 5 * 12500000 = 62500000 bytes.
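The arithmetic of this example can be verified with a short sketch. The constant and function names are hypothetical; the sizes are those stated above, with single bytes as the storage unit.

```python
HEADER_SIZE = 16         # 2 + 4 + 8 + 2 bytes
DESCRIPTION_SIZE = 16    # 8 + 4 + 4 bytes per number segment


def statistics_file_size(group_count, start=0, end=99999999):
    """Total file size in bytes for group_count segments covering start..end."""
    data_block_per_group = (end - start) // 8 + 1
    return HEADER_SIZE + group_count * (DESCRIPTION_SIZE + data_block_per_group)


total = statistics_file_size(5)   # number segments 135 to 139
print(total)
```

Under these assumptions, the state of 500 million phone numbers fits in roughly 62.5 MB.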
Further, when multiple statistics files are used to count the subscriber phone numbers to be counted and a first statistics file and a second statistics file among them need to be merged into a third statistics file, the statistics of the subscriber phone numbers that are present only in the first statistics file or only in the second statistics file may be added directly to the third statistics file; a bitwise OR is performed on the statistics of the subscriber phone numbers that are present in both the first statistics file and the second statistics file, and the result is added to the third statistics file.
For example, suppose the numbers counted in the first statistics file are 13500000000 to 13500019999 and the numbers counted in the second statistics file are 13500010000 to 13500029999. The first statistics file and the second statistics file can then be merged according to the following steps:
Create the third statistics file;
Scan the two statistics files; the merged number range is 13500000000 to 13500029999;
Merge the statistics. Numbers 13500000000 to 13500009999 exist only in the first statistics file, so their statistics are added directly to the third statistics file. The data for numbers 13500010000 to 13500019999 are present in both the first statistics file and the second statistics file, so this part of the statistics is taken from both files, the bits corresponding to each number are combined by a bitwise OR to produce new statistics, and the new statistics are added to the third statistics file. Numbers 13500020000 to 13500029999 exist only in the second statistics file, so their statistics are added directly to the third statistics file.
Output the final merged third statistics file.
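The merging steps above might be sketched as follows. For simplicity the sketch keeps each statistics file in memory as a `(start, end, bits)` tuple instead of the on-disk Hash file, assumes an initial bit value of 0 and overlapping or adjacent ranges, and processes one number at a time rather than whole bytes. All names and the sample active numbers are hypothetical.

```python
def get_bit(bits, index):
    """Return the bit at index, counting from the leftmost bit of byte 0."""
    return (bits[index // 8] >> (7 - index % 8)) & 1


def set_bit(bits, index):
    """Set the bit at index by OR-ing a one-bit mask into its byte."""
    bits[index // 8] |= 0x80 >> (index % 8)


def new_file(start, end, active_numbers):
    """Build an in-memory statistics file for the inclusive range start..end."""
    bits = bytearray((end - start) // 8 + 1)
    for number in active_numbers:
        set_bit(bits, number - start)
    return start, end, bits


def merge(first, second):
    """Merge two statistics files into a third one covering both ranges."""
    low = min(first[0], second[0])
    high = max(first[1], second[1])
    merged = bytearray((high - low) // 8 + 1)
    for start, end, bits in (first, second):
        for number in range(start, end + 1):
            if get_bit(bits, number - start):
                # A number found in only one file is copied; a number found
                # in both files ends up as the OR of the two bits.
                set_bit(merged, number - low)
    return low, high, merged


first = new_file(13500000000, 13500019999, [13500000001, 13500015000])
second = new_file(13500010000, 13500029999, [13500015000, 13500025000])
low, high, merged = merge(first, second)
active = sum(bin(byte).count("1") for byte in merged)
print(low, high, active)
```

Number 13500015000 is active in both input files but contributes a single bit to the merged file, so the merged result contains three active users.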
It follows that, with the method for counting active users provided by the embodiments of the present application, merging statistics requires few operating steps and saves computing resources.
Corresponding to the method embodiment, Fig. 3 shows a schematic structural diagram of a device for counting active users provided by an embodiment of the present application. The device comprises:
a presetting module 301, a reading module 302, a modifying module 303 and a counting module 304, wherein
the presetting module 301 is configured to preset the initial state of the subscriber phone numbers to be counted, comprising: grouping the subscriber phone numbers to be counted by number segment, wherein each group of subscriber phone numbers to be counted corresponds to several bytes, and each bit in a byte uniquely corresponds to one subscriber phone number; the initial state of a subscriber phone number to be counted corresponds to the initial value of its bit;
the reading module 302 is configured to read the call tickets to be counted;
the modifying module 303 is configured to change the initial state of each subscriber phone number that appears in the call tickets to be counted to the counted state;
the counting module 304 is configured to count the subscriber phone numbers to be counted whose state is the counted state.
On the basis of the embodiment shown in Fig. 3, Fig. 4 shows a schematic structural diagram of another device for counting active users provided by an embodiment of the present application. In this embodiment, the initial value of the bit is 0.
The modifying module 303 may comprise:
a search unit 401, a determining unit 402 and a modifying unit 403, wherein
the search unit 401 is configured to look up, among the several bytes, the bit that corresponds to the subscriber phone number appearing in the call ticket, within the byte to which that number belongs;
the determining unit 402 is configured to determine a modification parameter, wherein the modification parameter is represented by one byte, the bit of the modification parameter at the same position as the bit corresponding to the subscriber phone number has the value 1, and all other bits have the value 0;
the modifying unit 403 is configured to perform a bitwise OR of the modification parameter and the byte to which the subscriber phone number belongs, so as to change the initial state of the subscriber phone number appearing in the call ticket to the counted state.
Of course, the initial value of the bit may also be 1. In that case, the determining unit 402 may further be configured to determine a modification parameter that is represented by one byte, in which the bit at the same position as the bit corresponding to the subscriber phone number has the value 0 and all other bits have the value 1; and the modifying unit 403 is further configured to perform a bitwise AND of the modification parameter and the byte to which the subscriber phone number belongs, so as to change the initial state of the subscriber phone number appearing in the call ticket to the counted state.
To optimize the above embodiments, in the embodiments of the present application the initial state of the subscriber phone numbers to be counted is stored in a Hash file, and the Hash file comprises a file header, description blocks and a data block, wherein
the file header comprises: a file type identifier, the latest update time, the total file size, and the number of groups into which the subscriber phone numbers to be counted are divided by number segment;
each description block comprises: a number segment, and the start number and end number of the part of the phone number other than the number segment;
the data block comprises: the statistics corresponding to each group of subscriber phone numbers.
The embodiments in this description are described in a progressive manner. For parts that are the same or similar, the embodiments may be referred to one another, and each embodiment focuses on its differences from the others. The above is only the specific implementation of the present application. It should be pointed out that a person of ordinary skill in the art may make several improvements and refinements without departing from the principle of the present application, and these improvements and refinements shall also be regarded as falling within the scope of protection of the present application.