CN105989076A - Data statistical method and device - Google Patents


Publication number
CN105989076A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510070951.0A
Other languages
Chinese (zh)
Other versions
CN105989076B (en
Inventor
沈健
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Tencent Cloud Computing Beijing Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd


Abstract

Embodiments of the invention disclose a data statistical method and a data statistics device. The method comprises: obtaining a statistical query request; obtaining, from the column family of a preset dimension index table, a target dimension value corresponding to the query dimension value carried by the statistical query request; obtaining, from the row keys of the dimension index table, a first target row key value corresponding to the target dimension value; obtaining, from the row keys of a preset indicator data storage table, a second target row key value corresponding to the first target row key value; and merging the target indicator data that corresponds to the second target row key value in the column family of the indicator data storage table, to obtain the statistical data corresponding to the statistical query request. The method and device reduce the amount of data that needs to be scanned and thereby improve the efficiency of computing statistical summaries over the data.

Description

A data statistical method and device
Technical field
The present invention relates to the field of computer technology, and in particular to a data statistical method and device.
Background
Storage structures in current use mainly adopt a flattened data storage format: each log record corresponds to one row, and each field value is stored in its own column. Computing a statistical summary over data stored this way proceeds as follows: every row of log records is scanned, the fields that satisfy the query condition are filtered out, and the data corresponding to the filtered fields is then aggregated. Because every log record must be scanned, the amount of data to scan grows sharply when many log records are stored, which reduces the efficiency of computing statistical summaries over the data.
Summary of the invention
Embodiments of the present invention provide a data statistical method and device that can reduce the amount of data that needs to be scanned, thereby improving the efficiency of computing statistical summaries over the data.
To solve the above technical problem, a first aspect of the present invention provides a data statistical method, comprising:
obtaining a statistical query request, obtaining, from the column family of a preset dimension index table, a target dimension value corresponding to the query dimension value carried by the statistical query request, and obtaining, from the row keys of the dimension index table, a first target row key value corresponding to the target dimension value;
obtaining, from the row keys of a preset indicator data storage table, a second target row key value corresponding to the first target row key value;
merging the target indicator data that corresponds to the second target row key value in the column family of the indicator data storage table, to obtain the statistical data corresponding to the statistical query request.
A second aspect of the present invention provides a data statistics device, comprising:
a first acquisition module, configured to obtain a statistical query request, obtain, from the column family of a preset dimension index table, a target dimension value corresponding to the query dimension value carried by the statistical query request, and obtain, from the row keys of the dimension index table, a first target row key value corresponding to the target dimension value;
a second acquisition module, configured to obtain, from the row keys of a preset indicator data storage table, a second target row key value corresponding to the first target row key value;
a merging module, configured to merge the target indicator data that corresponds to the second target row key value in the column family of the indicator data storage table, to obtain the statistical data corresponding to the statistical query request.
In the embodiments of the present invention, the first target row key value corresponding to the query dimension value carried by the statistical query request is obtained from the dimension index table, the second target row key value corresponding to the first target row key value is then obtained from the indicator data storage table, and the target indicator data corresponding to the second target row key value in the indicator data storage table is merged to obtain the statistical data. Because the target indicator data in the indicator data storage table can be summarized after scanning only the dimension index table, there is no need to scan all log records; this reduces the amount of data to scan and improves the efficiency of computing statistical summaries over the data.
Brief description of the drawings
To describe the technical solutions of the embodiments of the present invention or of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a schematic flowchart of a data statistical method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another data statistical method provided by an embodiment of the present invention;
Fig. 3 is a schematic flowchart of a data update method provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a data statistics device provided by an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of another data statistics device provided by an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of a first acquisition module provided by an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a second acquisition module provided by an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a merging module provided by an embodiment of the present invention;
Fig. 9 is a schematic structural diagram of yet another data statistics device provided by an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative effort fall within the scope of protection of the present invention.
Referring to Fig. 1, which is a schematic flowchart of a data statistical method provided by an embodiment of the present invention, the method may include:
S101: obtain a statistical query request; obtain, from the column family of a preset dimension index table, the target dimension value corresponding to the query dimension value carried by the statistical query request; and obtain, from the row keys of the dimension index table, the first target row key value corresponding to the target dimension value;
Specifically, when the data statistics device receives a statistical query request, it can obtain, from the column family of the preset dimension index table, the target dimension value corresponding to the query dimension value carried by the statistical query request, and obtain, from the row keys of the dimension index table, the first target row key value corresponding to the target dimension value. The dimension index table is created on an HBase (Hadoop Database) database. Its column family includes at least one dimension column name, and each dimension column name corresponds to at least one dimension value. Its row keys include at least one first row key value, which is calculated from the at least one dimension value. The structure of the dimension index table is shown in Table 1 below:
Table 1
In Table 1, dimension1, ..., dimensionM are the dimension column names included in the column family of the dimension index table, and each dimension column name corresponds to at least one dimension value; for example, the dimension values corresponding to the dimension column name dimension1 include Bytes_1, Bytes_2, ..., Bytes_n. Each dimension value in Table 1 is a field of a log record that is used as a selection condition; that is, when a log record is stored, its fields that serve as selection conditions are stored in the dimension index table as dimension values. The first row key values in Table 1 include xxx_hashcode_1, xxx_hashcode_2, ..., xxx_hashcode_n, where hashcode_n is the hash value calculated after concatenating the combination of dimension values, and xxx is the last three digits of hashcode_n. Because HBase stores the rows of a table in lexicographic order of row key, the xxx prefix distributes the stored data more evenly and thereby improves write and query concurrency. hashcode_n may be calculated as follows:
hashcode_n = (dim[1] + "\t" + dim[2] + "\t" + ... + dim[M]).hashCode()
where dim[1] denotes the dimension value in the n-th row of the dimension1 column, dim[2] denotes the dimension value in the n-th row of the dimension2 column, and dim[M] denotes the dimension value in the n-th row of the dimensionM column. Taking xxx_hashcode_1 as an example, the value of hashcode_1 is calculated by concatenating the dimension values that are in the same row as xxx_hashcode_1.
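The row key construction described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes that hashCode() behaves like Java's String.hashCode(), that the dimension values are joined with a tab character, and that the xxx prefix is taken from the decimal digits of the absolute hash value, zero-padded to three digits. The function names java_string_hash and first_row_key are hypothetical.

```python
from __future__ import annotations


def java_string_hash(text: str) -> int:
    """Reproduce Java's String.hashCode() as a signed 32-bit integer."""
    h = 0
    # Java hashes UTF-16 code units, so encode first to cover all characters.
    data = text.encode("utf-16-be")
    for i in range(0, len(data), 2):
        unit = (data[i] << 8) | data[i + 1]
        h = (31 * h + unit) & 0xFFFFFFFF
    # Convert the unsigned 32-bit result to Java's signed int range.
    return h - 0x100000000 if h >= 0x80000000 else h


def first_row_key(dimension_values: list[str]) -> str:
    """Build a row key of the form xxx_hashcode for one row of dimension values."""
    hashcode = java_string_hash("\t".join(dimension_values))
    # Assumption: the prefix is the last three decimal digits of the absolute
    # hash value, left-padded with zeros when the value has fewer digits.
    prefix = str(abs(hashcode)).zfill(3)[-3:]
    return f"{prefix}_{hashcode}"


if __name__ == "__main__":
    print(first_row_key(["a", "b"]))
```

Because the prefix varies with the low-order digits of the hash, consecutive records tend to land in different parts of the lexicographically ordered key space, which is the distribution effect the text describes.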
Taking Table 1 as an example, if Bytes_2 in Table 1 is found to be identical to the query dimension value carried by the statistical query request, Bytes_2 is determined to be the target dimension value, and the first target row key value corresponding to Bytes_2 is then obtained from the row keys of Table 1; here the first target row key value is xxx_hashcode_2.
S102: obtain, from the row keys of a preset indicator data storage table, the second target row key value corresponding to the first target row key value;
Specifically, after the first target row key value is obtained, the data statistics device can obtain, from the row keys of the preset indicator data storage table, the second target row key value corresponding to the first target row key value. The indicator data storage table is created on an HBase database. Its column family includes at least one indicator column name, and each indicator column name corresponds to at least one piece of indicator data. Its row keys include at least one second row key value, which is calculated from the at least one dimension value and a time value, where the time value is divided according to a preset time granularity. The structure of the indicator data storage table is shown in Table 2 below:
Table 2
In Table 2, the indicator column names included in the column family are 0000, 0001, ..., 1023, and each indicator column name corresponds to at least one piece of indicator data; for example, the indicator data corresponding to the indicator column name "0000" includes SliceIndex_1, SliceIndex_2, ..., SliceIndex_n. Each piece of indicator data in Table 2 is a numerical value in a log record that is used for statistics; that is, when a log record is stored, its values used for statistics are stored in the indicator data storage table as indicator data. The indicator data is stored serialized as a Thrift structure; Thrift is a software framework for developing scalable, cross-language services. The indicator data may include a global count indicator, an estimated user count indicator array, a count indicator, a deduplication indicator, a sum indicator, a maximum indicator, a minimum indicator, a mean indicator, and so on. The second row key values in Table 2 include xxx_hashcode_1_time, xxx_hashcode_2_time, ..., xxx_hashcode_n_time, where the structure and calculation of xxx_hashcode_n are the same as for the first row key value and are not repeated here. The time part of the second row key value is a time value divided according to the preset time granularity: if the time granularity is 1 hour, time can take the values 0000, 0100, 0200, ..., 2300; if the time granularity is 15 minutes, time can take the values 0000, 0015, 0030, ..., 2345. For example, with a time granularity of 1 hour, the second row key values can include xxx_hashcode_1_0100, xxx_hashcode_2_0200, and so on.
The statistical query request also includes a statistical time range. Before the second target row key value is obtained, the time values are first enumerated according to the statistical time range and the preset time granularity. If the statistical time range is the whole day and the time granularity is 1 hour, the enumerated time values are 0000, 0100, 0200, ..., 2300. The obtained first target row key value is then concatenated with each enumerated time value. If the first target row key value is xxx_hashcode_2, concatenation yields the row key values to be queried: xxx_hashcode_2_0000, xxx_hashcode_2_0100, ..., xxx_hashcode_2_2300. The indicator data storage table is then searched for second row key values identical to any of the row key values to be queried, and each second row key value found is determined to be a second target row key value.
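The enumeration and concatenation step can be sketched as below. This is an illustrative sketch only: the function names and the representation of the statistical time range as minutes since midnight (start inclusive, end exclusive) are assumptions, not details given in the source.

```python
from __future__ import annotations


def enumerate_time_values(
    granularity_minutes: int,
    start_minute: int = 0,
    end_minute: int = 24 * 60,
) -> list[str]:
    """List HHMM time values in [start_minute, end_minute) at the given step."""
    return [
        f"{minute // 60:02d}{minute % 60:02d}"
        for minute in range(start_minute, end_minute, granularity_minutes)
    ]


def candidate_row_keys(first_target_row_key: str, time_values: list[str]) -> list[str]:
    """Append each time value to the first target row key, separated by '_'."""
    return [f"{first_target_row_key}_{value}" for value in time_values]


if __name__ == "__main__":
    hourly = enumerate_time_values(60)
    for key in candidate_row_keys("xxx_hashcode_2", hourly)[:3]:
        print(key)
```

Since the candidate keys are fully determined by the first target row key and the time range, each one can be looked up directly rather than found by scanning the indicator data storage table.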
S103: merge the target indicator data that corresponds to the second target row key value in the column family of the indicator data storage table, to obtain the statistical data corresponding to the statistical query request;
Specifically, after the second target row key value is obtained, the data statistics device can first perform a row merge on the target indicator data that corresponds to the second target row key value in the column family of the indicator data storage table. During the row merge, the rows can be merged two at a time from top to bottom until a single row of target indicator data remains. The device then performs a column merge on the row-merged target indicator data; the columns can likewise be merged two at a time, and when only one column remains, the statistical data corresponding to the statistical query request is obtained.
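The two-stage merge can be sketched generically as a pairwise reduction, first down each column across rows and then across the resulting columns. This is a simplified illustration under stated assumptions: the names merge_pairwise and merge_rows_then_columns are hypothetical, every row is assumed to have the same number of columns, and a single merge function is used for both stages, whereas the source notes that the column merge treats the estimated user count indicator differently.

```python
from __future__ import annotations

from typing import Callable, Sequence, TypeVar

T = TypeVar("T")


def merge_pairwise(items: Sequence[T], merge: Callable[[T, T], T]) -> T:
    """Fold a non-empty sequence by merging two items at a time, in order."""
    result = items[0]
    for item in items[1:]:
        result = merge(result, item)
    return result


def merge_rows_then_columns(rows: Sequence[Sequence[T]], merge: Callable[[T, T], T]) -> T:
    """Row merge: collapse each column top to bottom; column merge: collapse the rest."""
    columns = list(zip(*rows))  # regroup cells so each tuple holds one column
    merged_row = [merge_pairwise(column, merge) for column in columns]
    return merge_pairwise(merged_row, merge)


if __name__ == "__main__":
    table = [[1, 2], [3, 4]]
    print(merge_rows_then_columns(table, lambda a, b: a + b))
```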
In this embodiment of the present invention, the first target row key value corresponding to the query dimension value carried by the statistical query request is obtained from the dimension index table, the second target row key value corresponding to the first target row key value is then obtained from the indicator data storage table, and the target indicator data corresponding to the second target row key value in the indicator data storage table is merged to obtain the statistical data. Because the dimension values and the indicator data of the log records are stored separately in the dimension index table and the indicator data storage table, and the target indicator data in the indicator data storage table can be summarized after scanning only the dimension index table, there is no need to scan all log records; this reduces the amount of data to scan and improves the efficiency of computing statistical summaries over the data.
Referring to Fig. 2, which is a schematic flowchart of another data statistical method provided by an embodiment of the present invention, the method may include:
S201: preset a dimension index table and an indicator data storage table based on an HBase database;
Specifically, the data statistics device can preset the dimension index table and the indicator data storage table based on an HBase database. The fields of a log record that are used as selection conditions are defined as dimension values, and the numerical values of a log record that are used for statistics are defined as indicator data. The data statistics device stores the dimension values in the column family of the dimension index table and stores the indicator data in the column family of the indicator data storage table.
The column family of the dimension index table includes at least one dimension column name, and each dimension column name corresponds to at least one dimension value; the row keys of the dimension index table include at least one first row key value, which is calculated from the at least one dimension value. The column family of the indicator data storage table includes at least one indicator column name, and each indicator column name corresponds to at least one piece of indicator data; the row keys of the indicator data storage table include at least one second row key value, which is calculated from the at least one dimension value and a time value, where the time value is divided according to a preset time granularity.
For the structure of the dimension index table, see Table 1 in the embodiment corresponding to Fig. 1. In Table 1, dimension1, ..., dimensionM are the dimension column names included in the column family of the dimension index table, and each dimension column name corresponds to at least one dimension value; for example, the dimension values corresponding to the dimension column name dimension1 include Bytes_1, Bytes_2, ..., Bytes_n. Each dimension value in Table 1 is a field of a log record that is used as a selection condition; that is, when a log record is stored, its fields that serve as selection conditions are stored in the dimension index table as dimension values. The first row key values in Table 1 include xxx_hashcode_1, xxx_hashcode_2, ..., xxx_hashcode_n, where hashcode_n is the hash value calculated after concatenating the combination of dimension values, and xxx is the last three digits of hashcode_n. Because HBase stores the rows of a table in lexicographic order of row key, the xxx prefix distributes the stored data more evenly and thereby improves write and query concurrency. hashcode_n may be calculated as follows:
hashcode_n = (dim[1] + "\t" + dim[2] + "\t" + ... + dim[M]).hashCode()
where dim[1] denotes the dimension value in the n-th row of the dimension1 column, dim[2] denotes the dimension value in the n-th row of the dimension2 column, and dim[M] denotes the dimension value in the n-th row of the dimensionM column. Taking xxx_hashcode_1 as an example, the value of hashcode_1 is calculated by concatenating the dimension values that are in the same row as xxx_hashcode_1.
For the structure of the indicator data storage table, see Table 2 in the embodiment corresponding to Fig. 1. In Table 2, the indicator column names included in the column family are 0000, 0001, ..., 1023, and each indicator column name corresponds to at least one piece of indicator data; for example, the indicator data corresponding to the indicator column name "0000" includes SliceIndex_1, SliceIndex_2, ..., SliceIndex_n. Each piece of indicator data in Table 2 is a numerical value in a log record that is used for statistics; that is, when a log record is stored, its values used for statistics are stored in the indicator data storage table as indicator data. The indicator data is stored serialized as a Thrift structure and may include a user count, a number of occurrences, a deduplicated value, a count value, a sum, a maximum, a minimum, a mean, and so on. The second row key values in Table 2 include xxx_hashcode_1_time, xxx_hashcode_2_time, ..., xxx_hashcode_n_time, where the structure and calculation of xxx_hashcode_n are the same as for the first row key value and are not repeated here. The time part of the second row key value is a time value divided according to the preset time granularity: if the time granularity is 1 hour, time can take the values 0000, 0100, 0200, ..., 2300; if the time granularity is 15 minutes, time can take the values 0000, 0015, 0030, ..., 2345. For example, with a time granularity of 1 hour, the second row key values can include xxx_hashcode_1_0100, xxx_hashcode_2_0200, and so on.
S202: obtain a statistical query request, where the statistical query request carries a query dimension value;
Specifically, when the data statistics device receives a statistical query request, it can obtain the query dimension value carried by the statistical query request.
S203: obtain, from the at least one dimension value in the dimension index table, the dimension value corresponding to the query dimension value, and determine the dimension value corresponding to the query dimension value to be the target dimension value;
Specifically, after the data statistics device obtains the query dimension value, it can obtain, from the at least one dimension value in the dimension index table, the dimension value corresponding to the query dimension value, and determine that dimension value to be the target dimension value. Taking Table 1 in the embodiment corresponding to Fig. 1 as an example, suppose the query dimension values include dimension value A and dimension value B. If the dimension value in the column family of Table 1 that is identical to dimension value A is Bytes_1 and the dimension value identical to dimension value B is Bytes_2, then Bytes_1 and Bytes_2 are determined to be the target dimension values.
S204: obtain, from the at least one first row key value in the dimension index table, the first row key value corresponding to the target dimension value, and determine the first row key value corresponding to the target dimension value to be the first target row key value;
Specifically, after the data statistics device determines the target dimension value, it can obtain, from the at least one first row key value in the dimension index table, the first row key value corresponding to the target dimension value, and determine that first row key value to be the first target row key value. Taking Table 1 as an example again, if the determined target dimension values are Bytes_1 and Bytes_2, the row keys of Table 1 show that the first row key value corresponding to Bytes_1 is xxx_hashcode_1 and the first row key value corresponding to Bytes_2 is xxx_hashcode_2, so xxx_hashcode_1 and xxx_hashcode_2 are determined to be the first target row key values.
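Steps S203 and S204 can be sketched against an in-memory stand-in for the dimension index table. The dictionary layout (row key mapped to a dictionary of dimension column name to dimension value) and the function name find_first_target_row_keys are assumptions made for illustration; the sketch does not use the HBase client API. Following the Bytes_1 and Bytes_2 example above, a row is treated as a match when any of its dimension values equals any query dimension value.

```python
from __future__ import annotations

from typing import Dict, Iterable, List


def find_first_target_row_keys(
    dimension_index: Dict[str, Dict[str, str]],
    query_dimension_values: Iterable[str],
) -> List[str]:
    """Return the row keys of rows holding at least one queried dimension value."""
    wanted = set(query_dimension_values)
    matches = []
    for row_key, dimensions in dimension_index.items():
        # The matching dimension values are the target dimension values;
        # their row keys are the first target row key values.
        if wanted.intersection(dimensions.values()):
            matches.append(row_key)
    return matches


if __name__ == "__main__":
    index = {
        "xxx_hashcode_1": {"dimension1": "Bytes_1"},
        "xxx_hashcode_2": {"dimension1": "Bytes_2"},
        "xxx_hashcode_3": {"dimension1": "Bytes_3"},
    }
    print(find_first_target_row_keys(index, ["Bytes_1", "Bytes_2"]))
```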
S205: generate at least one target time value within the statistical time range, according to the time granularity and the statistical time range carried by the statistical query request;
Specifically, the statistical query request also includes a statistical time range. After the data statistics device determines the first target row key value, it can generate at least one target time value within the statistical time range according to the time granularity and the statistical time range carried by the statistical query request. For example, if the statistical time range is the whole day and the preset time granularity is 1 hour, the generated target time values are 0000, 0100, 0200, ..., 2300, that is, the time values from 0:00 to 23:00.
S206: concatenate each of the at least one target time value with the first target row key value, to obtain the row key values to be queried;
Specifically, after the data statistics device generates the at least one target time value, it can concatenate each target time value with the first target row key value to obtain the row key values to be queried. For example, if the statistical time range is the whole day, the time granularity is 1 hour, and the first target row key value is xxx_hashcode_2, then the target time values are 0000, 0100, 0200, ..., 2300, and concatenating xxx_hashcode_2 with each of them yields the row key values to be queried: xxx_hashcode_2_0000, xxx_hashcode_2_0100, ..., xxx_hashcode_2_2300.
S207: obtain, from the at least one second row key value in the indicator data storage table, the second row key value corresponding to the row key values to be queried, and determine the second row key value corresponding to the row key values to be queried to be the second target row key value;
Specifically, after the data statistics device obtains the row key values to be queried, it can obtain, from the at least one second row key value in the indicator data storage table, the second row key value corresponding to the row key values to be queried, and determine that second row key value to be the second target row key value. For example, if the row key values to be queried are xxx_hashcode_2_0000, xxx_hashcode_2_0100, ..., xxx_hashcode_2_2300, the indicator data storage table is searched for second row key values identical to any of them. If the second row key values found in the indicator data storage table are xxx_hashcode_2_0000 and xxx_hashcode_2_0100, then xxx_hashcode_2_0000 and xxx_hashcode_2_0100 in the indicator data storage table are determined to be the second target row key values.
S208: obtain, from the at least one piece of indicator data in the indicator data storage table, the indicator data corresponding to the second target row key value, and determine the indicator data corresponding to the second target row key value to be the target indicator data;
Specifically, after the data statistics device determines the second target row key value, it can obtain, from the at least one piece of indicator data in the indicator data storage table, the indicator data corresponding to the second target row key value, and determine that indicator data to be the target indicator data. Taking Table 2 in the embodiment corresponding to Fig. 1 as an example, if the second target row key values include xxx_hashcode_1_time and xxx_hashcode_2_time, then the indicator data in every column of the same row as xxx_hashcode_1_time is target indicator data, and the indicator data in every column of the same row as xxx_hashcode_2_time is also target indicator data.
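Steps S207 and S208 can be sketched with an in-memory dictionary standing in for the indicator data storage table. The dictionary layout (second row key mapped to a dictionary of indicator column name to indicator data) and the function name select_target_indicator_data are illustrative assumptions; the sketch does not use the HBase client API.

```python
from __future__ import annotations

from typing import Any, Dict, Iterable


def select_target_indicator_data(
    indicator_table: Dict[str, Dict[str, Any]],
    row_keys_to_query: Iterable[str],
) -> Dict[str, Dict[str, Any]]:
    """Keep the rows whose row key equals one of the row keys to be queried."""
    # Each key that is found is a second target row key value; every column in
    # its row is target indicator data.
    return {
        key: indicator_table[key]
        for key in row_keys_to_query
        if key in indicator_table
    }


if __name__ == "__main__":
    table = {
        "xxx_hashcode_2_0000": {"0000": "SliceIndex_1"},
        "xxx_hashcode_2_0100": {"0000": "SliceIndex_2"},
        "xxx_hashcode_7_0000": {"0000": "SliceIndex_3"},
    }
    wanted = ["xxx_hashcode_2_0000", "xxx_hashcode_2_0100", "xxx_hashcode_2_0200"]
    print(select_target_indicator_data(table, wanted))
```

Only the enumerated keys are probed, so rows belonging to other dimension combinations or time slots are never read.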
S209: perform a row merge on the target indicator data, and perform a column merge on the row-merged target indicator data, to obtain the statistical data corresponding to the statistical query request;
Specifically, after the data statistics device determines the target indicator data, it can first perform a row merge on the target indicator data. During the row merge, the target indicator data in each column can be merged two at a time from top to bottom until a single row of target indicator data remains. Referring again to Table 2 in the embodiment corresponding to Fig. 1 and taking the row merge of SliceIndex_1 and SliceIndex_2 as an example, if SliceIndex_1 and SliceIndex_2 both include a global count indicator, an estimated user count indicator array, a count indicator, a deduplication indicator, a sum indicator, a maximum indicator, a minimum indicator and a mean indicator, the row merge of SliceIndex_1 and SliceIndex_2 includes:
Merging the global count indicator: sliceIndex1.countTotal + sliceIndex2.countTotal;
Merging the estimated-user-count indicator array:
Merging the count indicator: sliceIndex1.countIndex + sliceIndex2.countIndex;
Merging the distinct indicator: sliceIndex1.distincts.addAll(sliceIndex2.distincts) (set union);
Merging the sum indicator: sliceIndex1.sum + sliceIndex2.sum;
Merging the minimum indicator: Min(sliceIndex1.min, sliceIndex2.min);
Merging the maximum indicator: Max(sliceIndex1.max, sliceIndex2.max);
Merging the average indicator:
(sliceIndex1.avg * sliceIndex1.countIndex1 + sliceIndex2.avg * sliceIndex2.countIndex2) / (sliceIndex1.countIndex1 + sliceIndex2.countIndex2).
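The row-merge rules listed above can be sketched in Python as follows. This is only a minimal illustration under stated assumptions: the `SliceIndex` class and its field names are hypothetical, each cell is assumed to hold a single indicator of each kind, and the estimated-user-count array is left out because its merge formula is not given in the text.

```python
from dataclasses import dataclass, field
from functools import reduce
from typing import List, Set


@dataclass
class SliceIndex:
    """Hypothetical single-indicator cell; field names are illustrative only."""
    count_total: int = 0                       # global count indicator
    count: int = 0                             # count indicator
    distinct: Set[str] = field(default_factory=set)  # distinct indicator
    total: float = 0.0                         # sum indicator
    minimum: float = float("inf")              # minimum indicator
    maximum: float = float("-inf")             # maximum indicator
    avg: float = 0.0                           # average indicator


def merge_two(a: SliceIndex, b: SliceIndex) -> SliceIndex:
    """Merge two cells using the per-indicator rules from the description."""
    count = a.count + b.count
    # Weighted average: each side's average is weighted by its own count.
    avg = (a.avg * a.count + b.avg * b.count) / count if count else 0.0
    return SliceIndex(
        count_total=a.count_total + b.count_total,
        count=count,
        distinct=a.distinct | b.distinct,      # set union
        total=a.total + b.total,
        minimum=min(a.minimum, b.minimum),
        maximum=max(a.maximum, b.maximum),
        avg=avg,
    )


def merge_rows(cells: List[SliceIndex]) -> SliceIndex:
    """Merge a non-empty column of cells pairwise, top to bottom."""
    return reduce(merge_two, cells)
```

Because every rule is associative, merging pairwise from top to bottom gives the same result as merging in any other grouping.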
The column merging of the row-merged target indicator data is similar to row merging: during column merging, the row-merged target indicator data in the columns can be merged pairwise until only one total piece of target indicator data remains, and that total is the statistical data corresponding to the statistical query request. Column merging differs from row merging only in how the estimated-user-count indicator array is handled. Before column merging, a user-count value must first be computed from the row-merged estimated-user-count indicator array in each column; during column merging, directly summing the user-count values computed for the columns yields the total estimated number of users queried under a given dimension condition. For the other indicators, column merging is identical to row merging. The user-count value can be computed from the estimated-user-count indicator array as follows:
where userCount is the computed user-count value.
In this embodiment of the invention, the first target row key value corresponding to the query dimension value carried in the statistical query request is obtained from the dimension index table, the second target row key value corresponding to the first target row key value is then obtained from the indicator data storage table, and the target indicator data corresponding to the second target row key value in the indicator data storage table is merged to obtain the statistical data. Because the dimension values and the indicator data of log records are stored separately in the dimension index table and the indicator data storage table, statistical aggregation of the target indicator data in the indicator data storage table requires scanning only the dimension index table. Not all log records need to be scanned, which reduces the amount of data to be scanned and improves the efficiency of statistical aggregation.
Referring to Fig. 3, which is a schematic flowchart of a data update method provided by an embodiment of the invention, the method may include:
S301: when log record information is obtained, extract the log indicator data, log dimension values, time information and key field from the log record information;
Specifically, before step S301, the dimension index table and the indicator data storage table are preset; for the specific implementation, see S201 in the embodiment corresponding to Fig. 2, which is not repeated here. Step S301 may be performed at any time during steps S101-S103 in the embodiment corresponding to Fig. 1, or at any time during steps S202-S209 in the embodiment corresponding to Fig. 2.
When the data statistics device obtains log record information, it can extract the log indicator data, log dimension values, time information and key field from it. The log indicator data may include at least one of a global count indicator, an estimated-user-count indicator array, a count indicator, a distinct indicator, a sum indicator, a maximum indicator, a minimum indicator and an average indicator. The time information is the time point at which the log record information was generated.
S302: generate a first row key value to be updated according to the log dimension values, and update, according to the log dimension values, the dimension values in the dimension index table that correspond to the first row key value to be updated;
Specifically, after extracting the log indicator data, log dimension values, time information and key field, the data statistics device can put the log dimension values into a dim array and compute hashcode_n over the concatenated dim array to obtain the corresponding first row key value to be updated, xxx_hashcode_n, where xxx is the last three digits of hashcode_n. hashcode_n can be computed as:
hashcode_n = (dim[1] + "t" + dim[2] + "t" + ... + dim[M]).hashCode();
Taking Table 1 in the embodiment corresponding to Fig. 1 as an example, in the formula for hashcode_n, dim[1] denotes the dimension value in row n of the dimension1 column, dim[2] denotes the dimension value in row n of the dimension2 column, and dim[M] denotes the dimension value in row n of the dimensionM column. Taking xxx_hashcode_1 as an example again, the value of hashcode_1 is computed by concatenating the dimension values in the same row as xxx_hashcode_1. If the generated first row key value to be updated is xxx_hashcode_1, each log dimension value is written into the corresponding dimension value in the same row as xxx_hashcode_1 in Table 1, which completes the update of the dimension values corresponding to the first row key value to be updated. Because the log dimension values are identical to the dimension values already stored in that row, writing them simply overwrites those values with the same content; alternatively, the update of that row can be skipped altogether.
Optionally, if the first row key value to be updated cannot be found in the dimension index table, a new row can be added to the dimension index table, and the first row key value to be updated and the log dimension values are written into the new row.
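The row key generation and the update-or-insert behaviour of step S302 can be sketched in Python as below. Several points are assumptions made for illustration: `hashCode()` is taken to mean Java's `String.hashCode()`, which is emulated here; the separator is a parameter whose default follows the formula as printed; the "last three digits" are taken from the absolute value of the hash; and the dimension index table is modelled as a plain dictionary rather than an HBase table.

```python
from typing import Dict, List


def java_string_hashcode(s: str) -> int:
    """Emulate Java's String.hashCode() (exact for BMP characters)."""
    h = 0
    for ch in s:
        h = (31 * h + ord(ch)) & 0xFFFFFFFF
    # Convert the unsigned 32-bit value to Java's signed int range.
    return h - 0x100000000 if h >= 0x80000000 else h


def first_row_key(dims: List[str], sep: str = "t") -> str:
    """Build xxx_hashcode_n from the concatenated dimension values."""
    hashcode = java_string_hashcode(sep.join(dims))
    # Assumption: xxx is the last three decimal digits, zero-padded.
    prefix = f"{abs(hashcode) % 1000:03d}"
    return f"{prefix}_{hashcode}"


def upsert_dimension_row(table: Dict[str, List[str]], dims: List[str]) -> str:
    """Overwrite the row if the key exists, otherwise add a new row."""
    key = first_row_key(dims)
    table[key] = list(dims)
    return key
```

Writing the same combination of dimension values twice leaves a single row in the table, which is the storage saving the description relies on.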
S303: generate a second row key value to be updated according to the log dimension values and the time information, generate an indicator column name to be updated according to the key field, and update, according to the log indicator data, the indicator data in the indicator data storage table that corresponds to the second row key value to be updated and the indicator column name to be updated;
Specifically, step S303 can be performed concurrently with step S302. The data statistics device can generate xxx_hashcode_n from the log dimension values, generate time from the time information and the preset time granularity, and concatenate xxx_hashcode_n and time to obtain the second row key value to be updated, xxx_hashcode_n_time. It then generates the indicator column name to be updated from the key field: the murmur hash of the key field (a 64-bit integer) is computed, shifted right by 48 bits and divided by 64 to obtain the indicator column name to be updated (0000~1023), i.e. (murmurhash(key) >> 48) / 64. After the second row key value to be updated and the indicator column name to be updated have been generated, the indicator data in the indicator data storage table that corresponds to them can be updated according to the log indicator data. Taking Table 2 in the embodiment corresponding to Fig. 1 as an example, if the second row key value to be updated is xxx_hashcode_1_time and the indicator column name to be updated is 0000, SliceIndex_1 can be updated according to the log indicator data. Updating SliceIndex_1 may include:
Update of the global count indicator countTotal: countTotal + 1;
Update of the estimated-user-count indicator array buckets:
p = (murmurhash(key) >> 48) % 64
zeroNum = number of leading zeros of (murmurhash(key) << 16)
if (buckets.get(p) < zeroNum) then buckets.set(p, zeroNum);
If the indicator being updated is the i-th count indicator: countIndex.get(i) + 1;
If the indicator being updated is the i-th distinct indicator: distincts.get(i).set(v);
If the indicator being updated is the i-th sum indicator: sum.get(i) + v;
If the indicator being updated is the i-th minimum indicator: Min(min.get(i), v);
If the indicator being updated is the i-th maximum indicator: Max(max.get(i), v);
If the indicator being updated is the i-th average indicator: (avg.get(i) * (countTotal.get(i) - 1) + v) / countTotal.get(i).
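The column-name and bucket arithmetic above can be sketched in Python as follows. This is a hedged illustration: the functions take an already computed 64-bit hash as input instead of calling a murmur hash implementation, the hash is assumed to be treated as unsigned (consistent with the stated column range 0000~1023), and the leading-zero count is assumed to be taken over a 64-bit word. All function names are hypothetical.

```python
from typing import List

MASK64 = (1 << 64) - 1  # keep every intermediate value within 64 bits


def column_name(h64: int) -> str:
    """Top 16 bits of the hash divided by 64, giving 0000..1023."""
    return f"{((h64 & MASK64) >> 48) // 64:04d}"


def bucket_position(h64: int) -> int:
    """Top 16 bits of the hash modulo 64, giving a bucket index 0..63."""
    return ((h64 & MASK64) >> 48) % 64


def leading_zeros_after_shift(h64: int) -> int:
    """Leading zeros of (hash << 16) within a 64-bit word (64 if zero)."""
    shifted = (h64 << 16) & MASK64
    return 64 - shifted.bit_length()


def update_buckets(buckets: List[int], h64: int) -> None:
    """Keep, per bucket, the largest leading-zero count seen so far."""
    p = bucket_position(h64)
    zero_num = leading_zeros_after_shift(h64)
    if buckets[p] < zero_num:
        buckets[p] = zero_num
```

The top 16 bits thus select one of 1024 columns and one of 64 buckets within that column, while the remaining 48 bits feed the leading-zero count.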
Optionally, if the second row key value to be updated cannot be found in the indicator data storage table, a new row can be added to the indicator data storage table, the second row key value to be updated is written into the new row, and the log indicator data is written at the position in the new row that corresponds to the indicator column name to be updated.
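Step S303 as a whole, including the case where the row does not yet exist, can be sketched in Python as below. The sketch makes simplifying assumptions: the indicator data storage table is a nested dictionary rather than an HBase table, each cell tracks one value series only, and the running average uses the global count as in the update formula above. The names `new_cell`, `apply_value` and `write_log` are hypothetical.

```python
from typing import Dict


def new_cell() -> Dict[str, object]:
    """Hypothetical empty indicator cell for one row key and column."""
    return {"count_total": 0, "sum": 0.0, "min": None, "max": None, "avg": 0.0}


def apply_value(cell: Dict[str, object], v: float) -> None:
    """Apply one log value using the update rules from the description."""
    cell["count_total"] += 1
    n = cell["count_total"]
    cell["sum"] += v
    cell["min"] = v if cell["min"] is None else min(cell["min"], v)
    cell["max"] = v if cell["max"] is None else max(cell["max"], v)
    # Running average: (avg * (n - 1) + v) / n
    cell["avg"] = (cell["avg"] * (n - 1) + v) / n


def write_log(table: Dict[str, Dict[str, dict]],
              row_key: str, column: str, v: float) -> None:
    """Update the cell in place, adding the row or column if missing."""
    row = table.setdefault(row_key, {})
    cell = row.setdefault(column, new_cell())
    apply_value(cell, v)
```

Two log records with the same second row key and column name end up in one cell, so the number of stored rows does not grow with the number of log records.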
In this embodiment of the invention, by extracting the log indicator data, log dimension values, time information and key field from the log record information, the corresponding dimension values in the dimension index table and the corresponding indicator data in the indicator data storage table can be updated. Because writing log record information only updates the dimension values and indicator data at the relevant positions, it avoids adding a new row for every log record written, which effectively reduces storage overhead. Writing log record information by updating dimension values and indicator data also reduces the amount of data that must be scanned during a statistical query, further improving the efficiency of statistical aggregation.
Referring to Fig. 4, which is a schematic structural diagram of a data statistics device provided by an embodiment of the invention, the data statistics device 1 may include: a first acquisition module 10, a second acquisition module 20 and a merging module 30;
The first acquisition module 10 is configured to obtain a statistical query request, obtain, from the column family of the preset dimension index table, the target dimension value corresponding to the query dimension value carried in the statistical query request, and obtain, from the row keys of the dimension index table, the first target row key value corresponding to the target dimension value;
Specifically, when the data statistics device 1 receives a statistical query request, the first acquisition module 10 can obtain, from the column family of the preset dimension index table, the target dimension value corresponding to the query dimension value carried in the request, and obtain, from the row keys of the dimension index table, the first target row key value corresponding to the target dimension value. The dimension index table is created on an HBase database. Its column family includes at least one dimension column name, each of which corresponds to at least one dimension value; its row keys include at least one first row key value, computed from the at least one dimension value. Referring again to Table 1 in the embodiment corresponding to Fig. 1, which shows the structure of the dimension index table: dimension1, ..., dimensionM in Table 1 are the dimension column names in the column family, and each dimension column name corresponds to at least one dimension value; for example, the dimension values for the column name dimension1 include Bytes_1, Bytes_2, ..., Bytes_n. Each dimension value in Table 1 is a field of a log record that is used as a selection condition; that is, when a log record is stored, the fields used as selection conditions are stored in the dimension index table as dimension values. The first row key values in Table 1 include xxx_hashcode_1, xxx_hashcode_2, ..., xxx_hashcode_n, where hashcode_n is the hash value computed after concatenating the combination of dimension values, and xxx is the last three digits of hashcode_n. Because tables in HBase are stored in lexicographic order, the xxx prefix in the first row key value distributes the stored data more evenly, which improves write and query concurrency. hashcode_n can be computed as:
hashcode_n = (dim[1] + "t" + dim[2] + "t" + ... + dim[M]).hashCode()
where dim[1] denotes the dimension value in row n of the dimension1 column, dim[2] denotes the dimension value in row n of the dimension2 column, and dim[M] denotes the dimension value in row n of the dimensionM column. Taking xxx_hashcode_1 as an example, the value of hashcode_1 is computed by concatenating the dimension values in the same row as xxx_hashcode_1.
Taking Table 1 as an example, if Bytes_2 in Table 1 is found to be identical to the query dimension value carried in the statistical query request, the first acquisition module 10 can determine Bytes_2 to be the target dimension value and then obtain, from the row keys of Table 1, the first target row key value corresponding to Bytes_2, namely xxx_hashcode_2.
The second acquisition module 20 is configured to obtain, from the row keys of the preset indicator data storage table, the second target row key value corresponding to the first target row key value;
Specifically, after the first acquisition module 10 obtains the first target row key value, the second acquisition module 20 can obtain, from the row keys of the preset indicator data storage table, the second target row key value corresponding to it. The indicator data storage table is created on an HBase database. Its column family includes at least one indicator column name, each of which corresponds to at least one piece of indicator data; its row keys include at least one second row key value, computed from the at least one dimension value and a time value, where the time value is divided according to a preset time granularity. For the structure of the indicator data storage table, see Table 2 in the embodiment corresponding to Fig. 1. In Table 2, the indicator column names in the column family are 0000, 0001, ..., 1023, and each indicator column name corresponds to at least one piece of indicator data; for example, the indicator data for the column name "0000" includes SliceIndex_1, SliceIndex_2, ..., SliceIndex_n. Each piece of indicator data in Table 2 is a value from a log record that is used for statistics; that is, when a log record is stored, the values used for statistics are stored in the indicator data storage table as indicator data. The indicator data is stored in serialized form as a Thrift structure (Thrift is a software framework for developing scalable cross-language services) and may include a global count indicator, an estimated-user-count indicator array, a count indicator, a distinct indicator, a sum indicator, a maximum indicator, a minimum indicator, an average indicator and so on. The second row key values in Table 2 include xxx_hashcode_1_time, xxx_hashcode_2_time, ..., xxx_hashcode_n_time, where the structure and computation of xxx_hashcode_n are the same as for the first row key value and are not repeated here. The time in a second row key value is a time value divided according to the preset time granularity: with a granularity of 1 hour, time can take the values 0000, 0100, 0200, ..., 2300; with a granularity of 15 minutes, time can take the values 0000, 0015, 0030, ..., 2345. For example, with a 1-hour granularity, the second row key values may include xxx_hashcode_1_0100, xxx_hashcode_2_0200, and so on.
The statistical query request also includes a statistical time range. Before obtaining the second target row key value, the second acquisition module 20 first enumerates time values according to the statistical time range and the preset time granularity. If the statistical time range is the whole day and the time granularity is 1 hour, the enumerated time values are 0000, 0100, 0200, ..., 2300. The module then concatenates the obtained first target row key value with each enumerated time value. If the first target row key value is xxx_hashcode_2, this yields the row key values to be queried: xxx_hashcode_2_0000, xxx_hashcode_2_0100, ..., xxx_hashcode_2_2300. The second acquisition module 20 then searches the indicator data storage table for second row key values identical to any of the row key values to be queried, and determines the second row key values found to be the second target row key values.
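The enumeration of time values and the lookup of second target row key values described above can be sketched in Python as follows. The sketch assumes a whole-day statistical time range and a granularity that divides the day evenly, and it models the stored row keys as a set of strings; the function names are hypothetical.

```python
from typing import List, Set


def enumerate_time_values(granularity_minutes: int) -> List[str]:
    """Enumerate HHMM time values for a whole day at the given granularity."""
    if granularity_minutes <= 0 or 1440 % granularity_minutes != 0:
        raise ValueError("granularity must evenly divide a day")
    return [f"{m // 60:02d}{m % 60:02d}"
            for m in range(0, 1440, granularity_minutes)]


def candidate_row_keys(first_key: str, granularity_minutes: int) -> List[str]:
    """Concatenate the first target row key with every enumerated time value."""
    return [f"{first_key}_{t}"
            for t in enumerate_time_values(granularity_minutes)]


def find_second_target_keys(stored_keys: Set[str], first_key: str,
                            granularity_minutes: int) -> List[str]:
    """Keep only the candidate keys that exist in the storage table."""
    return [k for k in candidate_row_keys(first_key, granularity_minutes)
            if k in stored_keys]
```

Since the candidate keys are generated rather than discovered, each one can be fetched with a direct key lookup, and no scan over the indicator data storage table is needed.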
The merging module 30 is configured to merge the target indicator data corresponding to the second target row key value in the column family of the indicator data storage table, to obtain the statistical data corresponding to the statistical query request;
Specifically, after the second acquisition module 20 obtains the second target row key value, the merging module 30 can first perform row merging on the target indicator data corresponding to the second target row key value in the column family of the indicator data storage table. During row merging, the data can be merged pairwise from top to bottom until only one row of target indicator data remains. The row-merged target indicator data is then column-merged, again pairwise; when the columns have been merged down to a single column, the statistical data corresponding to the statistical query request is obtained.
In this embodiment of the invention, the first target row key value corresponding to the query dimension value carried in the statistical query request is obtained from the dimension index table, the second target row key value corresponding to the first target row key value is then obtained from the indicator data storage table, and the target indicator data corresponding to the second target row key value in the indicator data storage table is merged to obtain the statistical data. Because the dimension values and the indicator data of log records are stored separately in the dimension index table and the indicator data storage table, statistical aggregation of the target indicator data in the indicator data storage table requires scanning only the dimension index table. Not all log records need to be scanned, which reduces the amount of data to be scanned and improves the efficiency of statistical aggregation.
Referring to Fig. 5, which is a schematic structural diagram of another data statistics device provided by an embodiment of the invention, the data statistics device 1 may include the first acquisition module 10, the second acquisition module 20 and the merging module 30 of the embodiment corresponding to Fig. 4; further, the data statistics device 1 may also include: a presetting module 40, an information extraction module 50, a first update module 60 and a second update module 70;
The presetting module 40 is configured to preset the dimension index table and the indicator data storage table on an HBase database;
Specifically, the presetting module 40 can preset the dimension index table and the indicator data storage table on an HBase database. The fields of a log record that are used as selection conditions are defined as dimension values, and the values of a log record that are used for statistics are defined as indicator data. The data statistics device 1 stores the dimension values in the column family of the dimension index table and the indicator data in the column family of the indicator data storage table.
The column family of the dimension index table includes at least one dimension column name, each of which corresponds to at least one dimension value; the row keys of the dimension index table include at least one first row key value, computed from the at least one dimension value. The column family of the indicator data storage table includes at least one indicator column name, each of which corresponds to at least one piece of indicator data; the row keys of the indicator data storage table include at least one second row key value, computed from the at least one dimension value and a time value, where the time value is divided according to a preset time granularity.
For the structure of the dimension index table, see Table 1 in the embodiment corresponding to Fig. 1. dimension1, ..., dimensionM in Table 1 are the dimension column names in the column family of the dimension index table, and each dimension column name corresponds to at least one dimension value; for example, the dimension values for the column name dimension1 include Bytes_1, Bytes_2, ..., Bytes_n. Each dimension value in Table 1 is a field of a log record that is used as a selection condition; that is, when a log record is stored, the fields used as selection conditions are stored in the dimension index table as dimension values. The first row key values in Table 1 include xxx_hashcode_1, xxx_hashcode_2, ..., xxx_hashcode_n, where hashcode_n is the hash value computed after concatenating the combination of dimension values, and xxx is the last three digits of hashcode_n. Because tables in HBase are stored in lexicographic order, the xxx prefix in the first row key value distributes the stored data more evenly, which improves write and query concurrency. hashcode_n can be computed as:
hashcode_n = (dim[1] + "t" + dim[2] + "t" + ... + dim[M]).hashCode()
where dim[1] denotes the dimension value in row n of the dimension1 column, dim[2] denotes the dimension value in row n of the dimension2 column, and dim[M] denotes the dimension value in row n of the dimensionM column. Taking xxx_hashcode_1 as an example, the value of hashcode_1 is computed by concatenating the dimension values in the same row as xxx_hashcode_1.
For the structure of the indicator data storage table, see Table 2 in the embodiment corresponding to Fig. 1. In Table 2, the indicator column names in the column family are 0000, 0001, ..., 1023, and each indicator column name corresponds to at least one piece of indicator data; for example, the indicator data for the column name "0000" includes SliceIndex_1, SliceIndex_2, ..., SliceIndex_n. Each piece of indicator data in Table 2 is a value from a log record that is used for statistics; that is, when a log record is stored, the values used for statistics are stored in the indicator data storage table as indicator data. The indicator data is stored in serialized form as a Thrift structure and may include a user count, a number of occurrences, distinct values, count values, sums, maximum values, minimum values, average values and so on. The second row key values in Table 2 include xxx_hashcode_1_time, xxx_hashcode_2_time, ..., xxx_hashcode_n_time, where the structure and computation of xxx_hashcode_n are the same as for the first row key value and are not repeated here. The time in a second row key value is a time value divided according to the preset time granularity: with a granularity of 1 hour, time can take the values 0000, 0100, 0200, ..., 2300; with a granularity of 15 minutes, time can take the values 0000, 0015, 0030, ..., 2345. For example, with a 1-hour granularity, the second row key values may include xxx_hashcode_1_0100, xxx_hashcode_2_0200, and so on.
The information extraction module 50 is configured to extract, when log record information is obtained, the log indicator data, log dimension values, time information and key field from the log record information;
Specifically, when the information extraction module 50 obtains log record information, it can extract the log indicator data, log dimension values, time information and key field from it. The log indicator data may include at least one of a global count indicator, an estimated-user-count indicator array, a count indicator, a distinct indicator, a sum indicator, a maximum indicator, a minimum indicator and an average indicator. The time information is the time point at which the log record information was generated.
The first update module 60 is configured to generate a first row key value to be updated according to the log dimension values, and to update, according to the log dimension values, the dimension values in the dimension index table that correspond to the first row key value to be updated;
Specifically, after the information extraction module 50 extracts the log indicator data, log dimension values, time information and key field, the first update module 60 can put the log dimension values into a dim array and compute hashcode_n over the concatenated dim array to obtain the corresponding first row key value to be updated, xxx_hashcode_n, where xxx is the last three digits of hashcode_n. hashcode_n can be computed as:
hashcode_n = (dim[1] + "t" + dim[2] + "t" + ... + dim[M]).hashCode();
Taking Table 1 in the embodiment corresponding to Fig. 1 as an example, in the formula for hashcode_n, dim[1] denotes the dimension value in row n of the dimension1 column, dim[2] denotes the dimension value in row n of the dimension2 column, and dim[M] denotes the dimension value in row n of the dimensionM column. If the first row key value to be updated generated by the first update module 60 is xxx_hashcode_1, the first update module 60 writes each log dimension value into the corresponding dimension value in the same row as xxx_hashcode_1 in Table 1, which completes the update of the dimension values corresponding to the first row key value to be updated. Because the log dimension values are identical to the dimension values already stored in that row, writing them simply overwrites those values with the same content; alternatively, the update of that row can be skipped altogether.
Optionally, if the first row key value to be updated cannot be found in the dimension index table, the first update module 60 can add a new row to the dimension index table and write the first row key value to be updated and the log dimension values into the new row.
The second update module 70 is configured to generate a second row key value to be updated according to the log dimension values and the time information, generate an indicator column name to be updated according to the key field, and update, according to the log indicator data, the indicator data in the indicator data storage table that corresponds to the second row key value to be updated and the indicator column name to be updated;
Specifically, the second update module 70 can generate xxx_hashcode_n from the log dimension values, generate time from the time information and the preset time granularity, and concatenate xxx_hashcode_n and time to obtain the second row key value to be updated, xxx_hashcode_n_time. The second update module 70 then generates the indicator column name to be updated from the key field: the murmur hash of the key field (a 64-bit integer) is computed, shifted right by 48 bits and divided by 64 to obtain the indicator column name to be updated (0000~1023), i.e. (murmurhash(key) >> 48) / 64. After the second row key value to be updated and the indicator column name to be updated have been generated, the second update module 70 can update, according to the log indicator data, the indicator data in the indicator data storage table that corresponds to them. Taking Table 2 in the embodiment corresponding to Fig. 1 as an example, if the second row key value to be updated is xxx_hashcode_1_time and the indicator column name to be updated is 0000, the second update module 70 can update SliceIndex_1 according to the log indicator data. Updating SliceIndex_1 may include:
Update of the global count indicator countTotal: countTotal + 1;
Update of the estimated-user-count indicator array buckets:
p = (murmurhash(key) >> 48) % 64
zeroNum = number of leading zeros of (murmurhash(key) << 16)
if (buckets.get(p) < zeroNum) then buckets.set(p, zeroNum);
If the indicator being updated is the i-th count indicator: countIndex.get(i) + 1;
If the indicator being updated is the i-th distinct indicator: distincts.get(i).set(v);
If the indicator being updated is the i-th sum indicator: sum.get(i) + v;
If the indicator being updated is the i-th minimum indicator: Min(min.get(i), v);
If the indicator being updated is the i-th maximum indicator: Max(max.get(i), v);
If the indicator being updated is the i-th average indicator: (avg.get(i) * (countTotal.get(i) - 1) + v) / countTotal.get(i).
Optionally, if the second to-be-updated row key value cannot be found in the indicator data storage table, the second updating module 70 may add a new row to the indicator data storage table, write the second to-be-updated row key value into the new row, and write the log indicator data at the position in the new row that corresponds to the to-be-updated indicator column name.
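The update rules above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: `hash64` is a stand-in built on `hashlib.blake2b` because Python's standard library has no murmur hash, the table is modelled as an in-memory dictionary rather than an HBase table, and only one sum, minimum, maximum, and average indicator is kept per cell instead of the indexed lists described in the text. All function names are hypothetical.

```python
import hashlib

MASK64 = (1 << 64) - 1


def hash64(key: str) -> int:
    # Assumption: stand-in for the 64-bit murmur hash named in the text.
    digest = hashlib.blake2b(key.encode("utf-8"), digest_size=8).digest()
    return int.from_bytes(digest, "big")


def column_name(h: int) -> str:
    # (hash >> 48) / 64 gives 0..1023, written as a four-digit column name.
    return f"{(h >> 48) // 64:04d}"


def update_buckets(buckets: list, h: int) -> None:
    # p = (hash >> 48) % 64 selects the bucket; zeroNum is the number of
    # leading zeros of (hash << 16), viewed as a 64-bit word.
    p = (h >> 48) % 64
    shifted = (h << 16) & MASK64
    zero_num = 64 - shifted.bit_length()
    if buckets[p] < zero_num:
        buckets[p] = zero_num


def new_slice() -> dict:
    return {"countTotal": 0, "buckets": [0] * 64,
            "sum": 0.0, "min": None, "max": None, "avg": 0.0}


def update_slice(table: dict, row_key: str, key: str, v: float) -> dict:
    # A missing (row key, column) cell is created first, mirroring the
    # optional "add a new row" step.
    h = hash64(key)
    s = table.setdefault((row_key, column_name(h)), new_slice())
    s["countTotal"] += 1
    update_buckets(s["buckets"], h)
    s["sum"] += v
    s["min"] = v if s["min"] is None else min(s["min"], v)
    s["max"] = v if s["max"] is None else max(s["max"], v)
    s["avg"] = (s["avg"] * (s["countTotal"] - 1) + v) / s["countTotal"]
    return s
```

Because the column name depends only on the hash of the key field, repeated records with the same key field and the same time slot land in the same cell and update it in place instead of adding rows.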
Further, refer to Fig. 6, which is a schematic structural diagram of the first acquisition module 10 in the embodiment corresponding to Fig. 4 or Fig. 5. The first acquisition module 10 may include: a request acquiring unit 101, a dimension acquiring unit 102, and a first row key acquiring unit 103;
The request acquiring unit 101 is configured to obtain a statistical query request, the statistical query request carrying query dimension values;
Specifically, when the request acquiring unit 101 receives a statistical query request, it may obtain the query dimension values carried in the statistical query request.
The dimension acquiring unit 102 is configured to obtain, from the at least one dimension value in the dimension index table, the dimension value corresponding to the query dimension values, and to determine the dimension value corresponding to the query dimension values as a target dimension value;
Specifically, after the request acquiring unit 101 obtains the query dimension values, the dimension acquiring unit 102 may obtain, from the at least one dimension value in the dimension index table, the dimension value corresponding to the query dimension values, and determine that dimension value as the target dimension value. Taking Table 1 in the embodiment corresponding to Fig. 1 as an example, suppose the query dimension values include dimension value A and dimension value B. If the dimension acquiring unit 102 finds in the column family of Table 1 that the dimension value identical to dimension value A is Bytes_1 and the dimension value identical to dimension value B is Bytes_2, then Bytes_1 and Bytes_2 are determined as the target dimension values.
The first row key acquiring unit 103 is configured to obtain, from the at least one first row key value in the dimension index table, the first row key value corresponding to the target dimension value, and to determine the first row key value corresponding to the target dimension value as a first target row key value;
Specifically, after the dimension acquiring unit 102 determines the target dimension value, the first row key acquiring unit 103 may obtain, from the at least one first row key value in the dimension index table, the first row key value corresponding to the target dimension value, and determine it as the first target row key value. Taking Table 1 again as an example, if the determined target dimension values are Bytes_1 and Bytes_2, the first row key acquiring unit 103 may find among the row keys of Table 1 that the first row key value corresponding to Bytes_1 is xxx_hashcode_1 and the first row key value corresponding to Bytes_2 is xxx_hashcode_2, and determine xxx_hashcode_1 and xxx_hashcode_2 as the first target row key values.
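The lookup performed by units 101 to 103 can be sketched as below. The layout of the dimension index table (a mapping from first row key value to the dimension values stored in its column family) and the column names `dim_a` are assumptions made only for illustration; Table 1 itself is not reproduced in this section, and `first_target_row_keys` is a hypothetical helper.

```python
# Hypothetical in-memory stand-in for the dimension index table:
# first row key value -> {dimension column name: dimension value}.
dimension_index = {
    "xxx_hashcode_1": {"dim_a": "Bytes_1"},
    "xxx_hashcode_2": {"dim_a": "Bytes_2"},
    "xxx_hashcode_3": {"dim_a": "Bytes_3"},
}


def first_target_row_keys(table: dict, query_dimension_values: list) -> list:
    """Return the first row key values whose row holds a queried dimension value."""
    wanted = set(query_dimension_values)
    return [row_key for row_key, columns in table.items()
            if wanted & set(columns.values())]


targets = first_target_row_keys(dimension_index, ["Bytes_1", "Bytes_2"])
```

Only the small dimension index table is scanned at this stage; the indicator data is not touched until the row keys to query have been built.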
Further, refer to Fig. 7, which is a schematic structural diagram of the second acquisition module 20 in Fig. 4 or Fig. 5. The second acquisition module 20 may include: a time value generating unit 201, a splicing unit 202, and a second row key acquiring unit 203;
The time value generating unit 201 is configured to generate, according to the time granularity and the statistical time range carried in the statistical query request, at least one target time value within the statistical time range;
Specifically, the statistical query request also includes a statistical time range. The time value generating unit 201 may generate at least one target time value within the statistical time range according to the time granularity and the statistical time range carried in the statistical query request. For example, if the statistical time range is a whole day and the preset time granularity is 1 hour, the target time values generated by the time value generating unit 201 include 0000, 0100, 0200, ..., 2300, that is, the time values from 0:00 to 23:00.
The splicing unit 202 is configured to splice each target time value of the at least one target time value with the first target row key value to obtain the row key values to be queried;
Specifically, after the time value generating unit 201 generates the at least one target time value, the splicing unit 202 may splice each target time value with the first target row key value to obtain the row key values to be queried. For example, if the statistical time range is a whole day, the time granularity is 1 hour, and the first target row key value is xxx_hashcode_2, the at least one target time value includes 0000, 0100, 0200, ..., 2300. The splicing unit 202 splices xxx_hashcode_2 with each of these target time values to obtain the row key values to be queried, which include: xxx_hashcode_2_0000, xxx_hashcode_2_0100, ..., xxx_hashcode_2_2300.
The second row key acquiring unit 203 is configured to obtain, from the at least one second row key value in the indicator data storage table, the second row key value corresponding to the row key values to be queried, and to determine the second row key value corresponding to the row key values to be queried as a second target row key value;
Specifically, after the splicing unit 202 obtains the row key values to be queried, the second row key acquiring unit 203 may obtain, from the at least one second row key value in the indicator data storage table, the second row key value corresponding to the row key values to be queried, and determine it as the second target row key value. For example, the row key values to be queried include xxx_hashcode_2_0000, xxx_hashcode_2_0100, ..., xxx_hashcode_2_2300. The second row key acquiring unit 203 searches the indicator data storage table for second row key values identical to the row key values to be queried. If the second row key values found in the indicator data storage table that are identical to the row key values to be queried include xxx_hashcode_2_0000 and xxx_hashcode_2_0100, the second row key acquiring unit 203 determines xxx_hashcode_2_0000 and xxx_hashcode_2_0100 in the indicator data storage table as the second target row key values.
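The three steps above (generate time values, splice, look up) can be sketched as follows, using the whole-day, 1-hour example from the text. The helper names, the hour-based arguments, and the dictionary standing in for the indicator data storage table are assumptions for illustration only.

```python
def target_time_values(start_hour: int, end_hour: int, step_hours: int = 1) -> list:
    """Time values such as '0000', '0100', ... for hours in [start_hour, end_hour)."""
    return [f"{hour:02d}00" for hour in range(start_hour, end_hour, step_hours)]


def row_keys_to_query(first_target_keys: list, time_values: list) -> list:
    """Splice every first target row key value with every target time value."""
    return [f"{key}_{t}" for key in first_target_keys for t in time_values]


def second_target_row_keys(storage_table: dict, candidates: list) -> list:
    """Keep only the candidates that exist as second row key values."""
    return [key for key in candidates if key in storage_table]


# Hypothetical storage table holding data for only two hours of the day.
storage = {"xxx_hashcode_2_0000": {}, "xxx_hashcode_2_0100": {}}
times = target_time_values(0, 24)
candidates = row_keys_to_query(["xxx_hashcode_2"], times)
found = second_target_row_keys(storage, candidates)
```

Each candidate is an exact row key, so the lookup can be served by point reads rather than by a scan over the stored log data.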
Further, refer to Fig. 8, which is a schematic structural diagram of the merging module 30 in Fig. 4 or Fig. 5. The merging module 30 may include: an indicator data acquiring unit 301 and an indicator data merging unit 302;
The indicator data acquiring unit 301 is configured to obtain, from the at least one item of indicator data in the indicator data storage table, the indicator data corresponding to the second target row key value, and to determine the indicator data corresponding to the second target row key value as target indicator data;
Specifically, the indicator data acquiring unit 301 may obtain, from the at least one item of indicator data in the indicator data storage table, the indicator data corresponding to the second target row key value, and determine it as the target indicator data. Taking Table 2 in the embodiment corresponding to Fig. 1 as an example, if the second target row key values include xxx_hashcode_1_time and xxx_hashcode_2_time, then the indicator data in every column of the same row as xxx_hashcode_1_time is target indicator data, and the indicator data in every column of the same row as xxx_hashcode_2_time is also target indicator data.
The indicator data merging unit 302 is configured to row-merge the target indicator data, and to column-merge the row-merged target indicator data, to obtain the statistical data corresponding to the statistical query request;
Specifically, after the indicator data acquiring unit 301 determines the target indicator data, the indicator data merging unit 302 may first row-merge the target indicator data. During the row merge, the target indicator data in each column may be merged pairwise from top to bottom until a single row of target indicator data remains. Referring again to Table 2 in the embodiment corresponding to Fig. 1, take the row merge of SliceIndex_1 and SliceIndex_2 as an example. If SliceIndex_1 and SliceIndex_2 each include a global count indicator, an estimated-user-count indicator array, a count indicator, a distinct (deduplication) indicator, a sum indicator, a maximum indicator, a minimum indicator, and an average indicator, the specific process by which the indicator data merging unit 302 row-merges SliceIndex_1 and SliceIndex_2 includes:
Merging of the global count indicator: sliceIndex1.countTotal + sliceIndex2.countTotal;
Merging of the estimated-user-count indicator array:
Merging of the count indicator: sliceIndex1.countIndex + sliceIndex2.countIndex;
Merging of the distinct indicator: sliceIndex1.distincts.addAll(sliceIndex2.distincts) (set union);
Merging of the sum indicator: sliceIndex1.sum + sliceIndex2.sum;
Merging of the minimum indicator: Min(sliceIndex1.min, sliceIndex2.min);
Merging of the maximum indicator: Max(sliceIndex1.max, sliceIndex2.max);
Merging of the average indicator:
(sliceIndex1.avg * sliceIndex1.countIndex1 + sliceIndex2.avg * sliceIndex2.countIndex2) / (sliceIndex1.countIndex1 + sliceIndex2.countIndex2).
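The row-merge rules listed above can be sketched as a pairwise merge function that is folded over the rows from top to bottom. This is an illustrative sketch under assumptions: the `SliceIndex` class and its field names are hypothetical, each indicator is held as a single value rather than an indexed list, and the estimated-user-count array is left out because its merge formula is not reproduced in this section.

```python
from dataclasses import dataclass, field
from functools import reduce


@dataclass
class SliceIndex:
    count_total: int = 0
    count_index: int = 0
    distincts: set = field(default_factory=set)
    total: float = 0.0
    minimum: float = float("inf")
    maximum: float = float("-inf")
    avg: float = 0.0


def merge_rows(a: SliceIndex, b: SliceIndex) -> SliceIndex:
    # The average is recombined as a count-weighted mean of the two averages.
    n = a.count_index + b.count_index
    avg = (a.avg * a.count_index + b.avg * b.count_index) / n if n else 0.0
    return SliceIndex(
        count_total=a.count_total + b.count_total,
        count_index=n,
        distincts=a.distincts | b.distincts,  # set union
        total=a.total + b.total,
        minimum=min(a.minimum, b.minimum),
        maximum=max(a.maximum, b.maximum),
        avg=avg,
    )


def merge_all(rows: list) -> SliceIndex:
    # Merge pairwise, top to bottom, until one row remains.
    return reduce(merge_rows, rows, SliceIndex())


s1 = SliceIndex(2, 2, {"u1", "u2"}, 6.0, 2.0, 4.0, 3.0)
s2 = SliceIndex(1, 1, {"u2", "u3"}, 9.0, 9.0, 9.0, 9.0)
merged = merge_all([s1, s2])
```

Every rule here is associative, so the order in which rows are paired does not change the merged result.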
The process by which the indicator data merging unit 302 column-merges the row-merged target indicator data is similar to the row merge: during the column merge, the row-merged target indicator data in the columns may be merged pairwise until a single total item of target indicator data remains, and this total target indicator data is the statistical data corresponding to the statistical query request. The column merge differs from the row merge only in how the estimated-user-count indicator array is merged. Before the column merge, a user count value must first be computed from the row-merged estimated-user-count indicator array of each column; during the column merge, the user count values computed for the columns are then added directly to obtain the total estimated number of users under the queried dimension condition. For the other indicators, the column merge is the same as the row merge. The specific process of computing a user count value from the estimated-user-count indicator array may be:
where userCount is the computed user count value.
In the embodiments of the present invention, the first target row key value corresponding to the query dimension values carried in the statistical query request is obtained from the dimension index table, the second target row key value corresponding to the first target row key value is then obtained from the indicator data storage table, and the target indicator data corresponding to the second target row key value in the indicator data storage table is merged to obtain the statistical data. Because the dimension values and the indicator data in the log records are stored separately in the dimension index table and the indicator data storage table, and the target indicator data in the indicator data storage table can be aggregated merely by scanning the dimension index table, it is unnecessary to scan all log records; this reduces the amount of data to be scanned and improves the efficiency of statistical aggregation. In addition, by extracting the log indicator data, log dimension values, time information, and key field from the log record information, the corresponding dimension values in the dimension index table and the corresponding indicator data in the indicator data storage table can be updated. Because only the dimension values and indicator data at the relevant positions are updated when log record information is written, a new row of data need not be added for every log record written, which effectively reduces storage overhead. Moreover, writing log record information by updating dimension values and indicator data also reduces the amount of data that must be scanned during a statistical query, further improving the efficiency of statistical aggregation.
Refer to Fig. 9, which is a schematic structural diagram of another data statistics device provided by an embodiment of the present invention. The data statistics device 1000 may include: at least one processor 1001, such as a CPU, at least one network interface 1004, a user interface 1003, a memory 1005, and at least one communication bus 1002. The communication bus 1002 is used to implement connection and communication between these components. The user interface 1003 may include a display (Display) and a keyboard (Keyboard), and optionally may further include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (such as a WI-FI interface). The memory 1005 may be a high-speed RAM, or a non-volatile memory, for example at least one disk storage. Optionally, the memory 1005 may also be at least one storage device located remotely from the processor 1001. As shown in Fig. 9, the memory 1005, as a computer storage medium, may include an operating system, a network communication module, a user interface module, and a device control application program.
In the data statistics device 1000 shown in Fig. 9, the network interface 1004 is mainly used to connect to a user terminal and perform data communication with the user terminal; the user interface 1003 is mainly used to provide an input interface for the user and obtain data output by the user; and the processor 1001 may be used to invoke the device control application program stored in the memory 1005 and specifically perform the following steps:
obtaining a statistical query request, obtaining, from the column family of a preset dimension index table, a target dimension value corresponding to the query dimension values carried in the statistical query request, and obtaining, from the row keys of the dimension index table, a first target row key value corresponding to the target dimension value;
obtaining, from the row keys of a preset indicator data storage table, a second target row key value corresponding to the first target row key value;
merging the target indicator data in the column family of the indicator data storage table that corresponds to the second target row key value, to obtain the statistical data corresponding to the statistical query request.
In one embodiment, before obtaining the statistical query request, obtaining from the column family of the preset dimension index table the target dimension value corresponding to the query dimension values carried in the statistical query request, and obtaining from the row keys of the dimension index table the first target row key value corresponding to the target dimension value, the processor 1001 further performs the following step:
presetting the dimension index table and the indicator data storage table based on an HBase database;
wherein the column family of the dimension index table includes at least one dimension column name, each dimension column name of the at least one dimension column name corresponds to at least one dimension value, the row keys of the dimension index table include at least one first row key value, and the at least one first row key value is calculated from the at least one dimension value;
wherein the column family of the indicator data storage table includes at least one indicator column name, each indicator column name of the at least one indicator column name corresponds to at least one item of indicator data, the row keys of the indicator data storage table include at least one second row key value, the at least one second row key value is calculated from the at least one dimension value and a time value, and the time value is divided according to a preset time granularity.
In one embodiment, when obtaining the statistical query request, obtaining from the column family of the preset dimension index table the target dimension value corresponding to the query dimension values carried in the statistical query request, and obtaining from the row keys of the dimension index table the first target row key value corresponding to the target dimension value, the processor 1001 specifically performs the following steps:
obtaining a statistical query request, the statistical query request carrying query dimension values;
obtaining, from the at least one dimension value in the dimension index table, the dimension value corresponding to the query dimension values, and determining the dimension value corresponding to the query dimension values as the target dimension value;
obtaining, from the at least one first row key value in the dimension index table, the first row key value corresponding to the target dimension value, and determining the first row key value corresponding to the target dimension value as the first target row key value.
In one embodiment, when obtaining, from the row keys of the preset indicator data storage table, the second target row key value corresponding to the first target row key value, the processor 1001 specifically performs the following steps:
generating, according to the time granularity and the statistical time range carried in the statistical query request, at least one target time value within the statistical time range;
splicing each target time value of the at least one target time value with the first target row key value to obtain the row key values to be queried;
obtaining, from the at least one second row key value in the indicator data storage table, the second row key value corresponding to the row key values to be queried, and determining the second row key value corresponding to the row key values to be queried as the second target row key value.
In one embodiment, when merging the target indicator data in the column family of the indicator data storage table that corresponds to the second target row key value to obtain the statistical data corresponding to the statistical query request, the processor 1001 specifically performs the following steps:
obtaining, from the at least one item of indicator data in the indicator data storage table, the indicator data corresponding to the second target row key value, and determining the indicator data corresponding to the second target row key value as the target indicator data;
row-merging the target indicator data, and column-merging the row-merged target indicator data, to obtain the statistical data corresponding to the statistical query request.
In one embodiment, the processor 1001 further performs the following steps:
when log record information is obtained, extracting the log indicator data, log dimension values, time information, and key field from the log record information;
generating a first to-be-updated row key value according to the log dimension values, and updating, according to the log dimension values, each dimension value in the dimension index table that corresponds to the first to-be-updated row key value;
generating a second to-be-updated row key value according to the log dimension values and the time information, generating a to-be-updated indicator column name according to the key field, and updating, according to the log indicator data, the indicator data in the indicator data storage table that corresponds to the second to-be-updated row key value and the to-be-updated indicator column name.
In the embodiments of the present invention, the first target row key value corresponding to the query dimension values carried in the statistical query request is obtained from the dimension index table, the second target row key value corresponding to the first target row key value is then obtained from the indicator data storage table, and the target indicator data corresponding to the second target row key value in the indicator data storage table is merged to obtain the statistical data. Because the dimension values and the indicator data in the log records are stored separately in the dimension index table and the indicator data storage table, and the target indicator data in the indicator data storage table can be aggregated merely by scanning the dimension index table, it is unnecessary to scan all log records; this reduces the amount of data to be scanned and improves the efficiency of statistical aggregation. In addition, by extracting the log indicator data, log dimension values, time information, and key field from the log record information, the corresponding dimension values in the dimension index table and the corresponding indicator data in the indicator data storage table can be updated. Because only the dimension values and indicator data at the relevant positions are updated when log record information is written, a new row of data need not be added for every log record written, which effectively reduces storage overhead. Moreover, writing log record information by updating dimension values and indicator data also reduces the amount of data that must be scanned during a statistical query, further improving the efficiency of statistical aggregation.
A person of ordinary skill in the art will appreciate that all or part of the processes in the methods of the above embodiments may be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
What is disclosed above is merely a preferred embodiment of the present invention and certainly cannot be used to limit the scope of the rights of the present invention; equivalent variations made according to the claims of the present invention therefore still fall within the scope covered by the present invention.

Claims (12)

1. A data statistics method, characterized by comprising:
obtaining a statistical query request, obtaining, from the column family of a preset dimension index table, a target dimension value corresponding to the query dimension values carried in the statistical query request, and obtaining, from the row keys of the dimension index table, a first target row key value corresponding to the target dimension value;
obtaining, from the row keys of a preset indicator data storage table, a second target row key value corresponding to the first target row key value;
merging the target indicator data in the column family of the indicator data storage table that corresponds to the second target row key value, to obtain the statistical data corresponding to the statistical query request.
2. The method of claim 1, characterized in that, before the step of obtaining the statistical query request, obtaining from the column family of the preset dimension index table the target dimension value corresponding to the query dimension values carried in the statistical query request, and obtaining from the row keys of the dimension index table the first target row key value corresponding to the target dimension value, the method further comprises:
presetting the dimension index table and the indicator data storage table based on an HBase database;
wherein the column family of the dimension index table includes at least one dimension column name, each dimension column name of the at least one dimension column name corresponds to at least one dimension value, the row keys of the dimension index table include at least one first row key value, and the at least one first row key value is calculated from the at least one dimension value;
wherein the column family of the indicator data storage table includes at least one indicator column name, each indicator column name of the at least one indicator column name corresponds to at least one item of indicator data, the row keys of the indicator data storage table include at least one second row key value, the at least one second row key value is calculated from the at least one dimension value and a time value, and the time value is divided according to a preset time granularity.
3. The method of claim 2, characterized in that the obtaining a statistical query request, obtaining from the column family of the preset dimension index table the target dimension value corresponding to the query dimension values carried in the statistical query request, and obtaining from the row keys of the dimension index table the first target row key value corresponding to the target dimension value comprises:
obtaining a statistical query request, the statistical query request carrying query dimension values;
obtaining, from the at least one dimension value in the dimension index table, the dimension value corresponding to the query dimension values, and determining the dimension value corresponding to the query dimension values as the target dimension value;
obtaining, from the at least one first row key value in the dimension index table, the first row key value corresponding to the target dimension value, and determining the first row key value corresponding to the target dimension value as the first target row key value.
4. The method of claim 2, characterized in that the obtaining, from the row keys of the preset indicator data storage table, the second target row key value corresponding to the first target row key value comprises:
generating, according to the time granularity and the statistical time range carried in the statistical query request, at least one target time value within the statistical time range;
splicing each target time value of the at least one target time value with the first target row key value to obtain the row key values to be queried;
obtaining, from the at least one second row key value in the indicator data storage table, the second row key value corresponding to the row key values to be queried, and determining the second row key value corresponding to the row key values to be queried as the second target row key value.
5. The method of claim 2, characterized in that the merging the target indicator data in the column family of the indicator data storage table that corresponds to the second target row key value, to obtain the statistical data corresponding to the statistical query request, comprises:
obtaining, from the at least one item of indicator data in the indicator data storage table, the indicator data corresponding to the second target row key value, and determining the indicator data corresponding to the second target row key value as the target indicator data;
row-merging the target indicator data, and column-merging the row-merged target indicator data, to obtain the statistical data corresponding to the statistical query request.
6. The method of claim 2, characterized by further comprising:
when log record information is obtained, extracting the log indicator data, log dimension values, time information, and key field from the log record information;
generating a first to-be-updated row key value according to the log dimension values, and updating, according to the log dimension values, each dimension value in the dimension index table that corresponds to the first to-be-updated row key value;
generating a second to-be-updated row key value according to the log dimension values and the time information, generating a to-be-updated indicator column name according to the key field, and updating, according to the log indicator data, the indicator data in the indicator data storage table that corresponds to the second to-be-updated row key value and the to-be-updated indicator column name.
7. A data statistics device, comprising:
a first acquisition module, configured to acquire a statistical query request, obtain, from the column family of a preset dimension index table, a target dimension value corresponding to the query dimension value carried by the statistical query request, and obtain, from the row keys of the dimension index table, a first target row key value corresponding to the target dimension value;
a second acquisition module, configured to obtain, from the row keys of a preset indicator data storage table, a second target row key value corresponding to the first target row key value; and
a merging module, configured to merge the target indicator data corresponding to the second target row key value in the column family of the indicator data storage table, to obtain statistical data corresponding to the statistical query request.
8. The device according to claim 7, further comprising:
a presetting module, configured to preset the dimension index table and the indicator data storage table based on an HBase database;
wherein the column family of the dimension index table includes at least one dimension column name, each dimension column name corresponding to at least one dimension value, and the row keys of the dimension index table include at least one first row key value, the at least one first row key value being calculated from the at least one dimension value; and
wherein the column family of the indicator data storage table includes at least one indicator column name, each indicator column name corresponding to at least one piece of indicator data, and the row keys of the indicator data storage table include at least one second row key value, the at least one second row key value being calculated from the at least one dimension value and a time value, the time value being divided according to a preset time granularity.
9. The device according to claim 8, wherein the first acquisition module comprises:
a request acquiring unit, configured to acquire a statistical query request, the statistical query request carrying a query dimension value;
a dimension acquiring unit, configured to obtain, from the at least one dimension value in the dimension index table, the dimension value corresponding to the query dimension value, and determine that dimension value as the target dimension value; and
a first row key acquiring unit, configured to obtain, from the at least one first row key value in the dimension index table, the first row key value corresponding to the target dimension value, and determine that first row key value as the first target row key value.
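The lookup performed by the units of claim 9 can be sketched in Python. The dimension index table is modelled as a dictionary from first row key value to its dimension columns, and a row is assumed to match when every queried dimension column holds the queried value. The matching rule, the function name, and the sample keys and values are illustrative assumptions.

```python
def find_first_target_row_keys(dimension_index, query_dimensions):
    """Return the row keys whose dimension values match every queried dimension."""
    matches = []
    for row_key, dimensions in dimension_index.items():
        if all(dimensions.get(name) == value
               for name, value in query_dimensions.items()):
            matches.append(row_key)
    return matches

# Hypothetical dimension index table: first row key -> dimension columns.
dimension_index = {
    "a1": {"app": "news", "region": "gd"},
    "b2": {"app": "news", "region": "bj"},
    "c3": {"app": "video", "region": "gd"},
}
first_targets = find_first_target_row_keys(dimension_index, {"app": "news"})
```

The dimension index table holds one row per distinct combination of dimension values, so this scan touches far fewer rows than a scan over the time-partitioned indicator data would.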
10. The device according to claim 8, wherein the second acquisition module comprises:
a time value generating unit, configured to generate, according to the time granularity and the statistical time range carried by the statistical query request, at least one target time value within the statistical time range;
a splicing unit, configured to splice each target time value of the at least one target time value with the first target row key value, to obtain row key values to be queried; and
a second row key acquiring unit, configured to obtain, from the at least one second row key value in the indicator data storage table, the second row key value corresponding to the row key value to be queried, and determine that second row key value as the second target row key value.
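The three units of claim 10 can be sketched in Python. The sketch assumes that time values are Unix timestamps aligned to the granularity, that the statistical time range is half-open (`start` inclusive, `end` exclusive), and that splicing joins the first target row key value and the time value with `_`. None of these conventions is fixed by the claims, and all names are hypothetical.

```python
def target_time_values(start, end, granularity):
    """Granularity-aligned time values whose buckets overlap [start, end)."""
    first_bucket = start - start % granularity
    return list(range(first_bucket, end, granularity))

def row_keys_to_query(first_target_row_keys, time_values):
    """Splice every first target row key value with every target time value."""
    return [f"{key}_{t}" for key in first_target_row_keys for t in time_values]

def second_target_row_keys(indicator_store, candidates):
    """Keep only the candidate row keys that exist in the indicator data storage table."""
    return [key for key in candidates if key in indicator_store]

# Hypothetical indicator data storage table: second row key -> indicator columns.
indicator_store = {
    "a1_3600": {"pv": 12},
    "a1_10800": {"pv": 5},
    "b2_7200": {"pv": 9},
}
time_values = target_time_values(start=3600, end=10800, granularity=3600)
candidates = row_keys_to_query(["a1"], time_values)
second_targets = second_target_row_keys(indicator_store, candidates)
```

Because the candidate row keys are computed up front, each one can be fetched by a direct key lookup, which is consistent with the stated goal of reducing the amount of data scanned.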
11. The device according to claim 8, wherein the merging module comprises:
an indicator data acquiring unit, configured to obtain, from the at least one piece of indicator data in the indicator data storage table, the indicator data corresponding to the second target row key value, and determine that indicator data as the target indicator data; and
an indicator data merging unit, configured to perform row merging on the target indicator data, and perform column merging on the row-merged target indicator data, to obtain the statistical data corresponding to the statistical query request.
12. The device according to claim 8, further comprising:
an information extraction module, configured to, when log information is acquired, extract log indicator data, log dimension values, time information, and a key field from the log information;
a first update module, configured to generate a first row key value to be updated according to the log dimension values, and update, according to the log dimension values, each dimension value corresponding to the first row key value to be updated in the dimension index table; and
a second update module, configured to generate a second row key value to be updated according to the log dimension values and the time information, generate an indicator column name to be updated according to the key field, and update, according to the log indicator data, the indicator data corresponding to the second row key value to be updated and the indicator column name to be updated in the indicator data storage table.
CN201510070951.0A 2015-02-10 2015-02-10 Data statistical method and device Active CN105989076B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510070951.0A CN105989076B (en) 2015-02-10 2015-02-10 A kind of data statistical approach and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510070951.0A CN105989076B (en) 2015-02-10 2015-02-10 A kind of data statistical approach and device

Publications (2)

Publication Number Publication Date
CN105989076A true CN105989076A (en) 2016-10-05
CN105989076B CN105989076B (en) 2019-05-07

Family

ID=57041808

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510070951.0A Active CN105989076B (en) 2015-02-10 2015-02-10 Data statistical method and device

Country Status (1)

Country Link
CN (1) CN105989076B (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20020062313A1 (en) * 2000-10-27 2002-05-23 Lg Electronics Inc. File structure for streaming service, apparatus and method for providing streaming service using the same
CN102750356A (en) * 2012-06-11 2012-10-24 清华大学 Construction and management method for secondary indexes of key value library
CN103020204A (en) * 2012-12-05 2013-04-03 北京普泽天玑数据技术有限公司 Method and system for carrying out multi-dimensional regional inquiry on distribution type sequence table
CN103617232A (en) * 2013-11-26 2014-03-05 北京京东尚科信息技术有限公司 Paging inquiring method for HBase table
CN104239567A (en) * 2014-09-28 2014-12-24 北京国双科技有限公司 Method and device for processing dimension in data warehouse

Cited By (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106528674B (en) * 2016-10-31 2019-10-01 厦门服云信息科技有限公司 Method and device for high-performance query based on Hbase row keys
CN106528674A (en) * 2016-10-31 2017-03-22 厦门服云信息科技有限公司 Method and device for high-performance query based on Hbase row keys
CN106682100A (en) * 2016-12-02 2017-05-17 浙江宇视科技有限公司 Data statistical method and system based on Hbase database
CN106682100B (en) * 2016-12-02 2020-10-20 浙江宇视科技有限公司 Data statistics method and system based on Hbase database
CN106649687A (en) * 2016-12-16 2017-05-10 飞狐信息技术(天津)有限公司 Method and device for on-line analysis and processing of large data
CN106649687B (en) * 2016-12-16 2023-11-21 飞狐信息技术(天津)有限公司 Big data online analysis processing method and device
CN107767010A (en) * 2017-08-04 2018-03-06 平安科技(深圳)有限公司 Range value data statistical method, electronic installation and computer-readable recording medium
CN108398641A (en) * 2017-11-30 2018-08-14 深圳市科列技术股份有限公司 A kind of battery data processing method and battery data server
CN110019014A (en) * 2017-12-19 2019-07-16 华为技术有限公司 To the method and apparatus of file system write-in data record
CN110413631A (en) * 2018-04-25 2019-11-05 中移(苏州)软件技术有限公司 A kind of data query method and device
CN109165377A (en) * 2018-06-11 2019-01-08 玖富金科控股集团有限责任公司 Generate the method and tabulating equipment of form data
CN109033158A (en) * 2018-06-14 2018-12-18 浙江口碑网络技术有限公司 Data deduplication statistical method and device based on specified time window
CN109145059A (en) * 2018-06-29 2019-01-04 深圳市彬讯科技有限公司 For the data processing method of data statistics, server and storage medium
CN110990394A (en) * 2018-09-28 2020-04-10 杭州海康威视数字技术股份有限公司 Distributed column database table-oriented line number statistical method and device and storage medium
CN110990394B (en) * 2018-09-28 2023-10-20 杭州海康威视数字技术股份有限公司 Method, device and storage medium for counting number of rows of distributed column database table
CN109299141A (en) * 2018-10-19 2019-02-01 深圳市元征科技股份有限公司 A kind of method of data query, system and associated component
CN109299106B (en) * 2018-10-31 2020-09-22 中国联合网络通信集团有限公司 Data query method and device
CN109299106A (en) * 2018-10-31 2019-02-01 中国联合网络通信集团有限公司 Data query method and apparatus
CN111221883A (en) * 2018-11-27 2020-06-02 浙江宇视科技有限公司 Data statistical method and system
CN111221883B (en) * 2018-11-27 2024-04-26 浙江宇视科技有限公司 Data statistics method and system
CN109783646A (en) * 2019-02-12 2019-05-21 四川大学华西医院 A kind of data processing method and device
CN110460876A (en) * 2019-08-15 2019-11-15 网易(杭州)网络有限公司 Processing method, device and the electronic equipment of log is broadcast live
CN110688412A (en) * 2019-09-27 2020-01-14 杭州有赞科技有限公司 Mass data statistical method and mass data statistical system based on ES
CN111782645A (en) * 2019-11-29 2020-10-16 北京沃东天骏信息技术有限公司 Data processing method and device

Also Published As

Publication number Publication date
CN105989076B (en) 2019-05-07

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20190730

Address after: Shenzhen Futian District City, Guangdong province 518000 Zhenxing Road, SEG Science Park 2 East Room 403

Co-patentee after: Tencent Cloud Computing (Beijing) Co., Ltd.

Patentee after: Tencent Technology (Shenzhen) Co., Ltd.

Address before: Shenzhen Futian District City, Guangdong province 518000 Zhenxing Road, SEG Science Park 2 East Room 403

Patentee before: Tencent Technology (Shenzhen) Co., Ltd.
