Summary of the invention
The purpose of the present application is to provide a data storage method and device that can optimize the storage and read performance of a database while keeping the precision of the stored data unchanged.
To achieve the above purpose, one embodiment of the present application provides a data storage method, the method comprising:
after user behavior data of the same dimension is summarized by day and by hour, storing the daily summary data and the hourly summary data into a first storage structure, wherein the first storage structure establishes an association between the hourly summary data and the daily summary data of the corresponding time;
after the user behavior data of the same dimension as the daily summary data and the hourly summary data is summarized by minute and by second, storing the per-minute summary data and the per-second summary data into a second storage structure, wherein the second storage structure establishes an association between the per-second summary data and the per-minute summary data of the corresponding time.
As a further improvement of one embodiment of the present application, the first storage structure comprises a plurality of data strings, each data string consists of multiple time windows, and the time windows of each data string respectively store one piece of daily summary data and the one or more pieces of hourly summary data corresponding to the time of that daily summary data;
the second storage structure comprises a plurality of data strings, each data string consists of multiple time windows, and the time windows of each data string respectively store one piece of per-minute summary data and the one or more pieces of per-second summary data corresponding to the time of that per-minute summary data.
As a further improvement of one embodiment of the present application,
the first storage structure is a first storage table, and the columns/rows of the first storage table include one column/row of daily summary data and multiple columns/rows of hourly summary data corresponding to the time of the daily summary data;
the second storage structure is a second storage table, and the columns/rows of the second storage table include one column/row of per-minute summary data and multiple columns/rows of per-second summary data corresponding to the time of the per-minute summary data.
As a further improvement of one embodiment of the present application, the method further comprises:
configuring a unique timestamp for each data string.
As a further improvement of one embodiment of the present application, the method further comprises:
if the values stored in all time windows of a column/row are 0, not storing that column/row.
As a further improvement of one embodiment of the present application, the method further comprises:
when new user behavior data is obtained, synchronously updating the per-second summary data, per-minute summary data, hourly summary data and daily summary data that match the current time.
To achieve the above purpose, one embodiment of the present application provides a data storage device, the device comprising:
a data storage format module, configured to establish an association between the hourly summary data and the daily summary data of the corresponding time to form a first storage structure, and to establish an association between the per-second summary data and the per-minute summary data of the corresponding time to form a second storage structure;
a data logic storage module, configured to store the daily summary data and the hourly summary data into the first storage structure after user behavior data of the same dimension is summarized by day and by hour, and to store the per-minute summary data and the per-second summary data into the second storage structure after the user behavior data of the same dimension as the daily summary data and the hourly summary data is summarized by minute and by second.
As a further improvement of one embodiment of the present application, the first storage structure comprises a plurality of data strings, each data string consists of multiple time windows, and the time windows of each data string respectively store one piece of daily summary data and the one or more pieces of hourly summary data corresponding to the time of that daily summary data;
the second storage structure comprises a plurality of data strings, each data string consists of multiple time windows, and the time windows of each data string respectively store one piece of per-minute summary data and the one or more pieces of per-second summary data corresponding to the time of that per-minute summary data.
As a further improvement of one embodiment of the present application,
the first storage structure is a first storage table, and the columns/rows of the first storage table include one column/row of daily summary data and multiple columns/rows of hourly summary data corresponding to the time of the daily summary data;
the second storage structure is a second storage table, and the columns/rows of the second storage table include one column/row of per-minute summary data and multiple columns/rows of per-second summary data corresponding to the time of the per-minute summary data.
As a further improvement of one embodiment of the present application, the device further comprises:
a marking module, configured to configure a unique timestamp for each data string.
As a further improvement of one embodiment of the present application, the data logic storage module is further configured to:
if the values stored in all time windows of a column/row are 0, not store that column/row.
As a further improvement of one embodiment of the present application, the device further comprises:
an update module, configured to, when new user behavior data is obtained, drive the first logic storage module and the second logic storage module to synchronously update the per-second summary data, per-minute summary data, hourly summary data and daily summary data that match the current time.
Compared with the prior art, the technical effect of the data storage method and device of the present application is that the storage structure for user behavior data can be optimized, so that when accumulated user behavior data is queried, read or counted, the number of accesses to the database is reduced, the storage and read performance of the database is optimized, and the response speed is improved.
Detailed description of the embodiments
The present application is described in detail below with reference to the specific embodiments shown in the drawings. These embodiments do not limit the present application; any structural, methodological or functional changes made by those of ordinary skill in the art based on these embodiments fall within the protection scope of the present application.
The statistical analysis of user behavior data usually changes as the business changes: the dimensions of the statistics, the volume of data counted and so on may be added or modified at any time. Therefore, a schemaless database (such as an HBase database) is commonly used to store the accumulated user behavior data.
As shown in Fig. 1, in one embodiment of the present application, the data storage method comprises:
S100, after user behavior data of the same dimension is summarized by day and by hour, storing the daily summary data and the hourly summary data into a first storage structure, wherein the first storage structure establishes an association between the hourly summary data and the daily summary data of the corresponding time;
S200, after the user behavior data of the same dimension as the daily summary data and the hourly summary data is summarized by minute and by second, storing the per-minute summary data and the per-second summary data into a second storage structure, wherein the second storage structure establishes an association between the per-second summary data and the per-minute summary data of the corresponding time.
The "same dimension" in this embodiment means that the user behavior data to be accumulated represents the same meaning. For example, the dimension of the accumulated user behavior data may be the number of logins to a website within a period of time, or the payment amount within a period of time. Below, the dimension of payment amount within a period of time is used as an example to describe the technical solution of the present application in detail.
In this embodiment, the accumulated user behavior data may be stored in a user behavior database (such as an HBase database), which includes the first storage structure and the second storage structure.
The daily summary data and the hourly summary data are stored in the first storage structure. The per-minute summary data and the per-second summary data are stored in the second storage structure.
Further, in this embodiment, the first storage structure includes a plurality of data strings, each data string consists of multiple time windows, and the time windows of each data string can respectively store one piece of daily summary data and the one or more pieces of hourly summary data corresponding to the time of that daily summary data;
the second storage structure includes a plurality of data strings, each data string consists of multiple time windows, and the time windows of each data string can respectively store one piece of per-minute summary data and the one or more pieces of per-second summary data corresponding to the time of that per-minute summary data.
Referring to Fig. 2a, the first storage structure includes 3 data strings, each consisting of 25 time windows in total: D, H0, H1 ... H23. The time window denoted D can store the daily summary data; H0, H1 ... H23 denote the time windows of the 24 hours of that day (from 0:00 to 23:00) and can store the corresponding hourly summary data. It can be understood that, within one data string, the sum of the values stored in the H0, H1 ... H23 time windows equals the value stored in the D time window.
Referring to Fig. 2b, the second storage structure includes 3 data strings, each consisting of 61 time windows in total: M, S0, S1 ... S59. The time window denoted M can store the per-minute summary data; S0, S1 ... S59 denote the time windows of the 60 seconds of that minute (from second 0 to second 59) and can store the corresponding per-second summary data. It can be understood that, within one data string, the sum of the values stored in the S0, S1 ... S59 time windows equals the value stored in the M time window.
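The window layout described for Fig. 2a and Fig. 2b can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function names `make_data_string` and `add_amount` are hypothetical, and a plain dictionary stands in for one data string.

```python
def make_data_string(total_key: str, window_prefix: str, window_count: int) -> dict[str, int]:
    """Build one data string: a total window plus `window_count` sub-windows, all set to 0."""
    data = {total_key: 0}
    for i in range(window_count):
        data[f"{window_prefix}{i}"] = 0
    return data


def add_amount(data: dict[str, int], total_key: str, window_key: str, amount: int) -> None:
    """Add an amount to one sub-window and to the total window, keeping the two in step."""
    data[window_key] += amount
    data[total_key] += amount


day = make_data_string("D", "H", 24)      # D, H0 ... H23: 25 time windows
minute = make_data_string("M", "S", 60)   # M, S0 ... S59: 61 time windows

add_amount(day, "D", "H1", 8)
add_amount(day, "D", "H2", 2)

# The sum of the hourly windows equals the daily window.
hourly_total = sum(value for key, value in day.items() if key != "D")
```

Updating the sub-window and the total together is what keeps the sum of H0 ... H23 equal to D (and the sum of S0 ... S59 equal to M).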
Further, the method also includes: configuring a unique timestamp for each data string.
In this embodiment, the first storage structure may be a first storage table, and the columns of the first storage table include one column of daily summary data and multiple columns of hourly summary data corresponding to the time of the daily summary data. Referring to Fig. 2a, the first storage table may be keyed by rowkey (for example, id information); the first column may hold the daily summary data and the following 24 columns the hourly summary data. Alternatively, columns 1 to 24 may hold the hourly summary data and column 25 the daily summary data; those skilled in the art can change the order by conventional means. In addition, the unique timestamp (abbreviated ts) configured for each data string may be the date of that data string, identifying which day's user behavior data the data string holds.
The second storage structure may be a second storage table, and the columns of the second storage table include one column of per-minute summary data and multiple columns of per-second summary data corresponding to the time of the per-minute summary data. Referring to Fig. 2b, the second storage table may be keyed by rowkey (for example, id information) + hour (for example, the time at which the corresponding user behavior data occurred, accurate to the hour); the first column holds the per-minute summary data and the following 60 columns the per-second summary data. Alternatively, columns 1 to 60 may hold the per-second summary data and column 61 the per-minute summary data; those skilled in the art can change the order by conventional means. In addition, the unique timestamp (abbreviated ts) configured for each data string may be the minute of that data string, identifying which minute's user behavior data the data string holds.
Of course, in this embodiment, the above columns may also be replaced with rows to realize essentially the same first storage structure and second storage structure:
the first storage structure is a first storage table, and the rows of the first storage table include one row of daily summary data and multiple rows of hourly summary data corresponding to the time of the daily summary data;
the second storage structure is a second storage table, and the rows of the second storage table include one row of per-minute summary data and multiple rows of per-second summary data corresponding to the time of the per-minute summary data.
The specifics can be derived unambiguously from Fig. 2a, Fig. 2b and the above description of the columns of the first/second storage tables, and are not repeated here.
Further, in this embodiment, because the columns of the first storage table and the second storage table are dynamic, the method also includes: if the values stored in all time windows of a column/row are 0, not storing that column/row. This can save a large amount of storage space. A specific example follows:
Assume that the user whose id information is 2088xx1 made 3 payments at the following times:
3 yuan was paid at 20150101 01:00:01;
5 yuan was paid at 20150101 01:00:12;
2 yuan was paid at 20150101 02:32:12.
Then the data stored in the first storage structure is as follows:
rowkey | timestamp | D | H1 | H2
2088xx1 | 20150101 | 10 yuan | 8 yuan | 2 yuan
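The data stored in the second storage structure is as follows:
rowkey | timestamp | M | S1 | S12
2088xx1_20150101-01 | 20150101 01:00 | 8 yuan | 3 yuan | 5 yuan
2088xx1_20150101-02 | 20150101 02:32 | 2 yuan | (empty) | 2 yuan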
As the above example plainly shows, the time windows strictly between 20150101 01:00:01 and 20150101 01:00:12 hold the value 0, and the time windows from 20150101 02:32:00 up to but not including 20150101 02:32:12 also hold the value 0; the S2 to S11 columns are therefore not stored.
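The rule of not storing all-zero columns can be illustrated with a short Python sketch. It assumes dense per-minute rows held as dictionaries; the helper names `dense_minute_row` and `stored_columns` are hypothetical and are not taken from the application.

```python
def dense_minute_row(seconds: dict[int, int]) -> dict[str, int]:
    """Build a full M, S0 ... S59 row from a {second: amount} mapping."""
    row = {"M": sum(seconds.values())}
    for s in range(60):
        row[f"S{s}"] = seconds.get(s, 0)
    return row


def stored_columns(rows: list[dict[str, int]]) -> list[str]:
    """Return the columns that hold a non-zero value in at least one row."""
    if not rows:
        return []
    return [col for col in rows[0] if any(row[col] != 0 for row in rows)]


# The two minutes from the payment example: 01:00 (3 yuan at second 1,
# 5 yuan at second 12) and 02:32 (2 yuan at second 12).
rows = [dense_minute_row({1: 3, 12: 5}), dense_minute_row({12: 2})]

columns = stored_columns(rows)
# Within a stored column, a cell that is 0 for a given row is left empty.
sparse = [{col: row[col] for col in columns if row[col] != 0} for row in rows]
```

Only M, S1 and S12 survive for these two rows; S2 to S11, like every other all-zero column, are dropped.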
Of course, the above example stores the daily summary data, hourly summary data, per-minute summary data and per-second summary data as columns; those skilled in the art may also store them as rows, which is not described again here.
Further, in this embodiment, when new user behavior data is obtained, the per-second summary data, per-minute summary data, hourly summary data and daily summary data that match the current time are updated synchronously. Continuing with the above example:
If the user whose id information is 2088xx1 makes another payment of 7 yuan at 20150101 02:32:12, the per-second summary data for 20150101 02:32:12, the per-minute summary data for 20150101 02:32, the hourly summary data for hour 02 of 20150101, and the daily summary data for 20150101 all need to be updated. The updated data is as follows:
The data stored in the first storage structure is as follows:
rowkey | timestamp | D | H1 | H2
2088xx1 | 20150101 | 17 yuan | 8 yuan | 9 yuan
The data stored in the second storage structure is as follows:
rowkey | timestamp | M | S1 | S12
2088xx1_20150101-01 | 20150101 01:00 | 8 yuan | 3 yuan | 5 yuan
2088xx1_20150101-02 | 20150101 02:32 | 9 yuan | (empty) | 9 yuan
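The synchronous update of the four summary levels can be sketched as follows. This is a minimal in-memory illustration, assuming nested dictionaries in place of the two storage tables; the function name `apply_payment` and the tuple keys (rowkey, timestamp) are assumptions made for the example.

```python
from datetime import datetime


def apply_payment(first: dict, second: dict, user_id: str, when: str, amount: int) -> None:
    """Add one payment to the daily, hourly, per-minute and per-second windows in one pass."""
    t = datetime.strptime(when, "%Y%m%d %H:%M:%S")

    # First storage table: rowkey = id, timestamp = day; columns D and H<hour>.
    day_row = first.setdefault((user_id, t.strftime("%Y%m%d")), {})
    hour_col = f"H{t.hour}"
    day_row["D"] = day_row.get("D", 0) + amount
    day_row[hour_col] = day_row.get(hour_col, 0) + amount

    # Second storage table: rowkey = id + hour, timestamp = minute; columns M and S<second>.
    minute_key = (f"{user_id}_{t.strftime('%Y%m%d-%H')}", t.strftime("%Y%m%d %H:%M"))
    minute_row = second.setdefault(minute_key, {})
    second_col = f"S{t.second}"
    minute_row["M"] = minute_row.get("M", 0) + amount
    minute_row[second_col] = minute_row.get(second_col, 0) + amount


first_table, second_table = {}, {}
payments = [
    ("20150101 01:00:01", 3),
    ("20150101 01:00:12", 5),
    ("20150101 02:32:12", 2),
    ("20150101 02:32:12", 7),  # the additional 7 yuan payment
]
for when, amount in payments:
    apply_payment(first_table, second_table, "2088xx1", when, amount)
```

Because windows are created only when an amount arrives, zero-valued windows never appear in either dictionary, which matches the sparse tables shown above.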
The following describes in detail the process of querying user behavior data after it has been accumulated in the above manner:
When querying user behavior data, the query time range can be divided into segments, and each segment is queried separately to collect the data.
For example, suppose the payment amount of user 2088xx1 from 20141125 12:35:09 to 20141128 15:35:09 is to be counted. Then:
S1, query the second storage table for the data with rowkey=2088xx1_20141125-12 and ts from 20141125 12:35 to 20141125 12:59. From this, the data from 20141125 12:35:09 to 20141125 12:59:59 can be calculated.
S2, query the first storage table for the data with rowkey=2088xx1 and ts from 20141125 to 20141128. From this, the data from 20141125 13:00:00 to 20141128 15:00:00 can be calculated.
S3, query the second storage table for the data with rowkey=2088xx1_20141128-15 and ts from 20141128 15:00 to 20141128 15:35. From this, the data from 20141128 15:00:00 to 20141128 15:35:09 can be calculated.
Summing the data calculated in the above three steps gives the payment amount of user 2088xx1 from 20141125 12:35:09 to 20141128 15:35:09.
It can be understood that, for data from any start time to any end time, the user behavior database needs to be queried at most three times to obtain accumulated data accurate to the second. Moreover, because the per-second summary data and per-minute summary data are stored separately from the hourly summary data and daily summary data, the amount of data queried is greatly reduced compared with the prior art, and efficiency improves considerably. In addition, for a query covering any start time up to the current time, the summary up to the current time is exactly the daily summary data corresponding to the current time, so only steps S1 and S2 above are needed. In that case, only two queries of the user behavior database are needed to obtain accumulated data accurate to the second.
As shown in Fig. 3, in one embodiment of the present application, the data storage device includes a user behavior database 20, and the user behavior database 20 includes:
a data storage format module 201, configured to establish an association between the hourly summary data and the daily summary data of the corresponding time to form a first storage structure, and to establish an association between the per-second summary data and the per-minute summary data of the corresponding time to form a second storage structure;
a data logic storage module 203, configured to store the daily summary data and the hourly summary data into the first storage structure after user behavior data of the same dimension is summarized by day and by hour, and to store the per-minute summary data and the per-second summary data into the second storage structure after the user behavior data of the same dimension as the daily summary data and the hourly summary data is summarized by minute and by second.
The "same dimension" in this embodiment means that the user behavior data to be accumulated represents the same meaning. For example, the dimension of the accumulated user behavior data may be the number of logins to a website within a period of time, or the payment amount within a period of time. Below, the dimension of payment amount within a period of time is used as an example to describe the technical solution of the present application in detail.
In this embodiment, the accumulated user behavior data may be stored in the user behavior database 20 (such as an HBase database), which includes the first storage structure and the second storage structure.
The daily summary data and the hourly summary data are stored in the first storage structure. The per-minute summary data and the per-second summary data are stored in the second storage structure.
Further, in this embodiment, the first storage structure includes a plurality of data strings, each data string consists of multiple time windows, and the time windows of each data string can respectively store one piece of daily summary data and the one or more pieces of hourly summary data corresponding to the time of that daily summary data;
the second storage structure includes a plurality of data strings, each data string consists of multiple time windows, and the time windows of each data string can respectively store one piece of per-minute summary data and the one or more pieces of per-second summary data corresponding to the time of that per-minute summary data.
Referring to Fig. 2a, the first storage structure includes 3 data strings, each consisting of 25 time windows in total: D, H0, H1 ... H23. The time window denoted D can store the daily summary data; H0, H1 ... H23 denote the time windows of the 24 hours of that day (from 0:00 to 23:00) and can store the corresponding hourly summary data. It can be understood that, within one data string, the sum of the values stored in the H0, H1 ... H23 time windows equals the value stored in the D time window.
Referring to Fig. 2b, the second storage structure includes 3 data strings, each consisting of 61 time windows in total: M, S0, S1 ... S59. The time window denoted M can store the per-minute summary data; S0, S1 ... S59 denote the time windows of the 60 seconds of that minute (from second 0 to second 59) and can store the corresponding per-second summary data. It can be understood that, within one data string, the sum of the values stored in the S0, S1 ... S59 time windows equals the value stored in the M time window.
Further, the user behavior database 20 also includes a marking module 205, configured to configure a unique timestamp for each data string.
In this embodiment, the first storage structure may be a first storage table, and the columns of the first storage table include one column of daily summary data and multiple columns of hourly summary data corresponding to the time of the daily summary data. Referring to Fig. 2a, the first storage table may be keyed by rowkey (for example, id information); the first column may hold the daily summary data and the following 24 columns the hourly summary data. Alternatively, columns 1 to 24 may hold the hourly summary data and column 25 the daily summary data; those skilled in the art can change the order by conventional means. In addition, the unique timestamp (abbreviated ts) configured for each data string may be the date of that data string, identifying which day's user behavior data the data string holds.
The second storage structure may be a second storage table, and the columns of the second storage table include one column of per-minute summary data and multiple columns of per-second summary data corresponding to the time of the per-minute summary data. Referring to Fig. 2b, the second storage table may be keyed by rowkey (for example, id information) + hour (for example, the time at which the corresponding user behavior data occurred, accurate to the hour); the first column holds the per-minute summary data and the following 60 columns the per-second summary data. Alternatively, columns 1 to 60 may hold the per-second summary data and column 61 the per-minute summary data; those skilled in the art can change the order by conventional means. In addition, the unique timestamp (abbreviated ts) configured for each data string may be the minute of that data string, identifying which minute's user behavior data the data string holds.
Of course, in this embodiment, the above columns may also be replaced with rows to realize essentially the same first storage structure and second storage structure:
the first storage structure is a first storage table, and the rows of the first storage table include one row of daily summary data and multiple rows of hourly summary data corresponding to the time of the daily summary data;
the second storage structure is a second storage table, and the rows of the second storage table include one row of per-minute summary data and multiple rows of per-second summary data corresponding to the time of the per-minute summary data.
The specifics can be derived unambiguously from Fig. 2a, Fig. 2b and the above description of the columns of the first/second storage tables, and are not repeated here.
Further, in this embodiment, because the columns of the first storage table and the second storage table are dynamic, the data logic storage module 203 is also configured to: if the values stored in all time windows of a column/row are 0, not store that column/row. This can save a large amount of storage space. A specific example follows:
Assume that the user whose id information is 2088xx1 made 3 payments at the following times:
3 yuan was paid at 20150101 01:00:01;
5 yuan was paid at 20150101 01:00:12;
2 yuan was paid at 20150101 02:32:12.
Then the data stored in the first storage structure is as follows:
rowkey | timestamp | D | H1 | H2
2088xx1 | 20150101 | 10 yuan | 8 yuan | 2 yuan
The data stored in the second storage structure is as follows:
rowkey | timestamp | M | S1 | S12
2088xx1_20150101-01 | 20150101 01:00 | 8 yuan | 3 yuan | 5 yuan
2088xx1_20150101-02 | 20150101 02:32 | 2 yuan | (empty) | 2 yuan
As the above example plainly shows, the time windows strictly between 20150101 01:00:01 and 20150101 01:00:12 hold the value 0, and the time windows from 20150101 02:32:00 up to but not including 20150101 02:32:12 also hold the value 0; the S2 to S11 columns are therefore not stored.
Of course, the above example stores the daily summary data, hourly summary data, per-minute summary data and per-second summary data as columns; those skilled in the art may also store them as rows, which is not described again here.
Further, in this embodiment, the user behavior database 20 also includes an update module 207, configured to, when new user behavior data is obtained, synchronously update the per-second summary data, per-minute summary data, hourly summary data and daily summary data that match the current time. Continuing with the above example:
If the user whose id information is 2088xx1 makes another payment of 7 yuan at 20150101 02:32:12, the per-second summary data for 20150101 02:32:12, the per-minute summary data for 20150101 02:32, the hourly summary data for hour 02 of 20150101, and the daily summary data for 20150101 all need to be updated. The updated data is as follows:
The data stored in the first storage structure is as follows:
rowkey | timestamp | D | H1 | H2
2088xx1 | 20150101 | 17 yuan | 8 yuan | 9 yuan
The data stored in the second storage structure is as follows:
rowkey | timestamp | M | S1 | S12
2088xx1_20150101-01 | 20150101 01:00 | 8 yuan | 3 yuan | 5 yuan
2088xx1_20150101-02 | 20150101 02:32 | 9 yuan | (empty) | 9 yuan
In conclusion, the data storage method and device of the present application can optimize the storage structure for user behavior data by storing the per-second summary data and per-minute summary data separately from the hourly summary data and daily summary data, so that when accumulated user behavior data is queried, read or counted, the number of accesses to the database is reduced, the storage and read performance of the database is optimized, and the response speed is improved.
Those skilled in the art will clearly understand that, for convenience and brevity of description, the specific working processes of the device and modules described above may refer to the corresponding processes in the foregoing method embodiments, and are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed device and method may be implemented in other ways. For example, the device embodiments described above are only illustrative: the division into modules is only a division by logical function, and other divisions are possible in actual implementation; for example, multiple modules or components may be combined or integrated into another device, or some features may be omitted or not executed. In addition, the mutual coupling, direct coupling or communication connection shown or discussed may be indirect coupling or communication connection between devices or modules through some interfaces, and may be electrical, mechanical or in other forms.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed over multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing module, each module may exist alone physically, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
The above integrated module, when implemented in the form of a software functional module, may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, etc.) or a processor to execute some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), a magnetic disk or an optical disc.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the present application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that they may still modify the technical solutions described in the foregoing embodiments or make equivalent replacements of some of the technical features, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.