CN106326220A - Data storage method and apparatus - Google Patents

Publication number
CN106326220A
Authority
CN
China
Prior art keywords
data
summarized
summarized data
storage
day
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201510333071.8A
Other languages
Chinese (zh)
Other versions
CN106326220B (en)
Inventor
窦方钰
冯凯
陈锣斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN201510333071.8A
Publication of CN106326220A
Application granted
Publication of CN106326220B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/23 Updating
    • G06F16/2379 Updates performed during online database operations; commit processing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20 Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/22 Indexing; Data structures therefor; Storage structures
    • G06F16/2282 Tablespace storage structures; Management thereof

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Debugging And Monitoring (AREA)

Abstract

The invention discloses a data storage method and apparatus. The method comprises: summarizing user behavior data of the same dimension by day and by hour, and storing the daily summarized data and the hourly summarized data in a first storage structure, wherein the first storage structure associates each piece of hourly summarized data with the daily summarized data of the corresponding time; and summarizing user behavior data of the same dimension as the daily and hourly summarized data by minute and by second, and storing the per-minute summarized data and the per-second summarized data in a second storage structure, wherein the second storage structure associates each piece of per-second summarized data with the per-minute summarized data of the corresponding time. The method and apparatus optimize the storage structure of user behavior data, so that querying, reading, or computing statistics over the accumulated data requires fewer database accesses; database storage and read performance improve, and responses are faster.

Description

Data storage method and apparatus
Technical field
The present application relates to the field of computers, and in particular to a data storage method and apparatus.
Background
In many application scenarios, user behavior data is accumulated and stored. Such data reflects a user's historical operations and can be used to analyze user behavior (for example, counting how many times a user logged in to a website within one hour, or how much was paid from one IP address within one day) and thereby to provide better service (for example, judging whether a user's operation poses a business risk).
At present, this need is mainly met by storing detailed transaction records or by storing cumulative accounts. However, these approaches place high performance demands on the database server, the network, and so on, and they respond slowly.
Summary of the invention
The purpose of the present application is to provide a data storage method and apparatus that optimize the storage and read performance of a database without changing the precision of the stored data.
To achieve one of the above purposes, an embodiment of the present application provides a data storage method, the method comprising:
summarizing user behavior data of the same dimension by day and by hour, and then storing the daily summarized data and the hourly summarized data in a first storage structure, wherein the first storage structure establishes an association between the hourly summarized data and the daily summarized data of the corresponding time;
summarizing user behavior data of the same dimension as the daily summarized data and the hourly summarized data by minute and by second, and then storing the per-minute summarized data and the per-second summarized data in a second storage structure, wherein the second storage structure establishes an association between the per-second summarized data and the per-minute summarized data of the corresponding time.
As a further improvement of an embodiment of the present application, the first storage structure comprises a plurality of data strings, each data string consists of a plurality of time windows, and the time windows of each data string respectively store one piece of daily summarized data and one or more pieces of hourly summarized data corresponding in time to that daily summarized data;
the second storage structure comprises a plurality of data strings, each data string consists of a plurality of time windows, and the time windows of each data string respectively store one piece of per-minute summarized data and one or more pieces of per-second summarized data corresponding in time to that per-minute summarized data.
As a further improvement of an embodiment of the present application,
the first storage structure is a first storage table, and the columns/rows of the first storage table include one column/row of daily summarized data and a plurality of columns/rows of hourly summarized data corresponding in time to the daily summarized data;
the second storage structure is a second storage table, and the columns/rows of the second storage table include one column/row of per-minute summarized data and a plurality of columns/rows of per-second summarized data corresponding in time to the per-minute summarized data.
As a further improvement of an embodiment of the present application, the method further comprises:
configuring a unique timestamp for each data string.
As a further improvement of an embodiment of the present application, the method further comprises:
if the stored values of all time windows in a column/row are 0, not storing that column/row.
As a further improvement of an embodiment of the present application, the method further comprises:
when new user behavior data is obtained, synchronously updating the per-second summarized data, the per-minute summarized data, the hourly summarized data, and the daily summarized data that match the current time.
To achieve one of the above purposes, an embodiment of the present application provides a data storage apparatus, the apparatus comprising:
a data storage format module, configured to establish an association between the hourly summarized data and the daily summarized data of the corresponding time to form a first storage structure, and to establish an association between the per-second summarized data and the per-minute summarized data of the corresponding time to form a second storage structure;
a data logic storage module, configured to summarize user behavior data of the same dimension by day and by hour and then store the daily summarized data and the hourly summarized data in the first storage structure, and to summarize user behavior data of the same dimension as the daily summarized data and the hourly summarized data by minute and by second and then store the per-minute summarized data and the per-second summarized data in the second storage structure.
As a further improvement of an embodiment of the present application, the first storage structure comprises a plurality of data strings, each data string consists of a plurality of time windows, and the time windows of each data string can respectively store one piece of daily summarized data and one or more pieces of hourly summarized data corresponding in time to that daily summarized data;
the second storage structure comprises a plurality of data strings, each data string consists of a plurality of time windows, and the time windows of each data string respectively store one piece of per-minute summarized data and one or more pieces of per-second summarized data corresponding in time to that per-minute summarized data.
As a further improvement of an embodiment of the present application,
the first storage structure is a first storage table, and the columns/rows of the first storage table include one column/row of daily summarized data and a plurality of columns/rows of hourly summarized data corresponding in time to the daily summarized data;
the second storage structure is a second storage table, and the columns/rows of the second storage table include one column/row of per-minute summarized data and a plurality of columns/rows of per-second summarized data corresponding in time to the per-minute summarized data.
As a further improvement of an embodiment of the present application, the apparatus further comprises:
an identification module, configured to configure a unique timestamp for each data string.
As a further improvement of an embodiment of the present application, the data logic storage module is further configured to:
if the stored values of all time windows in a column/row are 0, not store that column/row.
As a further improvement of an embodiment of the present application, the apparatus further comprises:
an update module, configured to, when new user behavior data is obtained, drive the first logic storage module and the second logic storage module to synchronously update the per-second summarized data, the per-minute summarized data, the hourly summarized data, and the daily summarized data that match the current time.
Compared with the prior art, the technical effect of the present application is as follows: the data storage method and apparatus of the present application optimize the storage structure of user behavior data, so that querying, reading, or computing statistics over the accumulated user behavior data requires fewer accesses to the database; this optimizes the storage and read performance of the database and increases response speed.
Brief description of the drawings
Fig. 1 is a flowchart of the data storage method in an embodiment of the present application;
Fig. 2a is a schematic diagram of the first storage structure in an embodiment of the present application;
Fig. 2b is a schematic diagram of the second storage structure in an embodiment of the present application;
Fig. 3 is a block diagram of the data storage apparatus in an embodiment of the present application.
Detailed description
The application is described in detail below with reference to the specific embodiments shown in the drawings. These embodiments do not limit the application, however; structural, methodological, or functional changes made by a person of ordinary skill in the art on the basis of these embodiments all fall within the scope of protection of the application.
Statistical analysis of user behavior data usually changes as the business changes: statistical dimensions, the volume of data counted, and so on may be added or modified at any time. Therefore, a schema-free database (for example, an HBase database) is commonly used to store the accumulated user behavior data.
As shown in Fig. 1, in an embodiment of the present application, the data storage method comprises:
S100: summarizing user behavior data of the same dimension by day and by hour, and then storing the daily summarized data and the hourly summarized data in a first storage structure, wherein the first storage structure establishes an association between the hourly summarized data and the daily summarized data of the corresponding time;
S200: summarizing user behavior data of the same dimension as the daily summarized data and the hourly summarized data by minute and by second, and then storing the per-minute summarized data and the per-second summarized data in a second storage structure, wherein the second storage structure establishes an association between the per-second summarized data and the per-minute summarized data of the corresponding time.
In this embodiment, "the same dimension" means that the user behavior data to be accumulated represents the same thing. For example, the dimension of the accumulated user behavior data may be the number of website logins within a period of time, or the amount paid within a period of time. Below, the technical solution is described in detail using the dimension "amount paid within a period of time" as the example.
In this embodiment, the accumulated user behavior data may be stored in a user behavior database (for example, an HBase database), which includes the first storage structure and the second storage structure.
The daily summarized data and the hourly summarized data are stored in the first storage structure. The per-minute summarized data and the per-second summarized data are stored in the second storage structure.
Further, in this embodiment, the first storage structure comprises a plurality of data strings, each data string consists of a plurality of time windows, and the time windows of each data string can respectively store one piece of daily summarized data and one or more pieces of hourly summarized data corresponding in time to that daily summarized data;
the second storage structure comprises a plurality of data strings, each data string consists of a plurality of time windows, and the time windows of each data string can respectively store one piece of per-minute summarized data and one or more pieces of per-second summarized data corresponding in time to that per-minute summarized data.
Referring to Fig. 2a, the first storage structure includes 3 data strings, each consisting of 25 time windows in total: D, H0, H1, ..., H23. The time window denoted D stores daily summarized data; the time windows denoted H0, H1, ..., H23 correspond to the 24 hours of that day (hour 0 to hour 23) and store the corresponding hourly summarized data. It can be understood that, within one data string, the sum of the values stored in time windows H0, H1, ..., H23 equals the value stored in time window D.
Referring to Fig. 2b, the second storage structure includes 3 data strings, each consisting of 61 time windows in total: M, S0, S1, ..., S59. The time window denoted M stores per-minute summarized data; the time windows denoted S0, S1, ..., S59 correspond to the 60 seconds of that minute (second 0 to second 59) and store the corresponding per-second summarized data. It can be understood that, within one data string, the sum of the values stored in time windows S0, S1, ..., S59 equals the value stored in time window M.
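The relationship between a parent time window (D or M) and its child time windows (H0 to H23, or S0 to S59) can be sketched in Python as follows. This is only a minimal illustration: the class name DataString, its fields, and the use of whole-yuan integers are assumptions made for the example and do not appear in the application.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class DataString:
    """One data string: a parent window plus its child windows (hypothetical model)."""
    child_count: int                                          # 24 hours per day, or 60 seconds per minute
    parent: int = 0                                           # value of window D (or M)
    children: Dict[int, int] = field(default_factory=dict)    # child index -> summarized value

    def add(self, index: int, amount: int) -> None:
        """Add an amount to one child window and to the parent window together."""
        if not 0 <= index < self.child_count:
            raise ValueError(f"window index {index} out of range")
        self.children[index] = self.children.get(index, 0) + amount
        self.parent += amount

    def is_consistent(self) -> bool:
        """Check that the parent equals the sum of all child windows."""
        return self.parent == sum(self.children.values())


# A day string (D, H0..H23) and a minute string (M, S0..S59).
day = DataString(child_count=24)
day.add(1, 8)      # H1 += 8
day.add(2, 2)      # H2 += 2

minute = DataString(child_count=60)
minute.add(1, 3)   # S1 += 3
minute.add(12, 5)  # S12 += 5
```

Updating parent and child in the same call is one simple way to keep the stated sum relationship true after every write.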
Further, the method also comprises: configuring a unique timestamp for each data string.
In this embodiment, the first storage structure may be a first storage table, whose columns include one column of daily summarized data and a plurality of columns of hourly summarized data corresponding in time to the daily summarized data. Referring to Fig. 2a, the first storage table may be keyed on a rowkey (for example, ID information); the first column may hold daily summarized data and the following 24 columns hourly summarized data. Alternatively, columns 1 to 24 may hold hourly summarized data and column 25 daily summarized data; a person skilled in the art can change this order by conventional means. In addition, the unique timestamp (abbreviated ts) configured for each data string may be the date of that data string, identifying which day's user behavior data the data string holds.
The second storage structure may be a second storage table, whose columns include one column of per-minute summarized data and a plurality of columns of per-second summarized data corresponding in time to the per-minute summarized data. Referring to Fig. 2b, the second storage table may be keyed on rowkey (for example, ID information) + hour (for example, the time at which the user behavior data occurred, accurate to the hour); the first column holds per-minute summarized data and the following 60 columns per-second summarized data. Alternatively, columns 1 to 60 may hold per-second summarized data and column 61 per-minute summarized data; a person skilled in the art can change this order by conventional means. In addition, the unique timestamp (abbreviated ts) configured for each data string may be the minute of that data string, identifying which minute's user behavior data the data string holds.
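The key and timestamp layout of the two tables can be sketched as follows. The function names are hypothetical, and the string formats are inferred from the worked examples in this description (for instance, rowkey 2088xx1_20150101-02 with ts 20150101 02:32); they are an assumption, not a prescribed encoding.

```python
from datetime import datetime
from typing import Tuple


def first_table_key(user_id: str, t: datetime) -> Tuple[str, str]:
    """Return (rowkey, ts) for the day/hour table: ID and the date."""
    return user_id, t.strftime("%Y%m%d")


def second_table_key(user_id: str, t: datetime) -> Tuple[str, str]:
    """Return (rowkey, ts) for the minute/second table: ID + hour, and the minute."""
    rowkey = user_id + t.strftime("_%Y%m%d-%H")
    return rowkey, t.strftime("%Y%m%d %H:%M")


event_time = datetime(2015, 1, 1, 2, 32, 12)
day_key = first_table_key("2088xx1", event_time)
minute_key = second_table_key("2088xx1", event_time)
```

Putting the hour in the second table's rowkey means all minutes of one hour for one user share a rowkey and differ only in ts.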
Of course, in this embodiment, the above columns may also be replaced with rows to implement essentially the same first and second storage structures:
the first storage structure is a first storage table, whose rows include one row of daily summarized data and a plurality of rows of hourly summarized data corresponding in time to the daily summarized data;
the second storage structure is a second storage table, whose rows include one row of per-minute summarized data and a plurality of rows of per-second summarized data corresponding in time to the per-minute summarized data.
The details can be derived unambiguously from Fig. 2a, Fig. 2b, and the above description of the first and second storage tables in column form, and are not repeated here.
Further, in this embodiment, because the column data of the first storage table and the second storage table is dynamic, the method also comprises: if the stored values of all time windows in a column/row are 0, not storing that column/row. This can save a great deal of storage space. A concrete example follows:
Assume that the user whose ID information is 2088xx1 made 3 payments at the following times:
paid 3 yuan at 20150101 01:00:01;
paid 5 yuan at 20150101 01:00:12;
paid 2 yuan at 20150101 02:32:12.
Then the data stored in the first storage structure is:
rowkey    timestamp   D        H1      H2
2088xx1   20150101    10 yuan  8 yuan  2 yuan
The data stored in the second storage structure is:
rowkey               timestamp        M       S1       S12
2088xx1_20150101-01  20150101 01:00   8 yuan  3 yuan   5 yuan
2088xx1_20150101-02  20150101 02:32   2 yuan  (empty)  2 yuan
It is clear from the above example that, because the values stored in the time windows between 20150101 01:00:01 and 20150101 01:00:12 are 0, and the values stored in the time windows between 20150101 02:32:00 and 20150101 02:32:12 are also 0, columns S2 to S11 are not stored.
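The worked example above can be reproduced with a short Python sketch that uses nested dictionaries as a stand-in for a sparse, schema-free table. The function build_tables and the dictionary layout are assumptions for illustration only; amounts are whole yuan. A column is created only when an amount is written to it, so windows whose value would be 0 are never stored.

```python
from collections import defaultdict
from datetime import datetime


def build_tables(user_id, payments):
    """Summarize (time, amount) events into the two sparse tables.

    Each table maps (rowkey, ts) -> {column name: summarized amount}.
    """
    first = defaultdict(dict)    # day/hour table: columns D, H0..H23
    second = defaultdict(dict)   # minute/second table: columns M, S0..S59
    for t, amount in payments:
        day_row = first[(user_id, t.strftime("%Y%m%d"))]
        for col in ("D", f"H{t.hour}"):
            day_row[col] = day_row.get(col, 0) + amount
        minute_key = (user_id + t.strftime("_%Y%m%d-%H"), t.strftime("%Y%m%d %H:%M"))
        minute_row = second[minute_key]
        for col in ("M", f"S{t.second}"):
            minute_row[col] = minute_row.get(col, 0) + amount
    return dict(first), dict(second)


payments = [
    (datetime(2015, 1, 1, 1, 0, 1), 3),
    (datetime(2015, 1, 1, 1, 0, 12), 5),
    (datetime(2015, 1, 1, 2, 32, 12), 2),
]
first, second = build_tables("2088xx1", payments)
```

In this sketch the row for 01:00 holds only the columns M, S1, and S12, which matches the table shown in the example.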
Of course, the above example stores the daily, hourly, per-minute, and per-second summarized data in columns; a person skilled in the art could equally store them in rows, which is not described further here.
Further, in this embodiment, when new user behavior data is obtained, the per-second, per-minute, hourly, and daily summarized data matching the current time are updated synchronously. Continuing with the above example:
If the user whose ID information is 2088xx1 makes another payment of 7 yuan at 20150101 02:32:12, the following need to be updated: the per-second summarized data for 20150101 02:32:12, the per-minute summarized data for 20150101 02:32, the hourly summarized data for hour 02 of 20150101, and the daily summarized data for 20150101. The updated data is as follows:
The data stored in the first storage structure is:
rowkey    timestamp   D        H1      H2
2088xx1   20150101    17 yuan  8 yuan  9 yuan
The data stored in the second storage structure is:
rowkey               timestamp        M       S1       S12
2088xx1_20150101-01  20150101 01:00   8 yuan  3 yuan   5 yuan
2088xx1_20150101-02  20150101 02:32   9 yuan  (empty)  9 yuan
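The synchronous update can be sketched as a single function that adds one event to all four summaries. The helper apply_event and the dictionary layout are hypothetical; a real database would also need these four increments to be applied atomically or idempotently, which this sketch does not address.

```python
from datetime import datetime


def apply_event(first, second, user_id, t, amount):
    """Add one event to the daily, hourly, per-minute, and per-second summaries."""
    day_row = first.setdefault((user_id, t.strftime("%Y%m%d")), {})
    minute_key = (user_id + t.strftime("_%Y%m%d-%H"), t.strftime("%Y%m%d %H:%M"))
    minute_row = second.setdefault(minute_key, {})
    updates = (
        (day_row, ("D", f"H{t.hour}")),          # daily and hourly windows
        (minute_row, ("M", f"S{t.second}")),     # per-minute and per-second windows
    )
    for row, columns in updates:
        for col in columns:
            row[col] = row.get(col, 0) + amount


# State after the first three payments in the example (whole yuan).
first = {("2088xx1", "20150101"): {"D": 10, "H1": 8, "H2": 2}}
second = {
    ("2088xx1_20150101-01", "20150101 01:00"): {"M": 8, "S1": 3, "S12": 5},
    ("2088xx1_20150101-02", "20150101 02:32"): {"M": 2, "S12": 2},
}

# The additional 7 yuan payment at 20150101 02:32:12.
apply_event(first, second, "2088xx1", datetime(2015, 1, 1, 2, 32, 12), 7)
```

After the call, D is 17, H2 is 9, and both M and S12 of the 02:32 row are 9, as in the updated tables above; the 01:00 row is untouched.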
The following describes in detail how user behavior data accumulated in the above manner is queried:
When user behavior data is queried, the time range of the query can be divided into time slices, and the data is obtained by querying slice by slice.
For example, to compute the amount paid by user 2088xx1 from 20141125 12:35:09 to 20141128 15:35:09:
S1: query the second storage table for the data with rowkey=2088xx1_20141125-12 and ts from 20141125 12:35 to 20141125 12:59. From this, the data for 20141125 12:35:09 to 20141125 12:59:59 can be calculated.
S2: query the first storage table for the data with rowkey=2088xx1 and ts from 20141125 to 20141128. From this, the data for 20141125 13:00:00 to 20141128 15:00:00 can be calculated.
S3: query the second storage table for the data with rowkey=2088xx1_20141128-15 and ts from 20141128 15:00 to 20141128 15:35. From this, the data for 20141128 15:00:00 to 20141128 15:35:09 can be calculated.
Summing the data calculated in the above three steps gives the amount paid by user 2088xx1 from 20141125 12:35:09 to 20141128 15:35:09.
It can be understood that, for data from any start time to any end time, the user behavior database needs to be queried at most three times to obtain accumulated data accurate to the second. Moreover, because per-second and per-minute summarized data are stored separately from hourly and daily summarized data, the volume of data queried is greatly reduced relative to the prior art, and efficiency improves considerably. In addition, for a query from any start time to the current time, the summarized data at the current time is simply the daily summarized data for the current time, so only steps S1 and S2 above are needed. In that case, only two queries of the user behavior database are needed to obtain accumulated data accurate to the second.
As shown in Fig. 3, in an embodiment of the present application, the data storage apparatus includes a user behavior database 20, and the user behavior database 20 includes:
a data storage format module 201, configured to establish an association between the hourly summarized data and the daily summarized data of the corresponding time to form a first storage structure, and to establish an association between the per-second summarized data and the per-minute summarized data of the corresponding time to form a second storage structure;
a data logic storage module 203, configured to summarize user behavior data of the same dimension by day and by hour and then store the daily summarized data and the hourly summarized data in the first storage structure, and to summarize user behavior data of the same dimension as the daily summarized data and the hourly summarized data by minute and by second and then store the per-minute summarized data and the per-second summarized data in the second storage structure.
In this embodiment, "the same dimension" means that the user behavior data to be accumulated represents the same thing. For example, the dimension of the accumulated user behavior data may be the number of website logins within a period of time, or the amount paid within a period of time. Below, the technical solution is described in detail using the dimension "amount paid within a period of time" as the example.
In this embodiment, the accumulated user behavior data may be stored in the user behavior database 20 (for example, an HBase database), which includes the first storage structure and the second storage structure.
The daily summarized data and the hourly summarized data are stored in the first storage structure. The per-minute summarized data and the per-second summarized data are stored in the second storage structure.
Further, in this embodiment, the first storage structure comprises a plurality of data strings, each data string consists of a plurality of time windows, and the time windows of each data string can respectively store one piece of daily summarized data and one or more pieces of hourly summarized data corresponding in time to that daily summarized data;
the second storage structure comprises a plurality of data strings, each data string consists of a plurality of time windows, and the time windows of each data string can respectively store one piece of per-minute summarized data and one or more pieces of per-second summarized data corresponding in time to that per-minute summarized data.
Referring to Fig. 2a, the first storage structure includes 3 data strings, each consisting of 25 time windows in total: D, H0, H1, ..., H23. The time window denoted D stores daily summarized data; the time windows denoted H0, H1, ..., H23 correspond to the 24 hours of that day (hour 0 to hour 23) and store the corresponding hourly summarized data. It can be understood that, within one data string, the sum of the values stored in time windows H0, H1, ..., H23 equals the value stored in time window D.
Referring to Fig. 2b, the second storage structure includes 3 data strings, each consisting of 61 time windows in total: M, S0, S1, ..., S59. The time window denoted M stores per-minute summarized data; the time windows denoted S0, S1, ..., S59 correspond to the 60 seconds of that minute (second 0 to second 59) and store the corresponding per-second summarized data. It can be understood that, within one data string, the sum of the values stored in time windows S0, S1, ..., S59 equals the value stored in time window M.
Further, described user behavior data storehouse 20 also includes identifying module 205, and it is used for as each number According to string configuration one unique timestamp.
In the present embodiment, described first storage organization can be the first storage table, described first storage table Row include string cohersive and integrated data every day, and the multiple row per hour total amount corresponding with cohersive and integrated data time every day According to.Ginseng Fig. 2 a shown in, described first storage table can based on rowkey (such as id information), first Column data can be cohersive and integrated data every day, and following 24 column data can be cohersive and integrated data per hour, certainly may also be 1~24 column data are cohersive and integrated data per hour, and the 25th column data is cohersive and integrated data every day, those skilled in the art This order can be changed according to customary means.It addition, be each serial data configuration unique time stamps (timestamp, Write a Chinese character in simplified form ts) can be date on this serial data same day, to identify which day user behavior data this serial data be.
Described second storage organization can be the second storage table, and the row of described second storage table include that string is per minute Cohersive and integrated data, and the multiple row each second cohersive and integrated data corresponding with the cohersive and integrated data time per minute.Ginseng Fig. 2 b institute Show, described second storage table can based on rowkey (such as id information)+hour (such as relative users The time that behavioral data occurs, it is accurate to hour), first row data are cohersive and integrated data per minute, following 60 Column data is cohersive and integrated data each second, and certainly may also be 1~60 column data is cohersive and integrated data each second, the 61st Column data is cohersive and integrated data per minute, and those skilled in the art can change this order according to customary means.It addition, The unique time stamps (timestamp writes a Chinese character in simplified form ts) configured for each serial data can be current the dividing of this serial data Clock time, to identify that this serial data is the user behavior data of which minute.
Of course, in this embodiment the columns described above may also be replaced with rows to realize essentially the same first and second storage structures:
The first storage format is the first storage table, whose rows include one row of daily summary data and multiple rows of hourly summary data corresponding in time to that daily summary data;
The second storage format is the second storage table, whose rows include one row of per-minute summary data and multiple rows of per-second summary data corresponding in time to that per-minute summary data.
The specific layout can be derived unambiguously from Fig. 2a, Fig. 2b, and the above description of the columns of the first and second storage tables, and is not repeated here.
Further, in this embodiment, because the columns of the first storage table and the second storage table are dynamic, the data logic storage module 203 is further configured to: if the stored values of all time windows in a column/row are 0, not store that column/row. This can save a large amount of storage space. A specific example follows:
Assume that the user with ID information 2088xx1 made 3 payments at the following times:
paid 3 yuan at 20150101 01:00:01;
paid 5 yuan at 20150101 01:00:12;
paid 2 yuan at 20150101 02:32:12.
Then, in the first storage format, the stored data is:
rowkey     timestamp   D         H1       H2
2088xx1    20150101    10 yuan   8 yuan   2 yuan
In the second storage format, the stored data is:
rowkey                timestamp        M        S1       S12
2088xx1_20150101-01   20150101 01:00   8 yuan   3 yuan   5 yuan
2088xx1_20150101-02   20150101 02:32   2 yuan   empty    2 yuan
It can be seen unambiguously from the above example that, because the stored values of the time windows between 20150101 01:00:01 and 20150101 01:00:12 are 0, and the stored values of the time windows between 20150101 02:32:00 and 20150101 02:32:12 are also 0, columns S2 to S11 are not stored.
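The example above can be reproduced with a short sketch in which a column is created only when a nonzero amount is written to it, so all-zero columns such as S2 to S11 never exist. The nested-dictionary representation, the function name `build_tables`, and the column labels are assumptions made for illustration; the application does not specify an implementation.

```python
from collections import defaultdict
from datetime import datetime


def build_tables(user_id, payments):
    """Aggregate (datetime, amount) payments into the two sparse tables.

    Each table maps (rowkey, timestamp) -> {column: amount}. A column is
    only created when something is added to it, so zero columns are not stored.
    """
    first = defaultdict(lambda: defaultdict(int))
    second = defaultdict(lambda: defaultdict(int))
    for when, amount in payments:
        day_key = (user_id, when.strftime("%Y%m%d"))
        first[day_key]["D"] += amount                  # daily summary
        first[day_key][f"H{when.hour}"] += amount      # hourly summary
        minute_key = (f"{user_id}_{when.strftime('%Y%m%d-%H')}",
                      when.strftime("%Y%m%d %H:%M"))
        second[minute_key]["M"] += amount              # per-minute summary
        second[minute_key][f"S{when.second}"] += amount  # per-second summary
    # Convert to plain dicts so that later lookups cannot create empty cells.
    return ({k: dict(v) for k, v in first.items()},
            {k: dict(v) for k, v in second.items()})


payments = [
    (datetime(2015, 1, 1, 1, 0, 1), 3),
    (datetime(2015, 1, 1, 1, 0, 12), 5),
    (datetime(2015, 1, 1, 2, 32, 12), 2),
]
first_table, second_table = build_tables("2088xx1", payments)
```

With these three payments, the data string for 02:32 holds only M and S12, which corresponds to the "empty" S1 cell in the second table of the example.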
Of course, the above example stores the daily, hourly, per-minute, and per-second summary data in columns; those skilled in the art may equally store the daily, hourly, per-minute, and per-second summary data in rows, which is not repeated here.
Further, in this embodiment, the user behavior database 20 also includes an update module 207, which is configured to, when new user behavior data is obtained, synchronously update the per-second, per-minute, hourly, and daily summary data matching the current time. Continuing the example above:
If the user with ID information 2088xx1 makes another payment of 7 yuan at 20150101 02:32:12, then the per-second summary data for 20150101 02:32:12, the per-minute summary data for 20150101 02:32, the hourly summary data for hour 02 of 20150101, and the daily summary data for 20150101 need to be updated. The updated data is as follows:
In the first storage format, the stored data is:
rowkey     timestamp   D         H1       H2
2088xx1    20150101    17 yuan   8 yuan   9 yuan
In the second storage format, the stored data is:
rowkey                timestamp        M        S1       S12
2088xx1_20150101-01   20150101 01:00   8 yuan   3 yuan   5 yuan
2088xx1_20150101-02   20150101 02:32   9 yuan   empty    9 yuan
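The synchronous update can be sketched as a single function that adds the new amount to all four summaries at once. This is a minimal sketch under stated assumptions: `apply_payment` is a hypothetical name, the tables are plain dictionaries keyed by (rowkey, timestamp), and the column labels follow the example rather than any layout fixed by the application.

```python
from datetime import datetime


def apply_payment(first, second, user_id, when, amount):
    """Add one new payment to the per-second, per-minute, hourly and daily totals."""
    day_row = first.setdefault((user_id, when.strftime("%Y%m%d")), {})
    minute_row = second.setdefault(
        (f"{user_id}_{when.strftime('%Y%m%d-%H')}", when.strftime("%Y%m%d %H:%M")),
        {},
    )
    # The same amount is added to every summary level that covers this instant.
    for row, columns in ((day_row, ("D", f"H{when.hour}")),
                         (minute_row, ("M", f"S{when.second}"))):
        for column in columns:
            row[column] = row.get(column, 0) + amount


# State before the new payment, taken from the example in this description.
first = {("2088xx1", "20150101"): {"D": 10, "H1": 8, "H2": 2}}
second = {
    ("2088xx1_20150101-01", "20150101 01:00"): {"M": 8, "S1": 3, "S12": 5},
    ("2088xx1_20150101-02", "20150101 02:32"): {"M": 2, "S12": 2},
}
apply_payment(first, second, "2088xx1", datetime(2015, 1, 1, 2, 32, 12), 7)
```

After the call, the daily total is 17, the total for hour 02 is 9, and both the per-minute and per-second totals for 02:32:12 are 9, while the data string for 01:00 is untouched.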
In summary, the data storage method and apparatus of the present application optimize the storage structure of user behavior data by storing the per-second and per-minute summary data separately from the hourly and daily summary data. When accumulated user behavior data is queried, read, or aggregated, this reduces the number of database accesses, optimizes the storage and read performance of the database, and improves response speed.
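The reduction in database accesses can be illustrated with a toy table that counts reads. Everything here is a hypothetical stand-in: `CountingTable` models a keyed store rather than any real database client, and the column labels are the assumed ones from the example. The point is only that a daily total, or a total over any set of hours in that day, comes from one data string.

```python
class CountingTable:
    """Toy keyed table that records how many row reads were issued."""

    def __init__(self, rows):
        self._rows = rows
        self.reads = 0

    def get(self, key):
        self.reads += 1
        return self._rows.get(key, {})


def daily_total(table, user_id, day):
    # One row read: the pre-aggregated daily column D.
    return table.get((user_id, day)).get("D", 0)


def hours_total(table, user_id, day, hours):
    # Still one row read: the hourly columns sit in the same data string.
    row = table.get((user_id, day))
    return sum(row.get(f"H{h}", 0) for h in hours)


first_table = CountingTable({("2088xx1", "20150101"): {"D": 17, "H1": 8, "H2": 9}})
total = daily_total(first_table, "2088xx1", "20150101")
```

Hours whose column was never stored simply contribute 0 to the sum, so the sparse layout needs no special handling at read time.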
Those skilled in the art can clearly understand that, for convenience and brevity of description, for the specific working processes of the apparatus, device, and modules described above, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the several embodiments provided in the present application, it should be understood that the disclosed apparatus, device, and method may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative: the division into modules is only a division by logical function, and other divisions are possible in actual implementation. Multiple modules or components may be combined or integrated into another apparatus, or some features may be omitted or not performed. In addition, the mutual coupling, direct coupling, or communication connection shown or discussed may be an indirect coupling or communication connection through some interfaces, apparatuses, or modules, and may be electrical, mechanical, or in other forms.
The modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules; that is, they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of this embodiment.
In addition, the functional modules in the embodiments of the present application may be integrated into one processing module, each module may exist physically on its own, or two or more modules may be integrated into one module. The integrated module may be implemented in the form of hardware, or in the form of hardware plus software functional modules.
The integrated module implemented in the form of a software functional module may be stored in a computer-readable storage medium. The software functional module is stored in a storage medium and includes several instructions that cause a computer apparatus (which may be a personal computer, a server, a network apparatus, or the like) or a processor to perform some of the steps of the methods described in the embodiments of the present application. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are only intended to illustrate the technical solutions of the present application, not to limit them. Although the application has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments may still be modified, or some of their technical features may be replaced by equivalents, and that such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present application.

Claims (12)

1. A data storage method, characterized in that the method comprises:
after summarizing user behavior data of the same dimension daily and hourly, storing the daily summary data and the hourly summary data in a first storage structure, wherein the first storage structure establishes an association between the hourly summary data and the daily summary data corresponding to it in time;
after summarizing, per minute and per second, user behavior data of the same dimension as the daily summary data and the hourly summary data, storing the per-minute summary data and the per-second summary data in a second storage structure, wherein the second storage structure establishes an association between the per-second summary data and the per-minute summary data corresponding to it in time.
2. The data storage method according to claim 1, characterized in that the first storage structure comprises a plurality of data strings, each data string consists of time windows, and the time windows of each data string respectively store one piece of daily summary data and one or more pieces of hourly summary data corresponding in time to that daily summary data;
the second storage structure comprises a plurality of data strings, each data string consists of multiple time windows, and the multiple time windows of each data string respectively store one piece of per-minute summary data and one or more pieces of per-second summary data corresponding in time to that per-minute summary data.
3. The user behavior data summarizing method according to claim 2, characterized in that
the first storage structure is a first storage table, and the columns/rows of the first storage table include one column/row of daily summary data and multiple columns/rows of hourly summary data corresponding in time to the daily summary data;
the second storage format is a second storage table, and the columns/rows of the second storage table include one column/row of per-minute summary data and multiple columns/rows of per-second summary data corresponding in time to the per-minute summary data.
4. The data storage method according to claim 2, characterized in that the method further comprises:
configuring a unique timestamp for each data string.
5. The data storage method according to claim 3, characterized in that the method further comprises:
if the stored values of all time windows in a column/row are 0, not storing that column/row.
6. The data storage method according to claim 1, characterized in that the method further comprises:
when new user behavior data is obtained, synchronously updating the per-second summary data, per-minute summary data, hourly summary data, and daily summary data matching the current time.
7. A data storage apparatus, characterized in that the apparatus comprises:
a data storage format module, configured to establish an association between hourly summary data and the daily summary data corresponding to it in time to form a first storage structure, and to establish an association between per-second summary data and the per-minute summary data corresponding to it in time to form a second storage structure;
a data logic storage module, configured to, after summarizing user behavior data of the same dimension daily and hourly, store the daily summary data and the hourly summary data in the first storage structure, and, after summarizing, per minute and per second, user behavior data of the same dimension as the daily summary data and the hourly summary data, store the per-minute summary data and the per-second summary data in the second storage structure.
8. The data storage apparatus according to claim 7, characterized in that the first storage structure comprises a plurality of data strings, each data string consists of time windows, and the time windows of each data string can respectively store one piece of daily summary data and one or more pieces of hourly summary data corresponding in time to that daily summary data;
the second storage structure comprises a plurality of data strings, each data string consists of time windows, and the time windows of each data string respectively store one piece of per-minute summary data and one or more pieces of per-second summary data corresponding in time to that per-minute summary data.
9. The user behavior data summarizing apparatus according to claim 8, characterized in that
the first storage structure is a first storage table, and the columns/rows of the first storage table include one column/row of daily summary data and multiple columns/rows of hourly summary data corresponding in time to the daily summary data;
the second storage format is a second storage table, and the columns/rows of the second storage table include one column/row of per-minute summary data and multiple columns/rows of per-second summary data corresponding in time to the per-minute summary data.
10. The data storage apparatus according to claim 8, characterized in that the apparatus further comprises:
an identification module, configured to configure a unique timestamp for each data string.
11. The data storage apparatus according to claim 9, characterized in that the data logic storage module is further configured to:
if the stored values of all time windows in a column/row are 0, not store that column/row.
12. The data storage apparatus according to claim 7, characterized in that the apparatus further comprises:
an update module, configured to, when new user behavior data is obtained, drive the first logic storage module and the second logic storage module to synchronously update the per-second summary data, per-minute summary data, hourly summary data, and daily summary data matching the current time.
CN201510333071.8A 2015-06-16 2015-06-16 Data storage method and device Active CN106326220B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510333071.8A CN106326220B (en) 2015-06-16 2015-06-16 Date storage method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510333071.8A CN106326220B (en) 2015-06-16 2015-06-16 Date storage method and device

Publications (2)

Publication Number Publication Date
CN106326220A true CN106326220A (en) 2017-01-11
CN106326220B CN106326220B (en) 2019-08-27

Family

ID=57733480

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510333071.8A Active CN106326220B (en) 2015-06-16 2015-06-16 Data storage method and device

Country Status (1)

Country Link
CN (1) CN106326220B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492009A (en) * 2018-11-25 2019-03-19 杜广香 The method and system of relevance time quantum are identified in big data storage equipment
CN110704466A (en) * 2019-09-27 2020-01-17 武汉极意网络科技有限公司 Black product data storage method and device
EP4170512A4 (en) * 2020-06-30 2023-11-08 Huawei Technologies Co., Ltd. Time series data injection method, time series data query method and database system

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101141759A (en) * 2007-02-12 2008-03-12 中兴通讯股份有限公司 Call behavior statistical and analytical method and device
CN101860454A (en) * 2010-06-24 2010-10-13 杭州华三通信技术有限公司 Network performance data processing method and device thereof
CN102456065A (en) * 2011-07-01 2012-05-16 中国人民解放军国防科学技术大学 Methods for storing and querying offline historical statistical data of data stream
CN103399945A (en) * 2013-08-15 2013-11-20 成都博云科技有限公司 Data structure based on cloud computing database system
CN104572726A (en) * 2013-10-22 2015-04-29 北京品众互动网络营销技术有限公司 Advertisement analysis method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109492009A (en) * 2018-11-25 2019-03-19 杜广香 The method and system of relevance time quantum are identified in big data storage equipment
CN110704466A (en) * 2019-09-27 2020-01-17 武汉极意网络科技有限公司 Black product data storage method and device
CN110704466B (en) * 2019-09-27 2021-12-17 武汉极意网络科技有限公司 Black product data storage method and device
EP4170512A4 (en) * 2020-06-30 2023-11-08 Huawei Technologies Co., Ltd. Time series data injection method, time series data query method and database system

Also Published As

Publication number Publication date
CN106326220B (en) 2019-08-27

Similar Documents

Publication Publication Date Title
CN108415952B (en) User data storage method, label calculation method and calculation equipment
CN101416179B (en) System and method for providing regulated recommended word to every subscriber
CN105512153A (en) Method and device for service provision of online customer service system, and system
CN105446991A (en) Data storage method, query method and device
CN103731284A (en) Method and system for correlating a plurality of network accounts
CN103714004A (en) JVM online memory leak analysis method and system
CN105446990B (en) A kind of business data processing method and equipment
CN103714086A (en) Method and device used for generating non-relational data base module
CN106294128B (en) A kind of automated testing method and device exporting report data
CN110209643A (en) A kind of data processing method and device
CN106326220A (en) Data storage method and apparatus
CN109359160A (en) Method of data synchronization, device, computer equipment and storage medium
CN106709851A (en) Big data retrieval method and apparatus
CN109669995A (en) Data storage, quality calculation method, device, storage medium and server
CN103475748B (en) A kind of method and apparatus of the geographic location type determining IP address
CN104199977A (en) Method for creating information search based on data in database
CN109086149A (en) A kind of method that micro services interface calls analysis of central issue
CN105359172A (en) Calculating a probability of a business being delinquent
CN101436316A (en) Method and system for filtering work attendance data
CN110737432A (en) script aided design method and device based on root list
CN105335886A (en) Method and device for processing financial data
CN106384253A (en) Consumption behavior analysis method in bankcard transaction and consumption behavior analysis device thereof
CN108170837A (en) Method of Data Discretization, device, computer equipment and storage medium
CN105589900A (en) Data mining method based on multi-dimensional analysis
CN106156122B (en) Transaction information acquisition method and device

Legal Events

Date Code Title Description
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20201012

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Innovative advanced technology Co.,Ltd.

Address before: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee before: Advanced innovation technology Co.,Ltd.

Effective date of registration: 20201012

Address after: Cayman Enterprise Centre, 27 Hospital Road, George Town, Grand Cayman Islands

Patentee after: Advanced innovation technology Co.,Ltd.

Address before: A four-storey 847 mailbox in Grand Cayman Capital Building, British Cayman Islands

Patentee before: Alibaba Group Holding Ltd.