CN104574159A - Data storage and query method and device - Google Patents
- Publication number: CN104574159A (application CN201510053228.1A)
- Authority: CN (China)
- Prior art keywords: user, usage data, attribute, identity information, data
- Legal status: Granted (as listed by Google Patents; not a legal conclusion)
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
An embodiment of the present invention provides a data storage method, a data query method, and corresponding devices. The data storage method includes: storing identity information of multiple users in a user identity information attribute column file; and storing the usage data of the multiple users, by time period, in usage data attribute column files of different data partitions, where each data partition includes at least one usage data attribute column file, and the storage order of each user's usage data for the different attributes is the same as the storage order of that user's identity information in the user identity information attribute column file. Because each user's usage data is stored in smaller units, storage and fast data exchange are facilitated.
Description
Technical field
Embodiments of the present invention relate to computer technology, and in particular to a data storage method, a data query method, and corresponding devices.
Background
A marketing management system generally provides a function for extracting customer groups from massive numbers of customer attributes, to serve as marketing targets. The way customer attributes are stored and searched affects the efficiency of extracting customer groups and analyzing their characteristics.
In the prior art, one approach stores each attribute set as a column: each attribute of all customers is stored in its own space, as a column, which is called columnar storage, and the row relationship is recorded by row number. For example, each customer is assigned a sequence number in each period, and each attribute has a table that stores the attribute values together with the corresponding sequence numbers. When extracting customer groups or analyzing features, the columns named in the conditions are scanned and filtered according to the specified conditions; with two or more conditions, the intersection, union, or a similar combination of the row numbers from the multiple columns is taken to obtain the qualifying customers. The other approach is row storage: in each period, all attributes of each customer are allocated one storage space, as a row. In effect, each customer has a record of the same format that records, period by period, the customer's different attributes, such as voice consumption, data traffic consumption, and SMS consumption. To search data under row storage, a period must first be selected, and then the storage space of each customer in that period is scanned one by one until the required attribute value is found. When there are many attributes, to avoid sequentially reading too many attributes unrelated to the query condition during a row scan, the system sometimes assigns one customer's attributes to multiple storage spaces, placing attributes that are often used together in the same space. Because more tables raise the risk of additional table joins, the attributes cannot be split into too many groups; for example, about 300 attributes may be divided into 3 groups, and a search can start from the group containing the relevant attribute until the required value is found. When extracting customer groups, one or several specified partitions within the period are scanned according to the conditions and filtered according to the specified conditions to obtain the qualifying customers.
However, with the prior-art columnar storage, data is stored by attribute, which makes storage cumbersome: when data for a certain time period is needed, the storage column of the attribute must be scanned row by row, and the scan reads a large amount of data from unrelated periods, increasing input/output (I/O) time. With row storage, data is stored by period: when data for a certain attribute is needed, a large number of unrelated attributes are scanned, which also increases I/O time.
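To make the I/O comparison above concrete, here is a deliberately simplified cost model, assuming every stored value occupies one cell and a scan reads every cell of whatever structure it touches (no compression, caching, or secondary indexes). The function `cells_scanned` and its inputs are hypothetical; the third entry models the per-period, per-attribute column layout described in the summary that follows.

```python
def cells_scanned(customers: int, periods: int, attributes: int) -> dict:
    """Cells read to fetch ONE attribute for ONE period under each layout (simplified model)."""
    return {
        # Column store: one column per attribute holds every period, so all periods are scanned.
        "column_store": customers * periods,
        # Row store: one row per customer per period holds every attribute, so all attributes are read.
        "row_store": customers * attributes,
        # One partition per period plus one column file per attribute: only the matching file is read.
        "partitioned_columns": customers,
    }

print(cells_scanned(customers=1_000_000, periods=12, attributes=300))
```

Splitting the row store into attribute groups, as described above, divides the `row_store` figure by roughly the number of groups, but the model still reads every attribute in the chosen group.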
Summary of the invention
Embodiments of the present invention provide a data storage method, a data query method, and corresponding devices, to solve the prior-art problem that cumbersome data storage causes scans to read a large number of unrelated rows and unrelated-period data, increasing data I/O time.
According to a first aspect, an embodiment of the present invention provides a data storage method, comprising:
Storing identity information of multiple users in a user identity information attribute column file, where each row of the user identity information attribute column file stores the identity information of one user;
Storing the usage data of the multiple users, by time period, in usage data attribute column files of different data partitions, where each row of a usage data attribute column file stores the usage data of one attribute of one user;
Here, each data partition includes at least one usage data attribute column file; each row of each usage data attribute column file occupies a fixed-length storage space; the usage data of each user's different attributes is stored in different usage data attribute column files of the data partition; and the storage order of the usage data in each usage data attribute column file is the same as the storage order of the users' identity information in the user identity information attribute column file.
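A minimal in-memory sketch of the first-aspect layout, assuming identity information is a short ASCII string padded to a hypothetical 16-byte row and each usage value is a single float64. `build_store`, `ID_WIDTH`, and `VALUE_FMT` are illustrative names, not taken from the patent, and byte strings stand in for the column files on disk.

```python
import struct

# Hypothetical fixed row widths: the patent requires only that every row has a fixed length.
ID_WIDTH = 16                       # bytes per identity row (ASCII, zero-padded)
VALUE_FMT = "<d"                    # one little-endian float64 per usage-data row
VALUE_WIDTH = struct.calcsize(VALUE_FMT)

def build_store(users, usage_by_period):
    """users: identity strings in storage order.
    usage_by_period: {period: {attribute: {user_id: value}}}.
    Returns (identity_file, partitions); partitions[period][attribute] is one column file."""
    identity_file = bytearray()
    for uid in users:
        raw = uid.encode("ascii")
        if len(raw) > ID_WIDTH:
            raise ValueError(f"identity too long for a fixed row: {uid!r}")
        identity_file += raw.ljust(ID_WIDTH, b"\0")

    partitions = {}
    for period, attributes in usage_by_period.items():
        partition = {}
        for attribute, values in attributes.items():
            column = bytearray()
            # Row i of every column file belongs to users[i]; a missing value is stored as 0.0 (assumption).
            for uid in users:
                column += struct.pack(VALUE_FMT, values.get(uid, 0.0))
            partition[attribute] = bytes(column)
        partitions[period] = partition
    return bytes(identity_file), partitions

identity_file, partitions = build_store(
    ["u1", "u2"],
    {"2015-01": {"voice": {"u1": 3.5, "u2": 1.0}, "sms": {"u2": 7.0}}},
)
```

Because every row has a fixed width, row i of any column file starts at byte `i * width`, so the files stay aligned without storing a user key next to each value.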
With reference to the first aspect, in a first possible implementation of the first aspect, the requirement that the storage order of the usage data in the usage data attribute column file be the same as the storage order of the users' identity information in the user identity information attribute column file specifically includes:
The number of offset cells corresponding to the row that stores a user's usage data is the same as the number of offset cells corresponding to the row that stores that user's identity information.
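Under hypothetical fixed widths, the offset-cell rule reduces to integer arithmetic: a row number found in a usage column is reused directly as the row number in the identity column. This sketch assumes a 16-byte identity row and a float64 value row for illustration; `row_from_offset`, `identity_at`, and `value_at` are hypothetical helpers.

```python
import struct

ID_WIDTH = 16       # hypothetical fixed identity row width, in bytes
VALUE_FMT = "<d"    # hypothetical fixed usage-data row: one float64
VALUE_WIDTH = struct.calcsize(VALUE_FMT)

def row_from_offset(byte_offset: int, width: int) -> int:
    """Convert a byte offset in a fixed-width column file into a row (offset-cell) count."""
    if byte_offset % width:
        raise ValueError("offset does not fall on a row boundary")
    return byte_offset // width

def identity_at(identity_file: bytes, row: int) -> str:
    start = row * ID_WIDTH
    return identity_file[start:start + ID_WIDTH].rstrip(b"\0").decode("ascii")

def value_at(column_file: bytes, row: int) -> float:
    return struct.unpack_from(VALUE_FMT, column_file, row * VALUE_WIDTH)[0]

identity_file = b"u1".ljust(ID_WIDTH, b"\0") + b"u2".ljust(ID_WIDTH, b"\0")
voice_column = struct.pack("<2d", 3.5, 1.0)
row = row_from_offset(VALUE_WIDTH, VALUE_WIDTH)   # byte offset of the second value -> row 1
print(identity_at(identity_file, row), value_at(voice_column, row))
```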
With reference to the first aspect, in a second possible implementation of the first aspect, if the usage data of at least one attribute of the multiple users includes usage data of multiple different preference categories, the method further includes:
Storing each user's usage data of the multiple preference categories, in sequence, in a preference usage data attribute column file of that attribute, where each row of the preference usage data attribute column file stores the usage data of one preference category of one user;
Storing, in sequence, the preference category identifiers corresponding to each user's preference-category usage data in a preference category identifier attribute column file, and storing the identity information of the user to whom each piece of preference-category usage data belongs in a multidimensional data user identity attribute column file;
Here, the storage order of the preference-category usage data in the preference usage data attribute column file is the same as the storage order of the preference category identifiers in the preference category identifier attribute column file, and also the same as the storage order of the users' identity information in the multidimensional data user identity attribute column file;
Obtaining, according to the number of pieces of preference-category usage data of each user, the storage end position information of that user's preference-category usage data, and storing each user's storage end position information in a storage end position information attribute column file, where each row of the storage end position information attribute column file stores the storage end position information of one user;
Here, the storage order of the storage end position information in the storage end position information attribute column file is the same as the storage order of the users' identity information in the user identity information attribute column file.
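The preference files resemble a compressed sparse row (CSR) layout: a variable number of entries per user are packed back to back, and a per-user end position marks where each user's entries stop. A minimal sketch, assuming in-memory lists stand in for the four column files and that the end position is an exclusive cumulative count (the patent does not say whether it is inclusive or exclusive); `build_preference_files` is a hypothetical name.

```python
from itertools import accumulate

def build_preference_files(users, prefs):
    """users: identity strings in identity-file order.
    prefs: {user_id: [(category_id, value), ...]} for one attribute.
    Returns four lists standing in for the four column files."""
    values, category_ids, owner_ids, counts = [], [], [], []
    for uid in users:
        entries = prefs.get(uid, [])
        for category_id, value in entries:
            values.append(value)              # preference usage data attribute column file
            category_ids.append(category_id)  # preference category identifier attribute column file
            owner_ids.append(uid)             # multidimensional data user identity attribute column file
        counts.append(len(entries))
    # One end position per identity row (storage end position file): exclusive end index.
    end_positions = list(accumulate(counts))
    return values, category_ids, owner_ids, end_positions

files = build_preference_files(
    ["u1", "u2", "u3"],
    {"u1": [("news", 2.0), ("video", 5.0)], "u3": [("game", 1.0)]},
)
```

A user with no preference entries (`u2` here) still gets an end-position row, equal to the previous user's, which keeps that file aligned with the identity file.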
With reference to the second possible implementation of the first aspect, in a third possible implementation of the first aspect, the requirement that the storage order of the storage end position information in the storage end position information attribute column file be the same as the storage order of the users' identity information in the user identity information attribute column file specifically includes:
The number of offset cells corresponding to the row that stores a user's storage end position information is the same as the number of offset cells corresponding to the row that stores that user's identity information.
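Because the end-position file is aligned row for row with the identity file, a user's preference entries can be located from that user's identity row alone. A sketch under the same exclusive-end assumption; `preference_range` and `user_preferences` are hypothetical helpers.

```python
def preference_range(end_positions, row):
    """(start, end) slice of the preference files owned by the user at identity row `row`.
    Assumes end positions are exclusive cumulative counts aligned with the identity file."""
    start = end_positions[row - 1] if row > 0 else 0
    return start, end_positions[row]

def user_preferences(values, category_ids, end_positions, row):
    start, end = preference_range(end_positions, row)
    return list(zip(category_ids[start:end], values[start:end]))

end_positions = [2, 2, 3]            # u1 has two entries, u2 none, u3 one
values = [2.0, 5.0, 1.0]
category_ids = ["news", "video", "game"]
```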
With reference to the first aspect or any one of the first to third possible implementations of the first aspect, in a fourth possible implementation of the first aspect, the method further includes:
Storing the newly added usage data of the multiple users, by time period, in a new data partition, where the new data partition includes at least one new usage data attribute column file, the data of the different attributes of the newly added usage data is stored in the respective new usage data attribute column files, and each row of a new usage data attribute column file stores the newly added usage data of one attribute of one user;
Here, each row of each new usage data attribute column file occupies a fixed-length storage space, and the storage order of the newly added usage data in the new usage data attribute column files of the new data partition is the same as the storage order of the users' identity information in the user identity information attribute column file.
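A minimal sketch of adding one period's data as a new partition, assuming lists stand in for column files and that a user with no data in the period gets 0 in that row (the patent does not say how missing values are filled). `add_partition` is a hypothetical helper; existing partitions are not rewritten.

```python
def add_partition(partitions, period, users, new_usage):
    """Store one period's newly added usage data as a new data partition.
    partitions: {period: {attribute: [value per identity row]}}; lists stand in for column files.
    users: identity column in storage order. new_usage: {attribute: {user_id: value}}."""
    if period in partitions:
        raise ValueError(f"partition for {period!r} already exists")
    # Existing partitions are left untouched; only the new one is written.
    partitions[period] = {
        # Row i of each new column file belongs to users[i]; a missing value defaults to 0.
        attribute: [values.get(uid, 0) for uid in users]
        for attribute, values in new_usage.items()
    }
    return partitions[period]
```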
With reference to the fourth possible implementation of the first aspect, in a fifth possible implementation of the first aspect, if usage data of a new user exists within the time period to which the new data partition belongs, the method further includes:
Storing the identity information of the new user at the tail of the user identity information attribute column file, to obtain an updated user identity information attribute column file;
Storing the usage data of the new user in the new data partition, with the data of the new user's different attributes stored in the respective new usage data attribute column files;
Here, the storage order of the newly added usage data in the new usage data attribute column files is the same as the storage order of the users' identity information in the updated user identity information attribute column file, and the newly added usage data includes both the newly added usage data of the multiple users and the newly added usage data of the new user.
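When new users appear, appending them to the tail of the identity column keeps every existing row number valid, so older partitions stay aligned without being rewritten; only the new partition has the extra rows. A sketch under the same list-based, fill-with-0 assumptions; `add_partition_with_new_users` is a hypothetical name.

```python
def add_partition_with_new_users(identity, partitions, period, new_usage):
    """Append users first seen in `period` to the tail of the identity column, then write
    the new partition aligned with the updated identity column. Older partitions keep
    their original (shorter) row count."""
    known = set(identity)
    for values in new_usage.values():
        for uid in values:
            if uid not in known:
                identity.append(uid)      # tail of the identity column file
                known.add(uid)
    partitions[period] = {
        # 0 for users with no data in this period (assumption).
        attribute: [values.get(uid, 0) for uid in identity]
        for attribute, values in new_usage.items()
    }
    return identity
```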
With reference to the first aspect, in a sixth possible implementation of the first aspect, the method further includes:
Establishing a user index for each user according to the identity information of the multiple users in the user identity information attribute column file, where each user index corresponds one-to-one to the identity information of a user in the user identity information attribute column file;
Obtaining the identity information of a new user, and determining, according to the user indexes, whether the identity information of the new user already exists in the user identity information attribute column file;
If it does not, storing the identity information of the new user at the tail of the user identity information attribute column file, and storing the newly added usage data of the new user in a new data partition.
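The patent does not say what structure the user index uses; this sketch assumes a hash map from identity to row number, which makes the existence check a constant-time lookup. `build_user_index` and `register_user` are hypothetical helpers.

```python
def build_user_index(identity):
    """User index: identity -> row number (offset-cell count) in the identity column."""
    return {uid: row for row, uid in enumerate(identity)}

def register_user(identity, index, uid):
    """Return the row of `uid`, appending it to the identity tail only if it is new."""
    row = index.get(uid)
    if row is None:
        row = len(identity)
        identity.append(uid)
        index[uid] = row
    return row
```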
According to a second aspect, an embodiment of the present invention provides a method for implementing data query by using the first aspect or any one of the first to sixth possible implementations of the first aspect, comprising:
Obtaining a query condition, where the query condition includes at least one attribute condition and at least one time condition;
Searching, according to the at least one time condition, for the data partition corresponding to each time condition, and obtaining, in each data partition and according to the at least one attribute condition, the data attribute column file corresponding to each attribute condition, where the data attribute column files include usage data attribute column files and preference usage data attribute column files;
Traversing, in sequence, all rows at the same positions in the data attribute column files corresponding to the attribute conditions, to find the usage data of the users that meet the query condition.
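A minimal sketch of the first query step, assuming partitions are keyed by a period label that time conditions name directly, and that a partition lacking any requested attribute is skipped (an assumption; the text does not cover that case). `select_columns` is a hypothetical helper.

```python
def select_columns(partitions, periods, attributes):
    """partitions: {period: {attribute: column}}. Returns {period: {attribute: column}}
    for the partitions selected by the time conditions and the columns named by the
    attribute conditions."""
    selected = {}
    for period in periods:                       # time conditions -> data partitions
        partition = partitions.get(period)
        if partition is None or any(a not in partition for a in attributes):
            continue                             # assumption: skip partitions that cannot answer
        selected[period] = {a: partition[a] for a in attributes}   # attribute conditions -> column files
    return selected
```

Only the selected column files are read afterwards, which is where the layout avoids scanning unrelated periods and unrelated attributes.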
With reference to the second aspect, in a first possible implementation of the second aspect, if the data attribute column file corresponding to each attribute condition is a usage data attribute column file, traversing in sequence all rows at the same positions in the data attribute column files corresponding to the attribute conditions, to find the usage data of the users that meet the query condition, specifically includes:
Treating, in each data partition, the rows at the same position across the usage data attribute column files corresponding to the attribute conditions as one virtual row, obtaining the user's usage data in each virtual row, and determining whether the user's usage data in each virtual row meets the query condition;
If so, recording the identity information corresponding to the user's usage data in each virtual row that meets the query condition, to obtain a query-result customer group.
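A sketch of the virtual-row scan over usage data attribute column files, assuming lists stand in for one partition's column files and the attribute conditions are expressed as a Python predicate over the virtual row. The matching row index doubles as the first position used to read the identity column. `scan_virtual_rows` is a hypothetical name.

```python
def scan_virtual_rows(identity, columns, predicate):
    """identity: identity column (row i -> user). columns: {attribute: column list} from one
    partition, all aligned with `identity`. predicate: callable on {attribute: value}."""
    lengths = {len(col) for col in columns.values()}
    if len(lengths) != 1:
        raise ValueError("column files in one partition must have the same number of rows")
    (n_rows,) = lengths
    customer_group = []
    for i in range(n_rows):
        virtual_row = {attr: col[i] for attr, col in columns.items()}
        if predicate(virtual_row):
            # Row i is the first position; the identity at the same position is the user.
            customer_group.append(identity[i])
    return customer_group
```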
With reference to the first possible implementation of the second aspect, in a second possible implementation of the second aspect, recording the identity information corresponding to the user's usage data in the virtual rows that meet the query condition, to obtain the query-result customer group, includes:
Obtaining a first position, in the usage data attribute column file, of the user's usage data in a virtual row that meets the query condition; obtaining, from the user identity information attribute column file, the identity information of the user at the first position; and recording that identity information to obtain the query-result customer group.
With reference to the second aspect, in a third possible implementation of the second aspect, if the data attribute column file corresponding to each attribute condition is a preference usage data attribute column file, traversing in sequence all rows at the same positions in the data attribute column files corresponding to the attribute conditions, to find the usage data of the users that meet the query condition, specifically includes:
Treating, in each data partition, the rows at the same position across the preference usage data attribute column files corresponding to the attribute conditions as one virtual row, obtaining the user's usage data in each virtual row, and determining whether the user's usage data in each virtual row meets the query condition;
If so, recording the identity information corresponding to the user's usage data in each virtual row that meets the query condition, to obtain a query-result customer group.
With reference to the third possible implementation of the second aspect, in a fourth possible implementation of the second aspect, recording the identity information corresponding to the user's usage data in the virtual rows that meet the query condition, to obtain the query-result customer group, includes:
Obtaining a second position, in the preference usage data attribute column file, of the user's usage data in a virtual row that meets the query condition; obtaining, from the multidimensional data user identity attribute column file, the identity information of the user at the second position; and recording that identity information to obtain the query-result customer group.
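For preference usage data, each row of the preference files is its own virtual row, and the owner is read at the same (second) position of the multidimensional data user identity column. A sketch assuming list stand-ins, a predicate on (category, value), and that a user matching several categories is recorded once (an assumption); `scan_preferences` is a hypothetical name.

```python
def scan_preferences(pref_values, pref_categories, pref_owners, predicate):
    """Rows j of the three preference column files are aligned; pref_owners stands in for
    the multidimensional data user identity attribute column file."""
    group, seen = [], set()
    for j, (category, value) in enumerate(zip(pref_categories, pref_values)):
        owner = pref_owners[j]            # identity at the second position j
        if owner not in seen and predicate(category, value):
            seen.add(owner)
            group.append(owner)
    return group
```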
With reference to the second aspect, in a fifth possible implementation of the second aspect, if the data attribute column files corresponding to the attribute conditions include both usage data attribute column files and preference usage data attribute column files, traversing in sequence all rows at the same positions in the data attribute column files corresponding to the attribute conditions, to find the usage data of the users that meet the query condition, specifically includes:
Treating, in each data partition, the rows at the same position across the usage data attribute column files corresponding to the attribute conditions as a first virtual row, and the rows at the same position across the preference usage data attribute column files corresponding to the attribute conditions as a second virtual row;
If the identity information of the user corresponding to the first virtual row is the same as the identity information of the user corresponding to the second virtual row, combining the first virtual row and the second virtual row into one virtual row;
Obtaining the user's usage data in each virtual row, and determining whether the user's usage data in each virtual row meets the query condition;
If so, recording the identity information corresponding to the user's usage data in each virtual row that meets the query condition, to obtain a query-result customer group.
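When conditions span both kinds of column files, the two virtual rows can be paired by comparing their identities. Since preference rows are stored user by user in identity-file order, a single forward pointer over the preference rows performs that comparison as a merge join. This sketch assumes that ordering, list stand-ins, and a predicate over both virtual rows; `scan_combined` is a hypothetical name, and recording each user once is an assumption.

```python
def scan_combined(identity, usage_columns, pref_values, pref_categories, pref_owners, predicate):
    """Join first virtual rows (usage files, row i -> identity[i]) with second virtual rows
    (preference files, row j -> pref_owners[j]) when both belong to the same user.
    Assumes preference rows are grouped by user in identity-file order, as stored."""
    group = []
    j = 0
    for i, uid in enumerate(identity):
        first = {attr: col[i] for attr, col in usage_columns.items()}   # first virtual row
        matched = False
        # Consume this user's preference rows; the identity comparison decides the join.
        while j < len(pref_owners) and pref_owners[j] == uid:
            second = {"category": pref_categories[j], "value": pref_values[j]}  # second virtual row
            if not matched and predicate(first, second):
                group.append(uid)
                matched = True
            j += 1
    return group
```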
With reference to the fifth possible implementation of the second aspect, in a sixth possible implementation of the second aspect, before the first virtual row and the second virtual row are combined into one virtual row when the identity information of their corresponding users is the same, the method further includes:
Obtaining, from the user identity information attribute column file, the identity information of the user corresponding to the first virtual row, and obtaining, from the multidimensional data user identity attribute column file, the identity information of the user corresponding to the second virtual row;
Determining whether the identity information of the user corresponding to the first virtual row is the same as the identity information of the user corresponding to the second virtual row.
With reference to the second or fourth possible implementation of the second aspect, in a seventh possible implementation of the second aspect, the first position or the second position is a number of offset cells.
With reference to the second aspect or any one of the first to sixth possible implementations of the second aspect, in an eighth possible implementation of the second aspect, the method further includes:
Obtaining an attribute analysis request for the query-result customer group, where the attribute analysis request includes at least one piece of attribute information;
Obtaining, according to the attribute information in the attribute analysis request, the data attribute column file corresponding to the attribute; obtaining, from that data attribute column file, the usage data corresponding to the identity information of each user in the query-result customer group; and performing statistical analysis on that usage data to obtain a user attribute analysis result.
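A sketch of the attribute analysis step, assuming count, sum, and mean are the statistics wanted (the text says only "statistical analysis") and that users added after the analyzed partition, which have no row in it, are skipped. `analyze_attribute` is a hypothetical helper.

```python
from statistics import mean

def analyze_attribute(customer_group, identity, column):
    """Count, sum, and mean of one attribute column over a query-result customer group."""
    index = {uid: row for row, uid in enumerate(identity)}   # assumed user index
    values = []
    for uid in customer_group:
        row = index.get(uid)
        # Skip users with no row in this column (e.g. added after this partition was written).
        if row is not None and row < len(column):
            values.append(column[row])
    if not values:
        return {"count": 0, "sum": 0, "mean": None}
    return {"count": len(values), "sum": sum(values), "mean": mean(values)}
```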
According to a third aspect, an embodiment of the present invention provides a data storage device, comprising:
An identity information storage module, configured to store identity information of multiple users in a user identity information attribute column file, where each row of the user identity information attribute column file stores the identity information of one user;
A data storage module, configured to store the usage data of the multiple users, by time period, in usage data attribute column files of different data partitions, where each row of a usage data attribute column file stores the usage data of one attribute of one user;
Here, each data partition includes at least one usage data attribute column file; each row of each usage data attribute column file occupies a fixed-length storage space; the usage data of each user's different attributes is stored in different usage data attribute column files of the data partition; and the storage order of the usage data in each usage data attribute column file is the same as the storage order of the users' identity information in the user identity information attribute column file.
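The two modules can be sketched as classes that share the identity column, assuming in-memory lists for the column files and 0 for missing values. The class names mirror the module names in the text but are otherwise hypothetical.

```python
class IdentityInformationStorageModule:
    """Holds the identity column: row i stores the identity of one user."""
    def __init__(self):
        self.identity = []

    def store(self, users):
        self.identity.extend(users)

class DataStorageModule:
    """One partition per time period, one column per attribute, rows aligned with identity."""
    def __init__(self, identity_module):
        self.identity_module = identity_module
        self.partitions = {}

    def store(self, period, usage):
        users = self.identity_module.identity
        self.partitions[period] = {
            attribute: [values.get(uid, 0) for uid in users]   # 0 for missing: assumption
            for attribute, values in usage.items()
        }

class DataStorageDevice:
    def __init__(self):
        self.identity_module = IdentityInformationStorageModule()
        self.data_module = DataStorageModule(self.identity_module)

device = DataStorageDevice()
device.identity_module.store(["u1", "u2"])
device.data_module.store("2015-01", {"voice": {"u2": 4}})
```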
With reference to the third aspect, in a first possible implementation of the third aspect, the requirement that the storage order of the usage data in the usage data attribute column file be the same as the storage order of the users' identity information in the user identity information attribute column file specifically includes:
The number of offset cells corresponding to the row that stores a user's usage data is the same as the number of offset cells corresponding to the row that stores that user's identity information.
With reference to the third aspect, in a second possible implementation of the third aspect, if the usage data of at least one attribute of the multiple users includes usage data of multiple different preference categories, the data storage module is further configured to:
Store each user's usage data of the multiple preference categories, in sequence, in a preference usage data attribute column file of that attribute, where each row of the preference usage data attribute column file stores the usage data of one preference category of one user;
Store, in sequence, the preference category identifiers corresponding to each user's preference-category usage data in a preference category identifier attribute column file, and store the identity information of the user to whom each piece of preference-category usage data belongs in a multidimensional data user identity attribute column file;
Here, the storage order of the preference-category usage data in the preference usage data attribute column file is the same as the storage order of the preference category identifiers in the preference category identifier attribute column file, and also the same as the storage order of the users' identity information in the multidimensional data user identity attribute column file;
Obtain, according to the number of pieces of preference-category usage data of each user, the storage end position information of that user's preference-category usage data, and store each user's storage end position information in a storage end position information attribute column file, where each row of the storage end position information attribute column file stores the storage end position information of one user;
Here, the storage order of the storage end position information in the storage end position information attribute column file is the same as the storage order of the users' identity information in the user identity information attribute column file.
With reference to the second possible implementation of the third aspect, in a third possible implementation of the third aspect, the requirement that the storage order of the storage end position information in the storage end position information attribute column file be the same as the storage order of the users' identity information in the user identity information attribute column file specifically includes:
The number of offset cells corresponding to the row that stores a user's storage end position information is the same as the number of offset cells corresponding to the row that stores that user's identity information.
With reference to the third aspect or any one of the first to third possible implementations of the third aspect, in a fourth possible implementation of the third aspect, the data storage module is further configured to:
Store the newly added usage data of the multiple users, by time period, in a new data partition, where the new data partition includes at least one new usage data attribute column file, the data of the different attributes of the newly added usage data is stored in the respective new usage data attribute column files, and each row of a new usage data attribute column file stores the newly added usage data of one attribute of one user;
Here, each row of each new usage data attribute column file occupies a fixed-length storage space, and the storage order of the newly added usage data in the new usage data attribute column files of the new data partition is the same as the storage order of the users' identity information in the user identity information attribute column file.
With reference to the fourth possible implementation of the third aspect, in a fifth possible implementation of the third aspect, if usage data of a new user exists within the time period to which the new data partition belongs, the identity information storage module is further configured to store the identity information of the new user at the tail of the user identity information attribute column file, to obtain an updated user identity information attribute column file;
The data storage module is further configured to store the usage data of the new user in the new data partition, with the data of the new user's different attributes stored in the respective new usage data attribute column files;
Here, the storage order of the newly added usage data in the new usage data attribute column files is the same as the storage order of the users' identity information in the updated user identity information attribute column file, and the newly added usage data includes both the newly added usage data of the multiple users and the newly added usage data of the new user.
With reference to the third aspect, in a sixth possible implementation of the third aspect, the identity information storage module is further configured to:
Establish a user index for each user according to the identity information of the multiple users in the user identity information attribute column file, where each user index corresponds one-to-one to the identity information of a user in the user identity information attribute column file;
Obtain the identity information of a new user, and determine, according to the user indexes, whether the identity information of the new user already exists in the user identity information attribute column file;
If it does not, store the identity information of the new user at the tail of the user identity information attribute column file, and store the newly added usage data of the new user in a new data partition.
According to a fourth aspect, an embodiment of the present invention provides a device for implementing data query by using the data storage device of the third aspect or any one of the first to sixth possible implementations of the third aspect, comprising:
An acquisition module, configured to obtain a query condition, where the query condition includes at least one attribute condition and at least one time condition;
A processing module, configured to search, according to the at least one time condition, for the data partition corresponding to each time condition; obtain, in each data partition and according to the at least one attribute condition, the data attribute column file corresponding to each attribute condition, where the data attribute column files include usage data attribute column files and preference usage data attribute column files; and traverse, in sequence, all rows at the same positions in the data attribute column files corresponding to the attribute conditions, to find the usage data of the users that meet the query condition.
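The query device's two modules can be sketched the same way. This assumes list stand-ins for the column files, a predicate for the attribute conditions, and that results from several periods are combined as a union, since the text does not state how periods are combined; all names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

@dataclass
class QueryCondition:
    periods: List[str]                               # time conditions
    predicate: Callable[[Dict[str, float]], bool]    # attribute conditions on one virtual row
    attributes: List[str] = field(default_factory=list)

class AcquisitionModule:
    def acquire(self, periods, attributes, predicate):
        return QueryCondition(periods=periods, predicate=predicate, attributes=attributes)

class ProcessingModule:
    def __init__(self, identity, partitions):
        self.identity = identity          # identity column, row i -> user
        self.partitions = partitions      # {period: {attribute: column list}}

    def process(self, condition):
        group, seen = [], set()
        for period in condition.periods:
            partition = self.partitions.get(period, {})
            if not condition.attributes or any(a not in partition for a in condition.attributes):
                continue                  # partition cannot answer these attribute conditions
            columns = {a: partition[a] for a in condition.attributes}
            n_rows = min(len(c) for c in columns.values())
            for i in range(n_rows):
                virtual_row = {a: c[i] for a, c in columns.items()}
                uid = self.identity[i]
                if uid not in seen and condition.predicate(virtual_row):
                    seen.add(uid)         # union across periods (assumption)
                    group.append(uid)
        return group
```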
With reference to the fourth aspect, in a first possible implementation of the fourth aspect, if the data attribute column file corresponding to each attribute condition is a usage data attribute column file, the processing module being configured to traverse in sequence all rows at the same positions in the data attribute column files corresponding to the attribute conditions, to find the usage data of the users that meet the query condition, specifically includes:
Treating, in each data partition, the rows at the same position across the usage data attribute column files corresponding to the attribute conditions as one virtual row, obtaining the user's usage data in each virtual row, and determining whether the user's usage data in each virtual row meets the query condition;
If so, recording the identity information corresponding to the user's usage data in each virtual row that meets the query condition, to obtain a query-result customer group.
With reference to the first possible implementation of the fourth aspect, in a second possible implementation of the fourth aspect, recording the identity information corresponding to the user's usage data in the virtual rows that meet the query condition, to obtain the query-result customer group, includes:
Obtaining a first position, in the usage data attribute column file, of the user's usage data in a virtual row that meets the query condition; obtaining, from the user identity information attribute column file, the identity information of the user at the first position; and recording that identity information to obtain the query-result customer group.
With reference to the fourth aspect, in a third possible implementation of the fourth aspect, if the data attribute column file corresponding to each attribute condition is a preference usage data attribute column file, the processing module being configured to traverse in sequence all rows at the same positions in the data attribute column files corresponding to the attribute conditions, to find the usage data of the users that meet the query condition, specifically includes:
Treating, in each data partition, the rows at the same position across the preference usage data attribute column files corresponding to the attribute conditions as one virtual row, obtaining the user's usage data in each virtual row, and determining whether the user's usage data in each virtual row meets the query condition;
If so, recording the identity information corresponding to the user's usage data in each virtual row that meets the query condition, to obtain a query-result customer group.
In conjunction with the third possible implementation of fourth aspect, in the 4th kind of possible implementation of fourth aspect, the identity information that the usage data that described record meets the user in the dummy row of described querying condition is corresponding, obtains Query Result customer group, comprising:
Obtain the second place of usage data in described preference usage data attribute column file of the user met in the dummy row of described querying condition, the identity information of the user of described second position is obtained in described multidimensional data user identity attribute column file, record the identity information of described user, obtain Query Result customer group.
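The preference variant can be sketched the same way, assuming the preference usage data column, the preference category identifier column, and the multidimensional data user identity column are aligned in-memory lists. The name `scan_preference_column` and the `(category, value)` predicate are hypothetical. Because one user may occupy several adjacent rows, the sketch deduplicates the users it records.

```python
from typing import Callable, List, Sequence, Set


def scan_preference_column(
    multidim_user_ids: Sequence[int],
    category_ids: Sequence[int],
    preference_usage: Sequence[float],
    condition: Callable[[int, float], bool],
) -> List[int]:
    """Return users with at least one preference row satisfying `condition`."""
    if not (len(multidim_user_ids) == len(category_ids) == len(preference_usage)):
        raise ValueError("the three column files must be aligned row by row")
    result: List[int] = []
    seen: Set[int] = set()
    for pos, (category, value) in enumerate(zip(category_ids, preference_usage)):
        if condition(category, value):
            # pos plays the role of the "second position": same row, same user
            user = multidim_user_ids[pos]
            if user not in seen:  # adjacent rows may repeat the same user
                seen.add(user)
                result.append(user)
    return result
```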
With reference to the fourth aspect, in a fifth possible implementation of the fourth aspect, if the data attribute column files corresponding to the attribute conditions include both usage data attribute column files and preference usage data attribute column files, the processing module traverses, in turn, all identical rows of the data attribute column files corresponding to the attribute conditions to find the usage data of the users that satisfy the query condition. This specifically comprises:
In each data partition, taking the same row across the usage data attribute column files corresponding to the attribute conditions as a first virtual row, and taking the same row across the preference usage data attribute column files corresponding to the attribute conditions as a second virtual row;
If the identity information of the user corresponding to the first virtual row is the same as the identity information of the user corresponding to the second virtual row, combining the first virtual row and the second virtual row into one virtual row;
Obtaining the usage data of the user in each virtual row, and determining whether the usage data of the user in each virtual row satisfies the query condition;
If so, recording the identity information corresponding to the usage data of the user in the virtual row that satisfies the query condition, to obtain the query result customer group.
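A hedged sketch of the combined query, under the assumption that the storage end position file (one entry per user, holding the 1-based row number of that user's last preference row, as in the storage embodiment) is used to locate each user's preference rows. All names are hypothetical; the identity comparison corresponds to the check performed before two virtual rows are merged.

```python
from typing import Callable, Dict, List, Sequence


def scan_combined(
    identity_column: Sequence[int],
    usage_columns: Dict[str, Sequence[float]],
    end_positions: Sequence[int],
    multidim_user_ids: Sequence[int],
    category_ids: Sequence[int],
    preference_usage: Sequence[float],
    condition: Callable[[Dict[str, float]], bool],
) -> List[int]:
    """Merge first and second virtual rows of the same user and filter them."""
    if len(end_positions) != len(identity_column):
        raise ValueError("one end position is expected per user")
    result: List[int] = []
    start = 0  # first preference row of the current user (0-based)
    for i, user in enumerate(identity_column):
        end = end_positions[i]  # 1-based last row == 0-based exclusive bound
        first_row = {name: column[i] for name, column in usage_columns.items()}
        for j in range(start, end):
            # Merge only when both virtual rows belong to the same user
            if multidim_user_ids[j] != user:
                continue
            combined = dict(first_row, category=category_ids[j],
                            preference_value=preference_usage[j])
            if condition(combined):
                result.append(user)
                break  # one matching combination is enough to select the user
        start = end
    return result
```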
With reference to the fifth possible implementation of the fourth aspect, in a sixth possible implementation of the fourth aspect, before combining the first virtual row and the second virtual row into one virtual row when the identity information of the user corresponding to the first virtual row is the same as that of the user corresponding to the second virtual row, the processing module is further configured to:
Obtain, in the user identity information attribute column file, the identity information of the user corresponding to the first virtual row, and obtain, in the multidimensional data user identity attribute column file, the identity information of the user corresponding to the second virtual row;
Determine whether the identity information of the user corresponding to the first virtual row is the same as the identity information of the user corresponding to the second virtual row.
With reference to the second or fourth possible implementation of the fourth aspect, in a seventh possible implementation of the fourth aspect, the first position or the second position is an offset cell count.
With reference to the fourth aspect or any one of the first to sixth possible implementations of the fourth aspect, in an eighth possible implementation of the fourth aspect, the processing module is further configured to:
Obtain an attribute analysis request for the query result customer group, where the attribute analysis request includes at least one piece of attribute information;
Obtain, according to the attribute information in the attribute analysis request, the data attribute column file corresponding to the attribute; obtain, from that data attribute column file, the usage data corresponding to the identity information of each user in the query result customer group; and perform statistical analysis on the usage data to obtain a user attribute analysis result.
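The attribute analysis step might look like the sketch below, assuming a dictionary from user ID to row number serves as the lookup and that count, sum, and mean stand in for the unspecified "statistical analysis". The function name and the chosen statistics are assumptions.

```python
from statistics import mean
from typing import Dict, Sequence


def analyze_attribute(
    result_group: Sequence[int],
    identity_column: Sequence[int],
    attribute_column: Sequence[float],
) -> Dict[str, float]:
    """Summarize one attribute over the users of a query result customer group."""
    row_of = {user: row for row, user in enumerate(identity_column)}
    values = [
        attribute_column[row_of[user]]
        for user in result_group
        # skip unknown users and users added after this partition was written
        if user in row_of and row_of[user] < len(attribute_column)
    ]
    if not values:
        return {"count": 0, "sum": 0.0, "mean": 0.0}
    return {"count": len(values), "sum": float(sum(values)), "mean": mean(values)}
```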
In the data storage and query method and device of the embodiments of the present invention, the identity information of multiple users is stored in a user identity information attribute column file, and the usage data of the users are stored, according to time period, in the usage data attribute column files of different data partitions. Each row of a usage data attribute column file stores the usage data of one attribute of one user; each data partition includes at least one usage data attribute column file; each row of each usage data attribute column file has a fixed-length storage space; the usage data of different attributes of each user are stored in different usage data attribute column files of the data partition; and the storage order of the usage data in each usage data attribute column file is the same as the storage order of the user identity information in the user identity information attribute column file, so the usage data of different attributes of the same user occupy the same row in every usage data attribute column file. This method divides users' usage data into data partitions by time and, within each partition, splits them by column into separate attribute column files, so each storage unit of usage data (one attribute column file in one partition) is small. This facilitates storage and fast data exchange, and when data are stored and read with this method, the data IO time is shorter.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below show some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of embodiment one of the data storage method of the present invention;
Fig. 2 is a flowchart of embodiment two of the data storage method of the present invention;
Fig. 3 is a schematic diagram of data storage in embodiment two of the data storage method of the present invention;
Fig. 4 is a schematic diagram of the preference usage data attribute column file and the storage end position information attribute column file in embodiment two of the data storage method of the present invention;
Fig. 5 is a flowchart of embodiment one of a method for implementing data query using the data storage method shown in Fig. 1 or Fig. 3;
Fig. 6 is a flowchart of embodiment two of the data query method of the present invention;
Fig. 7 is a first schematic diagram illustrating the data query method of the present invention;
Fig. 8 is a second schematic diagram illustrating the data query method of the present invention;
Fig. 9 is a third schematic diagram illustrating the data query method of the present invention;
Fig. 10 is a detailed flowchart of the data storage and query method provided by an embodiment of the present invention;
Fig. 11 is a schematic structural diagram of embodiment one of the data storage device of the present invention;
Fig. 12 is a schematic structural diagram of embodiment one of the data query device of the present invention.
Detailed description of the embodiments
To make the objectives, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described clearly and completely below with reference to the accompanying drawings. The described embodiments are some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
Fig. 1 is a flowchart of embodiment one of the data storage method of the present invention. As shown in Fig. 1, the method of this embodiment may include:
Step 101: Store the identity information of multiple users in a user identity information attribute column file, where each row of the user identity information attribute column file stores the identity information of one user.
The user identity information attribute column file may be a data file with multiple rows and a single column, with the identity information of each user stored in its own row. The identity information is numeric and may be any information that identifies the user, for example the terminal identifier of the terminal used by the user or the user ID (identity). Each row of the data file is a fixed-length storage space, and the identity information of each user occupies one row.
Step 102: Store the usage data of the multiple users, according to time period, in the usage data attribute column files of different data partitions, where each row of a usage data attribute column file stores the usage data of one attribute of one user.
Each data partition includes at least one usage data attribute column file; each row of each usage data attribute column file is a fixed-length storage space; the usage data of different attributes of each user are stored in different usage data attribute column files of the data partition; and the storage order of the usage data in the usage data attribute column files is the same as the storage order of the user identity information in the user identity information attribute column file.
Specifically, the time period may be one day, one month, one year, or another period, and can be set flexibly according to storage and query requirements. User data are stored in different data partitions according to this time period, that is, partitioned by time period. Within each partition, each usage data attribute has its own usage data attribute column file, which stores the usage data of that attribute. For example, if the time period is one month, the data partitions hold users' usage data for different periods such as January, February, and March. A user's usage data include multiple attributes, such as voice toll charges, SMS fees, and campus network fees, so the January data partition contains a January voice toll charge usage data attribute column file, a January SMS fee usage data attribute column file, and a January campus network usage data attribute column file; February and March are stored in the same way as January. Taking the January voice toll charge usage data attribute column file as an example, each row stores this attribute's usage data for the user identified in the corresponding row of the identity file: the first row may store the January voice toll charge of the user whose identity information is 1, the second row that of the user whose identity information is 2, and so on in sequence. The storage order of the usage data in each usage data attribute column file is the same as the storage order of the user identity information in the user identity information attribute column file.
Note that "the same storage order" here does not specifically mean ascending or descending order. It can mean that each row sits at the same offset cell count, relative to the start storage position of its file, in the usage data attribute column file and in the user identity information attribute column file: the row storing the first usage data item in a usage data attribute column file has the same offset cell count as the row storing the first user's identity information in the user identity information attribute column file, and subsequent usage data follow the same order as the identity information of their users. With this scheme, the row at a given offset cell count in a usage data attribute column holds one attribute's usage data of the user whose identity information is stored at the same offset cell count in the user identity information attribute column file. Alternatively, the start offset cell counts of the two files can be configured separately, so they may differ. For example, the offset cell count of the start position of the usage data attribute column file may be 3 while that of the user identity information attribute column file is 4. Because the start offsets are agreed in advance, the user whose usage data are stored in row N of the usage data attribute column is known to be the user whose identity information is stored in row N+1 of the user identity information attribute column, where N is a natural number. For ease of understanding, the row at the start position of each attribute column file can be regarded as row 1; the storage scheme above then means that the same row number in each usage data attribute column file and in the user identity information attribute column file corresponds to the same user.
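The partition layout and the row correspondence can be modeled with plain Python lists, one list per column file. The partition names, attribute names, and values below are illustrative assumptions, and `identity_row` and `usage_of` are hypothetical helpers: the first maps a usage row to the identity row when the two files have different agreed start offsets (3 and 4 give row N to row N+1, as in the text), and the second reads one attribute of one user when the offsets are equal.

```python
from typing import Dict, List

# Hypothetical in-memory stand-ins for the column files; values are illustrative.
identity_column: List[int] = [1, 2, 3]
partitions: Dict[str, Dict[str, List[float]]] = {
    "January": {"voice": [30.0, 55.5, 12.0], "sms": [5.0, 0.0, 8.0]},
    "February": {"voice": [28.0, 60.0, 15.5], "sms": [3.0, 1.0, 9.0]},
}


def identity_row(usage_row: int, usage_start: int = 1, identity_start: int = 1) -> int:
    """Map a 1-based usage-file row to the identity-file row of the same user.

    `usage_start` and `identity_start` are the agreed start offsets of the
    two files; only their difference affects the result.
    """
    return usage_row - usage_start + identity_start


def usage_of(user_id: int, period: str, attribute: str) -> float:
    """Read one attribute of one user, assuming equal start offsets."""
    row = identity_column.index(user_id)  # same index in every column file
    return partitions[period][attribute][row]
```

For example, `usage_of(2, "February", "voice")` touches only the February voice column and returns `60.0`.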
Within each attribute column file, every row has the same fixed-length storage space; the row length may be the same or different across attribute column files.
Optionally, the usage data of one attribute of a user may comprise usage data of multiple different preference categories. Specifically, if the usage data of at least one attribute of the multiple users include usage data of multiple different preference categories, the usage data of the multiple preference categories of each user are stored in sequence in the preference usage data attribute column file of that attribute, where each row of the preference usage data attribute column file stores the usage data of one preference category of one user; the preference category identifiers corresponding to the usage data of the multiple preference categories of each user are stored in sequence in a preference category identifier attribute column file; and the identity information of the users to whom those usage data belong is stored in a multidimensional data user identity attribute column file.
The storage order of the preference-category usage data in the preference usage data attribute column file is the same as the storage order of the preference category identifiers in the preference category identifier attribute column file, and the same as the storage order of the user identity information in the multidimensional data user identity attribute column file; "the same order" here has the meaning explained above, which is not repeated.
It should be noted that storing the usage data of the multiple preference categories of each user in sequence in the preference usage data attribute column file of the attribute specifically means that the usage data of the multiple preference categories of one attribute of one user are stored in adjacent rows of that file. These usage data correspond to multiple different preference categories, whose identifiers are stored in the preference category identifier attribute column file, and the identity information of the users to whom they belong is stored in the multidimensional data user identity attribute column file. Because the usage data of the multiple preference categories of the same user share the same identity information, adjacent rows of the multidimensional data user identity attribute column file may store the same identity information.
The storage end position information of the usage data of the multiple preference categories of each user is obtained according to the number of those usage data items, and the storage end position information of each user is stored in a storage end position information attribute column file, each row of which stores the storage end position information of one user.
The storage order of the storage end position information in the storage end position information attribute column file is the same as the storage order of the user identity information in the user identity information attribute column file; "the same order" here has the meaning explained above, which is not repeated.
Specifically, a storage end position information attribute column file is created. The storage end position of each user's preference-category usage data is obtained from the number of that user's preference-category usage data items and the storage end position of the preceding user, and is stored in the storage end position information attribute column file, one user per row.
For example, assume the start row number of the storage end position information attribute column file is 1. If the first user in the user identity information attribute column file has 5 usage data items of different preference categories for an attribute, that user's storage end position is 5, and 5 is recorded in the first row of the storage end position information attribute column file. If the second user in the user identity information attribute column file has 0 such items for the same attribute, that user's storage end position is also 5. The storage end positions of the remaining users are obtained in the same way.
Note that the example above uses the end row number as the storage end position information. Alternatively, the storage end position information may be a storage end address: a user's storage end address can be obtained from the number of that user's preference-category usage data items, the fixed storage length of each row, and the storage end address of the preceding user, and is stored in the corresponding row of the storage end position information attribute column.
One storage end position information attribute column file may correspond to the preference usage data attribute column files of multiple different attributes. For example, the attribute of a preference usage data attribute column file may be the number of calls, the number of SMS messages, and so on, and the file contains usage data of multiple different preference categories. The preference categories may be different industry categories, such as catering, entertainment, and healthcare, in which case the preference category identifiers (the ID of each industry category) are stored in the preference category identifier attribute column file. The preference categories may also be different location categories, such as different regional addresses, in which case the ID of each region is stored in the preference category identifier attribute column file. The preference categories can be configured flexibly according to data storage requirements and are not limited here.
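Both forms of storage end position information reduce to running sums over the per-user item counts. The sketch below is a hedged illustration: the function names are hypothetical, and it assumes the end address is the byte offset just past a user's last row, counted from a configurable `base_address`; the patent does not fix that convention.

```python
from itertools import accumulate
from typing import List, Sequence


def end_line_numbers(counts: Sequence[int]) -> List[int]:
    """counts[i] is the number of preference-category rows of the i-th user.

    Each end position is the previous user's end position plus this user's
    count, so a user with no rows repeats the previous value.
    """
    return list(accumulate(counts))


def end_addresses(counts: Sequence[int], row_length: int, base_address: int = 0) -> List[int]:
    """Byte offset just past each user's last row (an assumed convention)."""
    ends: List[int] = []
    previous = base_address
    for n in counts:
        previous += n * row_length
        ends.append(previous)
    return ends
```

With the counts from the example (5 rows for the first user, 0 for the second), `end_line_numbers([5, 0, 3])` yields `[5, 5, 8]`.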
Optionally, newly added usage data of the multiple users are stored, by time period, in a newly added data partition. The newly added data partition includes at least one newly added usage data attribute column file, and the data of the different attributes of the newly added usage data are stored in separate newly added usage data attribute column files, each row of which stores the newly added usage data of one attribute of one user. Each row of each newly added usage data attribute column file is a fixed-length storage space, and the storage order of the newly added usage data in the newly added usage data attribute column files of the newly added data partition is the same as the storage order of the user identity information in the user identity information attribute column file.
Optionally, if the time period of the newly added data partition contains usage data of a new user, the identity information of the new user is appended to the tail of the user identity information attribute column file to obtain a new user identity information attribute column file, and the usage data of the new user are stored in the newly added data partition, with the data of its different attributes stored in the respective newly added usage data attribute column files. The storage order of the newly added usage data in the newly added usage data attribute column files is the same as the storage order of the user identity information in the new user identity information attribute column file, and the newly added usage data include both the newly added usage data of the existing users and the usage data of the new user.
Specifically, a new user has no usage data from before being added, so after the new user's identity information is added, only the new user's newly added usage data are stored, in the newly added data partition. For example, if a user is added in March, that user's March usage data are stored in the March data partition, and nothing needs to be written to the February data partition.
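A minimal sketch of adding a partition, reusing the list-based model assumed earlier. `add_partition` and the `missing` fill value for users without data in the new period are hypothetical choices. New users are appended at the tail, so older partitions keep their row alignment and simply end before the new rows.

```python
from typing import Dict, List, Sequence


def add_partition(
    identity_column: List[int],
    partitions: Dict[str, Dict[str, List[float]]],
    period: str,
    new_usage: Dict[int, Dict[str, float]],
    attributes: Sequence[str],
    missing: float = 0.0,
) -> None:
    """Append new users to the identity column and write one new partition."""
    for user in new_usage:
        if user not in identity_column:  # linear scan; a hash index avoids this cost
            identity_column.append(user)  # tail append keeps existing rows in place
    # Build the new partition in identity-column order so row i belongs to user i.
    partition: Dict[str, List[float]] = {attribute: [] for attribute in attributes}
    for user in identity_column:
        record = new_usage.get(user, {})
        for attribute in attributes:
            partition[attribute].append(record.get(attribute, missing))
    partitions[period] = partition  # earlier partitions are not rewritten
```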
Optionally, to make it easier to query the usage data of a single user, a user index can be built from the identity information of the users in the user identity information attribute column file, with the entries of the user index in one-to-one correspondence with the identity information of the users in the user identity information attribute column;
Obtain the identity information of a new user, and determine, according to the user index, whether that identity information already exists in the user identity information attribute column file;
If not, append the identity information of the new user to the tail of the user identity information attribute column file, and store the usage data of the new user in the newly added data partition.
Specifically, the user index allows the usage data of a given user to be found quickly, and it also simplifies importing new data: the index is used to determine whether a new user's identity information already exists in the user identity information attribute column file. If it does not, the identity information is appended to the tail of the user identity information attribute column and the user's usage data are stored in the newly added data partition; if it does, the user is not actually new and no storage operation for adding the user is needed.
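A hash index in the sense of the embodiment can be approximated with a Python `dict` from user ID to row number. The class `UserIndex` and its methods are hypothetical; the sketch shows the constant-time existence check and the tail append described above.

```python
from typing import Dict, List, Optional


class UserIndex:
    """Hypothetical hash index: user ID -> 0-based row in the identity column."""

    def __init__(self, identity_column: List[int]) -> None:
        self.identity_column = identity_column
        self.row_of: Dict[int, int] = {user: row for row, user in enumerate(identity_column)}

    def lookup(self, user_id: int) -> Optional[int]:
        """Row of the user in every aligned column file, or None if unknown."""
        return self.row_of.get(user_id)

    def add_if_new(self, user_id: int) -> bool:
        """Append a new user at the tail; return False if the user already exists."""
        if user_id in self.row_of:
            return False
        self.row_of[user_id] = len(self.identity_column)
        self.identity_column.append(user_id)
        return True
```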
In this embodiment, the identity information of multiple users is stored in a user identity information attribute column file, and the usage data of the users are stored, according to time period, in the usage data attribute column files of different data partitions. Each row of a usage data attribute column file stores the usage data of one attribute of one user; each data partition includes at least one usage data attribute column file; each row of each usage data attribute column file has a fixed-length storage space; the usage data of different attributes of each user are stored in different usage data attribute column files of the data partition; and the storage order of the usage data in each usage data attribute column file is the same as the storage order of the user identity information in the user identity information attribute column file, so the usage data of different attributes of the same user occupy the same row in every usage data attribute column file. This method divides users' usage data into data partitions by time and, within each partition, splits them by column into separate attribute column files, so each storage unit of usage data (one attribute column file in one partition) is small. This facilitates storage and fast data exchange, and when data are stored and read with this method, the data IO time is shorter.
It should be noted that, compared with column storage in the prior art, the data storage method of this embodiment effectively reduces scanning of data from irrelevant time periods, which reduces IO time; compared with row storage in the prior art, it effectively reduces scanning of large numbers of irrelevant attributes, which also reduces IO time.
The technical solution of the method embodiment shown in Fig. 1 is described in detail below using a specific embodiment.
Fig. 2 is a flowchart of embodiment two of the data storage method of the present invention, Fig. 3 is a schematic diagram of data storage in embodiment two, and Fig. 4 is a schematic diagram of the preference usage data attribute column file and the storage end position information attribute column file in embodiment two. In this embodiment, the identity information of a user is the user ID, the user IDs are 1, 2, and 3, the time period is one month, and the attributes of the users' usage data are voice toll charges, SMS fees, and campus network fees; accordingly, the usage data attribute column files are a voice toll charge attribute column file, an SMS fee attribute column file, and a campus network attribute column file. As shown in Fig. 2, the method of this embodiment may include:
S201: Store the identity information of the users in the user identity information attribute column file, with each row of the file storing the identity information of one user.
Specifically, user IDs 1, 2, and 3 are stored in different rows of the user identity information attribute column file shown in Fig. 3. The row numbers drawn for this file only show how its rows correspond to the rows of the usage data attribute column files in the following steps; they are virtual and are not actually stored. Each attribute column file is a single-column, multi-row data file, and the row numbers serve only to make the data storage method of the present invention easier to understand. After the users' identity information is stored in the user identity information attribute column file, a user index can be built; this index may be a hash index, with which the identity information of a given user can be found.
S202, by the usage data of the different attribute of the user in the time cycle, to be stored in respectively in the usage data property file of the described attribute in data partition corresponding to this time cycle.
Concrete, be described with above-mentioned citing, be 1 by user ID, the usage data of the user in April of 2 and 3 is stored in the data partition in April in Fig. 3, the attribute that different attribute column file in this data partition is respectively used to the usage data storing user in April is voice toll charge, the usage data of short-message fee and campus network, if the April of Fig. 3 is shown in data partition, this, usage data attribute column file of data partition comprised voice toll charge attribute column file in April, short-message fee attribute column file and campus network attribute column file, wherein the line number of each use attribute row file is for identifying the identity information of each data attribute row file and the corresponding identical user that goes together mutually in subscriber identity information attribute column file, it is virtual and non-existent, wherein, it is 1 that each row of each usage data attribute column file stores user ID respectively, the data of the different attribute of the usage data of 2 and 3.As shown in Figure 3, in the storage order of the usage data in each usage data attribute column file and subscriber identity information attribute column file, the storage order of the identity information of user is unified.That is, for any two usage data attribute column files, two records that line number is identical are usage datas of same user.
All data in the user identity information attribute column file and in each usage data attribute column file is stored with a fixed length, so every row occupies the same amount of storage space. Positions are counted from 1 at the start of the attribute column file. With this layout, the data of row rownum occupies positions (rownum-1)*length+1 through rownum*length, where length is the storage length of each data item.
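The following sketch shows the fixed-length address arithmetic. It assumes each row holds one little-endian 8-byte float. The patent states only that rows have a fixed length, so `LENGTH`, the encoding and the helper names are illustrative assumptions. `record_span` reproduces the 1-based formula from the text, and `read_row` applies the equivalent 0-based slice to a byte image of a column file.

```python
import struct
from typing import Tuple

LENGTH = 8  # assumed record width: one little-endian float64 per row

def record_span(rownum: int, length: int = LENGTH) -> Tuple[int, int]:
    """1-based, inclusive positions of row `rownum`, as given in the text."""
    return (rownum - 1) * length + 1, rownum * length

def read_row(column: bytes, rownum: int, length: int = LENGTH) -> float:
    """Read row `rownum` (1-based) from the byte image of a fixed-length column file."""
    start = (rownum - 1) * length  # 0-based byte offset of the row
    (value,) = struct.unpack("<d", column[start:start + length])
    return value

column = struct.pack("<3d", 35.0, 12.5, 48.0)  # hypothetical three-row column
print(record_span(3))       # (17, 24)
print(read_row(column, 2))  # 12.5
```

Because the offset is computed directly, reading any row takes constant time. No per-row pointers need to be stored.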
S203: If the usage data of at least one attribute of a user comprises usage data of several different preference categories, store the user's usage data for those preference categories in the preference usage data attribute column file of that attribute. Store the identity information of the user to whom that data belongs in the multidimensional data user identity attribute column file. Then build the storage end position attribute column file from the multidimensional data user identity attribute column file.
Specifically, the usage data of one attribute of a user may comprise usage data of several different preference categories. Such usage data is stored as follows:
- The user's usage data for the different preference categories is stored consecutively in the preference usage data attribute column file of the attribute.
- The preference category identifier of each value is stored in the preference category identifier attribute column file.
- The identity information of the user to whom each value belongs is stored in the multidimensional data user identity attribute column file.
- The number of preference-category values of each user is obtained from the multidimensional data user identity attribute column file. From this, the position at which each user's preference-category data ends is stored in the storage end position attribute column file.
Specifically, continuing the example above, both the call count attribute data and the SMS count attribute data of the users comprise usage data of several different preference categories. A call count attribute column file and an SMS count attribute column file are therefore created, as shown in Fig. 4.

For the user with ID 1, the call count attribute comprises usage data of several preference categories: 94 calls with terminals having the catering industry ID, 72 with terminals having the entertainment industry ID, 41 with terminals having the IT industry ID, and 68 with terminals having the fashion industry ID. These data are stored as follows:
- The call count values 94, 72, 41 and 68 are stored consecutively in the call count attribute column file.
- The corresponding preference category identifiers (catering ID, entertainment ID, IT ID and fashion ID) are stored in the same order in the preference category identifier attribute column file.
- The user ID (1) of each of these four values is stored consecutively in the multidimensional data user identity attribute column file.

The call count data of the user with ID 3 also comprises usage data of several preference categories. It is stored in the same way, together with its preference category identifiers and multidimensional user identity information. Likewise, the SMS count attribute of the users with IDs 1 and 3 may comprise usage data of several preference categories and is stored in the same manner, which is not repeated here.

In this embodiment, positions are counted in rows, and the preference-category usage data starts at the first row of each attribute column file. The storage end positions are therefore as follows:
- The user with ID 1 has a storage end position of 4.
- The user with ID 2 has no preference-category usage data for the call count attribute, so its storage end position is also 4.
- The user with ID 3 has a storage end position of 9.

The values 4, 4 and 9 are stored in this order in the storage end position attribute column file. Its storage order is the same as that of the user IDs in the user identity information attribute column file, so data in the same row belongs to the same user. With this storage method, the call count usage data of the user with ID 2 can be looked up as follows. That user's row in the storage end position attribute column file holds 4, which equals the end position in the preceding row (4). This shows that the user has no usage data for this attribute.
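The lookup through the storage end position column can be sketched as follows. User 1's four call counts come from the example above. The five rows for user 3 and the `other` category are hypothetical placeholders, because the example does not list them. The sketch assumes end positions are 1-based and inclusive, so the previous row's end position equals the 0-based start of the current user's rows.

```python
from typing import Dict, List

user_ids: List[int] = [1, 2, 3]
# Storage end position attribute column, aligned with user_ids (1-based, inclusive).
end_positions: List[int] = [4, 4, 9]

# Preference data of the call count attribute. Rows 1-4 are user 1's values from the
# example; rows 5-9 (user 3) are hypothetical placeholders.
call_counts: List[int] = [94, 72, 41, 68, 12, 30, 7, 55, 3]
category_ids: List[str] = ["catering", "entertainment", "IT", "fashion",
                           "catering", "entertainment", "IT", "fashion", "other"]
multidim_user_ids: List[int] = [1, 1, 1, 1, 3, 3, 3, 3, 3]

def preference_range(user_row: int) -> range:
    """0-based rows holding the preference data of the user at `user_row` (0-based)."""
    end = end_positions[user_row]
    start = end_positions[user_row - 1] if user_row > 0 else 0  # previous end = this start
    return range(start, end)

def preferences_of(user_id: int) -> Dict[str, int]:
    rows = preference_range(user_ids.index(user_id))
    return {category_ids[r]: call_counts[r] for r in rows}

print(preferences_of(2))  # {}  (end position equals the previous one, so no data)
print(preferences_of(1))  # {'catering': 94, 'entertainment': 72, 'IT': 41, 'fashion': 68}
```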
S204: When new usage data arrives, add a data partition and its attribute column files for the new time period, and store the users' new usage data in the corresponding attribute column files of that data partition.
Specifically, as shown in region c of Fig. 3, if the new usage data is the users' usage data for May, a May data partition is added. The attribute column files of this partition are the May voice call fee, SMS fee and campus network attribute column files, and each user's new usage data is stored in the corresponding attribute column file.
S205: When a new user is added, append the new user's ID to the end of the user identity information attribute column file.
Note that the usage data of a new user exists only in the attribute column files of the new time period.
Specifically, if a user with ID 4 is added, this ID is appended to the end of the user identity information attribute column file, and the May usage data of the user with ID 4 is stored in the corresponding attribute column files.
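The append-only growth in S204 and S205 can be sketched as follows. The helper names and values are hypothetical. The sketch assumes that a new partition holds one row for every user known when the partition is created. Existing partitions are never rewritten, so an older partition is simply shorter than the identity column. A row beyond its length means the user has no data for that period.

```python
from typing import Dict, List, Optional

user_ids: List[int] = [1, 2, 3]
partitions: Dict[str, Dict[str, List[float]]] = {
    "April": {"voice_fee": [35.0, 12.5, 48.0]},  # hypothetical values
}

def add_user(user_id: int) -> int:
    """S205: append the new ID to the tail of the identity column and return its row."""
    user_ids.append(user_id)
    return len(user_ids) - 1

def add_partition(period: str, columns: Dict[str, List[float]]) -> None:
    """S204: a new period gets its own partition covering every current user."""
    for name, values in columns.items():
        if len(values) != len(user_ids):
            raise ValueError(f"column {name!r} must have {len(user_ids)} rows")
    partitions[period] = columns

def value_at(period: str, attribute: str, row: int) -> Optional[float]:
    """Rows past the end of an older (shorter) column mean 'no data for that period'."""
    column = partitions[period][attribute]
    return column[row] if row < len(column) else None

row_of_4 = add_user(4)
add_partition("May", {"voice_fee": [40.0, 8.0, 51.0, 22.0]})
print(value_at("May", "voice_fee", row_of_4))    # 22.0
print(value_at("April", "voice_fee", row_of_4))  # None
```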
With the data storage method of this embodiment, user data is partitioned by time period, and each partition is split into columns so that the usage data of each attribute is stored in its own attribute column file. This allows the usage data of a very large number of users to be managed in small pages, which makes fast data exchange with memory easy. When data is queried, a distributed cluster can process the partitions separately, so query results are obtained quickly.
Fig. 5 is a flowchart of embodiment one of a data query method of the present invention, which uses the data storage method shown in Fig. 1 or Fig. 3. As shown in Fig. 5, the method of this embodiment may comprise:
Step 501: Obtain a query condition comprising at least one attribute condition and at least one time condition.
For example, the query condition may be "March voice call fee greater than 30 and SMS fee greater than 20". Its time condition is March, and its attribute conditions are voice call fee greater than 30 and SMS fee greater than 20. The query condition is obtained from a query request that the user enters through the front-end interface.
Step 502: Look up the data partition corresponding to each time condition. In each data partition, obtain the data attribute column file corresponding to each attribute condition. These data attribute column files comprise usage data attribute column files and preference usage data attribute column files. Then traverse the rows of these files row by row, taking the same row of each file together, to find the usage data of the users that satisfies the query condition.
Specifically, the data partition corresponding to the time condition is obtained first. Each data attribute column file in the partition has field definition metadata, which may include the field content, field type, field length and so on. The field content identifies the data attribute, for example the April voice call fee or the SMS fee. This metadata is used to find the data attribute column file corresponding to each attribute condition. That file is loaded into memory, and the usage data of the users satisfying the query condition is then searched for in it.
Optionally, in a first case, the data attribute column files corresponding to the attribute conditions are usage data attribute column files. Traversing their rows to find the usage data satisfying the query condition may then specifically proceed as follows:
1. In each data partition, treat the same row of all usage data attribute column files corresponding to the attribute conditions as one virtual row, and obtain the users' usage data in each virtual row.
2. Determine whether the usage data in each virtual row satisfies the query condition.
3. If it does, record the identity information corresponding to that usage data, which yields the query result customer group.

Recording this identity information may specifically proceed as follows. First, obtain the first position, that is, the position in the usage data attribute column file of the usage data in a virtual row that satisfies the query condition. Then obtain the identity information of the user at the first position in the user identity information attribute column file. Finally, record it to obtain the query result customer group.
Specifically, a usage data attribute column file is a data file with one column and multiple rows, and different rows store the usage data of different users. After the data partition for the time condition of the query condition has been obtained, the data attribute column files for the attribute conditions are obtained in that partition. When these are usage data attribute column files, the same row of every such file is treated as a virtual row. The usage data in each virtual row is then checked against the query condition, and all values in one virtual row belong to the same user.

For a virtual row that satisfies the query condition, the position of its usage data in the usage data attribute column file is used to retrieve the identity information at the same position in the user identity information attribute column file. This is the identity information of the user whose usage data satisfies the query condition. Several users' usage data may satisfy the query condition. Their identity information is obtained in the same way, and together it forms the query result customer group.

Each virtual row in this embodiment contains one user's usage data for several attributes. This differs from lookup in conventional column storage, which searches each attribute column separately and then takes the intersection or union of the results. The query method of this embodiment instead treats the same row of the different attribute column files as one virtual row and applies the query condition to each virtual row, which makes lookup efficient. The identity information of the users whose virtual rows satisfy the query condition is stored in a system-recorded user group, which contains the identity information of every user satisfying the query condition.
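The first case can be sketched as a single pass over aligned columns. `zip(*columns)` yields one tuple per row, and that tuple plays the role of the virtual row. The predicates and data are hypothetical illustrations of the Step 501 example, not the patent's implementation.

```python
from typing import Callable, List, Sequence

Predicate = Callable[[float], bool]

def scan_virtual_rows(user_ids: Sequence[int],
                      columns: Sequence[Sequence[float]],
                      predicates: Sequence[Predicate]) -> List[int]:
    """Case 1: row i of every usage data column forms one virtual row for user_ids[i]."""
    result_group: List[int] = []
    for row, virtual_row in enumerate(zip(*columns)):
        if all(pred(value) for pred, value in zip(predicates, virtual_row)):
            # The matching position maps to the same row of the identity column.
            result_group.append(user_ids[row])
    return result_group

# Query from Step 501: March voice call fee > 30 and SMS fee > 20 (hypothetical data).
user_ids = [1, 2, 3]
march_voice_fee = [35.0, 12.5, 48.0]
march_sms_fee = [25.0, 30.0, 5.0]
matched = scan_virtual_rows(user_ids, [march_voice_fee, march_sms_fee],
                            [lambda v: v > 30, lambda v: v > 20])
print(matched)  # [1]
```

All conditions are evaluated in one pass, so no per-attribute result sets need to be intersected afterwards.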
Optionally, in a second case, the data attribute column files corresponding to the attribute conditions are preference usage data attribute column files. Traversing their rows to find the usage data satisfying the query condition may then specifically proceed as follows:
1. In each data partition, treat the same row of all preference usage data attribute column files corresponding to the attribute conditions as one virtual row, and obtain the usage data in each virtual row.
2. Determine whether the usage data in each virtual row satisfies the query condition.
3. If it does, record the corresponding identity information, which yields the query result customer group.

Recording this identity information may proceed as follows. First, obtain the second position, that is, the position in the preference usage data attribute column file of the usage data in a virtual row that satisfies the query condition. Then obtain the identity information of the user at the second position in the multidimensional data user identity attribute column file. Finally, record it to obtain the query result customer group.
Specifically, the second case differs from the first in that the data attribute column files for the attribute conditions are preference usage data attribute column files. Because data is stored differently in these files, several rows of a preference usage data attribute column file may hold usage data of the same user. Consider a virtual row that satisfies the query condition, and the position of its usage data in the preference usage data attribute column file. The identity information of the user at that position must then be looked up in the multidimensional data user identity attribute column file to obtain the query result customer group.
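A sketch of the second case follows. The only change from the first case is that a matching position is mapped through the multidimensional user identity column. Since one user may own several rows, each user is recorded once. The data are hypothetical and shaped like the Fig. 8 query (call count greater than 30 and SMS count greater than 50). The sketch assumes, as Fig. 8 indicates, that the same row of both preference columns belongs to the same user.

```python
from typing import Callable, List, Sequence

Predicate = Callable[[float], bool]

def scan_preference_rows(multidim_user_ids: Sequence[int],
                         columns: Sequence[Sequence[float]],
                         predicates: Sequence[Predicate]) -> List[int]:
    """Case 2: virtual rows span preference columns; a user may own several rows."""
    result_group: List[int] = []
    for pos, virtual_row in enumerate(zip(*columns)):
        if all(pred(value) for pred, value in zip(predicates, virtual_row)):
            user_id = multidim_user_ids[pos]  # identity at the "second position"
            if user_id not in result_group:   # record each user only once
                result_group.append(user_id)
    return result_group

# Fig. 8 style query: call count > 30 and SMS count > 50 (hypothetical data).
multidim_user_ids = [1, 1, 1, 1, 3, 3]
call_counts = [94, 72, 41, 68, 35, 12]
sms_counts = [51, 3, 12, 20, 55, 70]
matched = scan_preference_rows(multidim_user_ids, [call_counts, sms_counts],
                               [lambda v: v > 30, lambda v: v > 50])
print(matched)  # [1, 3]
```

For large result groups, a set alongside the list would keep the membership test constant-time.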
Optionally, in a third case, the data attribute column files corresponding to the attribute conditions comprise both usage data attribute column files and preference usage data attribute column files. Traversing their rows to find the usage data satisfying the query condition may then specifically proceed as follows:
1. In each data partition, treat the same row of the usage data attribute column files corresponding to the attribute conditions as a first virtual row.
2. Treat the same row of the preference usage data attribute column files corresponding to the attribute conditions as a second virtual row.
3. If the identity information of the user of a first virtual row is the same as that of the user of a second virtual row, combine the two into one virtual row, and obtain the usage data in each virtual row.
4. Determine whether the usage data in each virtual row satisfies the query condition.
5. If it does, record the corresponding identity information, which yields the query result customer group.

Before the first and second virtual rows are combined, the method further comprises:
- obtaining, from the user identity information attribute column file, the identity information of the user of the first virtual row;
- obtaining, from the multidimensional data user identity attribute column file, the identity information of the user of the second virtual row;
- determining whether the two pieces of identity information are the same.
Specifically, the third case differs from the first two in that the data attribute column files for the attribute conditions include both usage data attribute column files and preference usage data attribute column files. To ensure that all usage data in a virtual row belongs to the same user, each virtual row is built from two parts. The same row of the usage data attribute column files forms a first virtual row, and the same row of the preference usage data attribute column files forms a second virtual row. A first virtual row and a second virtual row are combined into one virtual row only when the identity information of their users is the same. The query condition is then applied to the usage data in the combined virtual row to determine whether it is satisfied.
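A sketch of the third case follows. For each user row, it takes the first virtual row from the usage columns. It then walks that user's candidate preference rows, checks that the identity in the multidimensional user identity column matches, and only then combines the two parts. Using the storage end position column to limit the candidate rows is an assumption borrowed from the S802 example later in this description. Names and data are hypothetical and shaped like the Fig. 9 query.

```python
from typing import Callable, List, Sequence

Predicate = Callable[[float], bool]

def scan_mixed(user_ids: Sequence[int],
               usage_columns: Sequence[Sequence[float]],
               usage_preds: Sequence[Predicate],
               end_positions: Sequence[int],
               multidim_user_ids: Sequence[int],
               pref_columns: Sequence[Sequence[float]],
               pref_preds: Sequence[Predicate]) -> List[int]:
    """Case 3: combine first virtual rows (usage columns) with second virtual rows
    (preference columns) only when both belong to the same user."""
    preds = list(usage_preds) + list(pref_preds)
    result_group: List[int] = []
    start = 0
    for row, user_id in enumerate(user_ids):
        end = end_positions[row]
        first = tuple(col[row] for col in usage_columns)      # first virtual row
        for pos in range(start, end):
            if multidim_user_ids[pos] != user_id:             # identity check
                continue
            second = tuple(col[pos] for col in pref_columns)  # second virtual row
            if all(p(v) for p, v in zip(preds, first + second)):
                result_group.append(user_id)
                break                                         # one match per user suffices
        start = end
    return result_group

# Fig. 9 style query: May SMS fee > 30, May voice fee > 20, May call count > 30.
matched = scan_mixed(
    user_ids=[1, 2, 3],
    usage_columns=[[64.0, 10.0, 40.0], [87.0, 50.0, 25.0]],
    usage_preds=[lambda v: v > 30, lambda v: v > 20],
    end_positions=[4, 4, 6],
    multidim_user_ids=[1, 1, 1, 1, 3, 3],
    pref_columns=[[94, 72, 41, 68, 35, 12]],
    pref_preds=[lambda v: v > 30],
)
print(matched)  # [1, 3]
```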
Further, the position information above may specifically be an offset measured in units of data items.
Optionally, once the query result customer group has been obtained, it can be analyzed statistically. Specifically:
1. Obtain an attribute analysis request for the query result customer group. The request comprises at least one piece of attribute information.
2. Obtain the data attribute column file corresponding to that attribute information.
3. In that file, obtain the usage data corresponding to the identity information of each user in the query result customer group.
4. Perform statistical analysis on this usage data to obtain the user attribute analysis result.
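A minimal sketch of this attribute analysis follows. It assumes the customer group is a list of user IDs and that the hash index maps each ID to its row. Only those rows of the requested column are read. The chosen statistics (count, mean, maximum) and all values are illustrative assumptions.

```python
from statistics import mean
from typing import Dict, List, Sequence

user_ids: List[int] = [1, 2, 3, 4]
user_index: Dict[int, int] = {uid: row for row, uid in enumerate(user_ids)}  # hash index stand-in
may_sms_fee: List[float] = [64.0, 10.0, 40.0, 22.0]  # hypothetical attribute column

def analyze(result_group: Sequence[int], column: Sequence[float]) -> Dict[str, float]:
    """Read only the rows of users in the result group and summarize them."""
    values = [column[user_index[uid]] for uid in result_group]
    return {"count": len(values), "mean": mean(values), "max": max(values)}

print(analyze([1, 3], may_sms_fee))  # {'count': 2, 'mean': 52.0, 'max': 64.0}
```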
This embodiment works as follows:
1. A query condition is obtained.
2. The data partition corresponding to each time condition in the query condition is obtained.
3. In each data partition, the data attribute column file corresponding to each attribute condition is obtained.
4. The usage data of users satisfying the query condition is searched for in those files.

The matching data attribute column files are selected first, and only those files are loaded into memory before the query condition is applied to them. Queries are therefore fast and need relatively little memory.
The technical solution of the method embodiment shown in Fig. 5 is described in detail below using a specific embodiment.
Fig. 6 is a flowchart of embodiment two of the data query method of the present invention. Figs. 7, 8 and 9 are schematic diagrams of the first, second and third examples of the data query method. As shown in Fig. 6, the method of this embodiment may comprise:
S601: Determine the scan columns according to the query condition.
Specifically, the query condition comprises time conditions and attribute conditions. The data partitions are first determined from the time conditions, and the scan columns are then determined within those partitions from the attribute conditions. Take the example in Fig. 7, where the query condition is (May SMS fee greater than 30, May voice call fee greater than 20, April voice call fee greater than 40). The time conditions determine the data partitions for May and April. The attribute conditions then determine the three scan columns shown in Fig. 7. In other words, each attribute column file referenced in the query condition becomes a scan column. If the query condition involves several periods of one attribute, the attribute column file in each of those periods' data partitions is a separate scan column.
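Scan-column selection for the Fig. 7 example can be sketched as follows. The `Condition` type, its fields and the attribute names are hypothetical. Only "greater than" conditions are modeled, because that is all the examples use. The distinct periods give the partitions to open, and each distinct (period, attribute) pair gives one scan column.

```python
from typing import List, NamedTuple, Tuple

class Condition(NamedTuple):
    period: str       # time condition, e.g. "May"
    attribute: str    # attribute column named by the condition, e.g. "sms_fee"
    threshold: float  # "greater than" threshold

def plan_scan(conditions: List[Condition]) -> Tuple[List[str], List[Tuple[str, str]]]:
    """Return the partitions to open and one scan column per (period, attribute) pair."""
    partitions = list(dict.fromkeys(c.period for c in conditions))
    scan_columns = list(dict.fromkeys((c.period, c.attribute) for c in conditions))
    return partitions, scan_columns

query = [Condition("May", "sms_fee", 30), Condition("May", "voice_fee", 20),
         Condition("April", "voice_fee", 40)]
print(plan_scan(query))
# (['May', 'April'], [('May', 'sms_fee'), ('May', 'voice_fee'), ('April', 'voice_fee')])
```

`dict.fromkeys` removes duplicates while keeping first-seen order.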
In the example of Fig. 7, the data attribute column files for the attribute conditions are all usage data attribute column files. In another case, shown in Fig. 8, they are all preference usage data attribute column files. For example, suppose the query condition is (May call count greater than 30, May SMS count greater than 50). The time condition determines the May data partition, and the attribute conditions determine the two scan columns shown in Fig. 8.
In yet another case, shown in Fig. 9, the data attribute column files for the attribute conditions include both usage data attribute column files and preference usage data attribute column files. For example, suppose the query condition is (May SMS fee greater than 30, May voice call fee greater than 20, May call count greater than 30). The time condition determines the May data partition, and the attribute conditions determine the three scan columns shown in Fig. 9.
S602: Scan the virtual rows of the data attribute column files of each data partition row by row.
Specifically, continuing the example of Fig. 7, the scan columns are loaded into memory. The users' usage data in the scan columns is stored with a fixed length, and the same row of every scan column belongs to the same user. Only one row number therefore needs to be kept for all scan columns, and it is incremented after each row is scanned. The same row number in every scan column identifies one virtual row, and the users' usage data in that virtual row forms a virtual object. Operations such as evaluating the query condition are performed on each virtual row. For example, as shown in Fig. 7, the usage data in the same row of each attribute column forms the virtual object of that virtual row, and the virtual object of the first row is (64, 87, 3).
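The single-counter scan over fixed-length scan columns can be sketched as follows. It assumes each scan column is an in-memory byte image of little-endian 8-byte floats. The encoding and helper names are assumptions. Row 1 reproduces the Fig. 7 virtual object (64, 87, 3), while rows 2 and 3 are hypothetical.

```python
import struct
from typing import Iterator, List, Tuple

LENGTH = 8  # assumed: every scan column stores little-endian float64 values

def encode(values: List[float]) -> bytes:
    """Build the byte image of a fixed-length column (demo only)."""
    return struct.pack(f"<{len(values)}d", *values)

def virtual_objects(columns: List[bytes], length: int = LENGTH) -> Iterator[Tuple[float, ...]]:
    """Scan all in-memory scan columns with one shared row counter."""
    n_rows = len(columns[0]) // length
    for rownum in range(1, n_rows + 1):  # the only scan state kept
        offset = (rownum - 1) * length
        yield tuple(struct.unpack_from("<d", col, offset)[0] for col in columns)

# Fig. 7 scan columns: May SMS fee, May voice fee, April voice fee.
scan_columns = [encode([64.0, 31.0, 12.0]), encode([87.0, 25.0, 40.0]),
                encode([3.0, 45.0, 50.0])]
matches = [rownum for rownum, obj in enumerate(virtual_objects(scan_columns), start=1)
           if obj[0] > 30 and obj[1] > 20 and obj[2] > 40]
print(next(virtual_objects(scan_columns)))  # (64.0, 87.0, 3.0)
print(matches)                              # [2]
```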
In the example of Fig. 8, as in Fig. 7, the same row of each scan column belongs to the same user. As shown in Fig. 8, the usage data in the same row of each scan column forms the virtual object of that virtual row, and the virtual object of the first row is (94, 51).
In the example of Fig. 9, unlike Figs. 7 and 8, the same row of different scan columns may belong to different users. The identity information of the user in each row of the scan columns taken from usage data attribute column files must therefore be compared with that of the user in each row of the scan columns taken from preference usage data attribute column files. Rows with the same user identity information are combined into a virtual row, the virtual object of that row is obtained, and the query condition is used to determine whether the virtual object satisfies it. An example virtual object is (87, 3.94).
S603: Obtain the scan result.
Specifically, after the scan columns have been scanned row by row, the identity information of the users whose usage data satisfies the query condition is obtained and recorded in the system-recorded user group.
Once the system-recorded user group has been obtained, secondary queries or composition analysis can be performed on it without another full row-by-row scan. The identity information in the system-recorded user group is used to locate the corresponding usage data directly, and the secondary query or composition analysis is performed on that data. Only part of the rows are scanned, which further improves query efficiency.
The data query method of the embodiments of the present invention performs query analysis with virtual rows, which avoids the table-join cost of prior-art queries. Queries are therefore more efficient and can meet the needs of interactive UI use.
The technical solutions of the above embodiments are described in detail below using a specific embodiment.
In this embodiment of the present invention, the data storage and query methods above are used to count the users that satisfy a specified condition and to analyze the composition of those users.
The application background is a telecom operator with 10 million customers. Each customer has more than 300 attributes, such as demographics, consumption, charges, inquiries and complaints, orders, content preference, location preference and industry preference.
Fig. 10 is a detailed flowchart of the data storage and query method provided by this embodiment of the present invention. As shown in Fig. 10, the method of this embodiment may comprise:
S801: Store the customer data.
Specifically, a customer ID column is created for the 10 million customers, and a hash index is built on it.
The data is partitioned by time period (daily summary, daily snapshot, monthly summary, monthly snapshot), and one data attribute column file is built for each data attribute of each data partition. In every data attribute column file other than the preference usage data attribute column files, a customer's usage data value is stored at the same row number as that customer's ID in the customer ID column.
S802: Extract the customers that satisfy the query condition.
The query condition selects customers who meet all of the following:
- they belong to the Global Link brand;
- their monthly data traffic exceeded 1G in each of the last three months;
- they exchanged more than 10 SMS messages with the IT industry.
Specifically, the query condition is parsed and, according to the column definitions, the following files are determined to be scan columns and loaded from disk into memory:
- the May, April and March traffic attribute column files;
- the latest customer brand attribute column file;
- the storage end position attribute column file;
- the industry SMS count attribute column file;
- the industry preference category identifier attribute column file.
The data storage length of each column file is obtained from the column definitions and denoted length.
Starting from row 1 of the customer ID attribute column file, the corresponding values of the scan columns are read. Let n denote the row number and length the length of the fixed-length data type. The values of the seven columns are read as follows:
- For the usage data attribute column files (here the May, April and March traffic attribute column files and the latest customer brand attribute column file), the data at positions (n-1)*length+1 through n*length, counted from the start of the column, is read.
- For the industry SMS count attribute column file, the value N in row n of the storage end position attribute column file is read first. The rows of the industry preference category identifier attribute column file that belong to this customer are then scanned, that is, the rows after the previous customer's end position up to row N.
- If one of these rows holds the IT ID, its row number in the industry preference category identifier attribute column file is noted, and the value in the same row of the industry SMS count attribute column file is read.

This value, together with the row of the same customer in the May, April and March traffic attribute column files and the latest customer brand attribute column file, forms a virtual row.
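The S802 scan can be sketched with the seven scan columns held as Python lists: brand, three monthly traffic columns, storage end positions, industry category identifiers and industry SMS counts. The data are hypothetical. Traffic is assumed to be stored in bytes, with "1G" read as 1 GiB. Customer 102 has no preference rows, so its end position repeats the previous one.

```python
from typing import Dict, List

GB = 1024 ** 3  # assumed: traffic stored in bytes, "1G" read as 1 GiB

# Scan columns for three customers (hypothetical data; 0-based rows here).
customer_ids: List[int] = [101, 102, 103]
brand: List[str] = ["GlobalLink", "GlobalLink", "Other"]
traffic: Dict[str, List[int]] = {
    "May":   [2 * GB, 3 * GB, 5 * GB],
    "April": [2 * GB, 2 * GB, 5 * GB],
    "March": [3 * GB, 4 * GB, 5 * GB],
}
end_positions: List[int] = [2, 2, 4]  # customer 102 has no preference rows
industry_ids: List[str] = ["IT", "catering", "IT", "fashion"]
industry_sms: List[int] = [15, 3, 40, 1]

def s802_scan() -> List[int]:
    """Return the 0-based rows of customers that satisfy the S802 query."""
    matched_rows: List[int] = []
    start = 0
    for n in range(len(customer_ids)):
        end = end_positions[n]  # value N for row n
        ok = brand[n] == "GlobalLink" and all(traffic[m][n] > GB for m in traffic)
        if ok:
            # Only this customer's preference rows (start .. end-1) are examined.
            ok = any(industry_ids[p] == "IT" and industry_sms[p] > 10
                     for p in range(start, end))
        if ok:
            matched_rows.append(n)
        start = end
    return matched_rows

rows = s802_scan()
print(rows, [customer_ids[r] for r in rows])  # [0] [101]
```

The system records matching row numbers rather than IDs, so later analysis (S803) can jump straight to those rows.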
The query condition is applied to the virtual row to determine whether the customer satisfies it.
Scanning continues until the whole customer ID column has been scanned. The system records all qualifying row numbers and stores them as a customer group. Depending on what the client requests, either the number of customers or the list of customer IDs is returned.
S803: Analyze the composition of the qualifying customers.
The customers obtained in S802 are analyzed by their distribution across traffic bands.
Specifically, the system uses the row numbers recorded for the customer group to scan only part of the traffic column, skipping most of its rows. It reads the customer attribute value at each recorded row of the traffic column in turn, determines which traffic band the value falls into, and increments that band's customer count.
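The band counting in S803 can be sketched with `bisect.bisect_right` over sorted band boundaries. The boundaries, labels and traffic values are hypothetical, since the text does not define the bands. Only the recorded rows of the traffic column are read.

```python
from bisect import bisect_right
from collections import Counter
from typing import List, Sequence

GB = 1024 ** 3
bin_edges: List[int] = [1 * GB, 2 * GB, 5 * GB]  # hypothetical band boundaries
bin_labels: List[str] = ["<1G", "1-2G", "2-5G", ">=5G"]

def traffic_distribution(matched_rows: Sequence[int], traffic_column: Sequence[int]) -> Counter:
    """Read only the recorded rows of the traffic column and count customers per band."""
    counts: Counter = Counter()
    for row in matched_rows:
        band = bisect_right(bin_edges, traffic_column[row])  # index of the band
        counts[bin_labels[band]] += 1
    return counts

may_traffic = [2 * GB, 5 * GB, GB // 2, 3 * GB]  # hypothetical column
print(traffic_distribution([0, 1, 3], may_traffic))  # Counter({'2-5G': 2, '>=5G': 1})
```

`bisect_right` places a value equal to a boundary in the higher band, so each label covers a half-open interval.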
After all row numbers of the customer group have been traversed, the system has the number of customers in each traffic band. The whole process generally completes within 3 s.
The embodiments of the present invention can be applied to statistical analysis and data mining, for example to the analysis of target marketing customer groups. They address the problem of accurately forming target marketing customer groups when there are millions of customers or more and the analysis spans any number of data periods. They do this by quickly analyzing the customer attributes accumulated in IT systems. This helps improve marketing precision, lower marketing costs and reduce disturbance to customers.
It should be noted that simultaneously, various embodiments of the present invention can be deployed to above common personal computer server (PC Server), low (not needing very high internal memory etc.) is required to host performance, practical effect simultaneously in business intelligence (Business Intelligence is called for short BI) data analysis field is higher than industry like product.
Figure 11 is a schematic structural diagram of Embodiment 1 of the data storage device of the present invention. The data storage device can be applied in a computer server. As shown in Figure 11, the device of this embodiment can include an identity information storage module 11 and a data storage module 12. The identity information storage module 11 is configured to store the identity information of multiple users in a user identity information attribute column file, each line of which stores the identity information of one user. The data storage module 12 is configured to store the usage data of the multiple users, by time period, in the usage data attribute column files of different data partitions, where each line of a usage data attribute column file stores the usage data of one attribute of one user.
Here, each data partition includes at least one usage data attribute column file; each line of each usage data attribute column file occupies a fixed-length storage space; the usage data of different attributes of each user are stored in different usage data attribute column files of the data partition; and the storage order of the usage data in the usage data attribute column files is the same as the storage order of the user identity information in the user identity information attribute column file.
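The storage layout handled by modules 11 and 12 might look like the following sketch. The directory-per-period naming, the 16-byte row width, ASCII encoding, and the `.col` file names are all hypothetical. Padding users that have no value in a period keeps every column file row-aligned with the identity column, so line n always starts at byte offset `(n-1)*LENGTH` in every file.

```python
# Sketch of the storage layout: one identity column file plus, per time period,
# a partition directory holding one fixed-length column file per attribute.
# Assumptions: ASCII values, 16-byte rows, directory/file names are hypothetical.
import tempfile
from pathlib import Path

LENGTH = 16


def write_column(path, values):
    """Write one value per fixed-length row, space-padded to LENGTH bytes."""
    with open(path, "wb") as f:
        for v in values:
            data = str(v).encode("ascii")
            if len(data) > LENGTH:
                raise ValueError(f"value {v!r} exceeds {LENGTH} bytes")
            f.write(data.ljust(LENGTH))


def store(root, users, usage_by_period):
    """users fixes the row order; usage_by_period is {period: {attr: {user: value}}}."""
    write_column(root / "user_id.col", users)
    for period, attrs in usage_by_period.items():
        partition = root / period
        partition.mkdir(parents=True, exist_ok=True)
        for attr, per_user in attrs.items():
            # Users without a value get an empty padded row, keeping rows aligned.
            write_column(partition / f"{attr}.col", [per_user.get(u, "") for u in users])


with tempfile.TemporaryDirectory() as d:
    root = Path(d)
    store(root, ["C001", "C002"], {
        "2015-03": {"flow": {"C001": 120, "C002": 900}},
        "2015-04": {"flow": {"C002": 950}},
    })
    april = (root / "2015-04" / "flow.col").read_bytes()

# Row 2 of the April partition belongs to C002, the user on row 2 of user_id.col.
print(april[LENGTH:2 * LENGTH].strip())
# b'950'
```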
Further, that the storage order of the usage data in the usage data attribute column file is the same as the storage order of the user identity information in the user identity information attribute column file specifically means that the number of offset cells corresponding to the line storing a user's usage data is the same as the number of offset cells corresponding to the line storing that user's identity information.
Optionally, if the usage data of at least one attribute of the multiple users include usage data of multiple different preference categories, the data storage module is further configured to: store, in sequence, the usage data of the multiple different preference categories of each user in a preference usage data attribute column file of that attribute, where each line of the preference usage data attribute column file stores the usage data of one preference category of one user; store, in sequence, the preference category identifiers corresponding to each user's preference-category usage data in a preference category identifier attribute column file, and store the identity information of the users having such usage data in a multidimensional data user identity attribute column file, where the storage order of the preference-category usage data in the preference usage data attribute column file is the same as the storage order of the preference category identifiers in the preference category identifier attribute column file, and the same as the storage order of the user identity information in the multidimensional data user identity attribute column file; and obtain, according to the number of preference-category usage data items of each user, the storage end position information of that user's preference-category usage data, and store each user's storage end position information in a storage end position information attribute column file, each line of which stores the storage end position information of one user, where the storage order of the storage end position information in the storage end position information attribute column file is the same as the storage order of the user identity information in the user identity information attribute column file.
That the storage order of the storage end position information in the storage end position information attribute column file is the same as the storage order of the user identity information in the user identity information attribute column file specifically means that the number of offset cells corresponding to the line storing a user's storage end position information is the same as the number of offset cells corresponding to the line storing that user's identity information.
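A sketch of the multi-category (preference) layout, again with Python lists standing in for the column files and hypothetical names. Structurally it resembles a compressed sparse row (CSR) layout: the end-position column plays the role of the row-pointer array, delimiting each user's block of preference rows so that no row is stored for categories a user does not have.

```python
# Sketch: storing one multi-category attribute (e.g. per-industry SMS counts).
# Three columns share one row order (value, category identifier, user identity);
# the end-position column is aligned with the main user identity column.
# Lists stand in for column files; names are hypothetical.


def store_preferences(users, prefs):
    """users: row order of the user identity column.
    prefs: {user: [(category_id, value), ...]} for one attribute."""
    pref_values, pref_categories, pref_users, end_positions = [], [], [], []
    for u in users:
        for category, value in prefs.get(u, []):
            pref_values.append(value)
            pref_categories.append(category)
            pref_users.append(u)
        # 1-based line of this user's last entry; a user with no entries
        # repeats the previous user's end position.
        end_positions.append(len(pref_values))
    return pref_values, pref_categories, pref_users, end_positions


users = ["C001", "C002", "C003"]
prefs = {"C001": [("IT", 5), ("FIN", 7)], "C003": [("FIN", 3)]}
values, categories, pref_users, ends = store_preferences(users, prefs)
print(values, categories, pref_users, ends)
# [5, 7, 3] ['IT', 'FIN', 'FIN'] ['C001', 'C001', 'C003'] [2, 2, 3]
```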
Further, the data storage module 12 is also configured to store the newly added usage data of the multiple users, by time period, in a newly added data partition. The newly added data partition includes at least one newly added usage data attribute column file; the data of different attributes of the newly added usage data are stored in different newly added usage data attribute column files, and each line of a newly added usage data attribute column file stores the newly added usage data of one attribute of one user. Each line of each newly added usage data attribute column file occupies a fixed-length storage space, and the storage order of the newly added usage data in the newly added usage data attribute column files of the newly added data partition is the same as the storage order of the user identity information in the user identity information attribute column file.
Optionally, if usage data of a newly added user exist within the time period to which the newly added data partition belongs, the identity information storage module 11 is further configured to store the identity information of the newly added user at the tail of the user identity information attribute column file, obtaining a new user identity information attribute column file;
The data storage module 12 is further configured to store the usage data of the newly added user in the newly added data partition, with the data of different attributes of that usage data stored in the respective newly added usage data attribute column files;
Here, the storage order of the newly added usage data in the newly added usage data attribute column files is the same as the storage order of the user identity information in the new user identity information attribute column file, and the newly added usage data include the newly added usage data of the multiple users and the usage data of the newly added user.
Further, the identity information storage module 11 is also configured to: establish user indexes according to the identity information of the multiple users in the user identity information attribute column file, each user index corresponding one-to-one to a piece of user identity information in that file; obtain the identity information of a newly added user and determine, according to the user indexes, whether that identity information already exists in the user identity information attribute column file; and, if not, store the identity information of the newly added user at the tail of the user identity information attribute column file and store the newly added usage data of the newly added user in a newly added data partition.
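The user index and tail-append behavior can be sketched as below. A Python dict serves as the user index (an assumption; the text does not specify the index structure), and lists and dicts stand in for the files. A new user receives the next line number, and the new partition is written in the order of the extended identity column.

```python
# Sketch: user index, tail append for new users, and a new partition aligned
# with the extended identity column. Lists/dicts stand in for files; all
# names are hypothetical.


def add_period(users, index, partitions, period, new_data):
    """users: identity column (line n is users[n-1]).
    index: {user_id: line number}, the user index.
    new_data: {attribute: {user_id: value}} for the new time period."""
    for per_user in new_data.values():
        for uid in per_user:
            if uid not in index:             # not yet in the identity column
                users.append(uid)            # store at the tail
                index[uid] = len(users)      # its 1-based line number
    # One row per identity line, in identity order; None marks "no data".
    partitions[period] = {
        attr: [per_user.get(uid) for uid in users]
        for attr, per_user in new_data.items()
    }


users = ["C001", "C002"]
index = {uid: n for n, uid in enumerate(users, start=1)}
partitions = {"2015-03": {"flow": [120, 900]}}
add_period(users, index, partitions, "2015-04", {"flow": {"C002": 950, "C004": 30}})
print(users, index["C004"], partitions["2015-04"]["flow"])
# ['C001', 'C002', 'C004'] 3 [None, 950, 30]
```

Older partitions are not rewritten, so they end up shorter than the extended identity column; in this sketch a reader would treat a line beyond a partition's length as having no data for that period.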
The device of this embodiment can be used to perform the technical solution of the method embodiment shown in Figure 1 or Figure 2; its implementation principle and technical effects are similar and are not repeated here.
Figure 12 is a schematic structural diagram of Embodiment 1 of the data query device of the present invention. As shown in Figure 12, the device of this embodiment can include an acquisition module 21 and a processing module 22. The acquisition module 21 is configured to obtain a query condition, which includes at least one attribute condition and at least one time condition. The processing module 22 is configured to: look up the data partition corresponding to each time condition according to the at least one time condition; obtain, in each data partition, the data attribute column file corresponding to each attribute condition according to the at least one attribute condition, the data attribute column files including usage data attribute column files and preference usage data attribute column files; and traverse, in sequence, all corresponding lines of the data attribute column files corresponding to the attribute conditions to find the usage data of users that satisfy the query condition.
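The first part of the processing module's work, resolving time conditions to partitions and attribute conditions to column files, could be sketched as follows. The catalog structure `{period: {attribute: path}}` and the paths are hypothetical.

```python
# Sketch of the partition/column lookup performed before scanning.
# The catalog layout {period: {attribute: path}} is a hypothetical assumption.
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class QueryCondition:
    attributes: List[str]   # attribute conditions, e.g. ["flow"]
    periods: List[str]      # time conditions, e.g. ["2015-03", "2015-04"]


def column_files(cond: QueryCondition,
                 catalog: Dict[str, Dict[str, str]]) -> List[Tuple[str, str, str]]:
    """Return (period, attribute, path) for every column file the scan must read."""
    files = []
    for period in cond.periods:
        if period not in catalog:
            raise KeyError(f"no data partition for period {period}")
        partition = catalog[period]
        for attr in cond.attributes:
            files.append((period, attr, partition[attr]))
    return files


catalog = {
    "2015-03": {"flow": "2015-03/flow.col", "brand": "2015-03/brand.col"},
    "2015-04": {"flow": "2015-04/flow.col"},
}
for entry in column_files(QueryCondition(["flow"], ["2015-03", "2015-04"]), catalog):
    print(entry)
# ('2015-03', 'flow', '2015-03/flow.col')
# ('2015-04', 'flow', '2015-04/flow.col')
```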
Optionally, if the data attribute column files corresponding to the attribute conditions are usage data attribute column files, the processing module 22 traverses their corresponding lines as follows: in each data partition, it takes the same-numbered lines of the usage data attribute column files corresponding to the attribute conditions as a virtual row, obtains the usage data of the user in each virtual row, and determines whether that usage data satisfies the query condition; if so, it records the identity information corresponding to the usage data of the user in that virtual row, obtaining the query result customer group.
Here, recording the identity information corresponding to the usage data of the user in a virtual row that satisfies the query condition, to obtain the query result customer group, comprises: obtaining the first position, in the usage data attribute column file, of the usage data of the user in that virtual row; obtaining the identity information of the user at the first position in the user identity information attribute column file; and recording that identity information to obtain the query result customer group.
If the data attribute column files corresponding to the attribute conditions are preference usage data attribute column files, the processing module 22 traverses their corresponding lines as follows: in each data partition, it takes the same-numbered lines of the preference usage data attribute column files corresponding to the attribute conditions as a virtual row, obtains the usage data of the user in each virtual row, and determines whether that usage data satisfies the query condition; if so, it records the identity information corresponding to the usage data of the user in that virtual row, obtaining the query result customer group.
Here, recording the identity information corresponding to the usage data of the user in a virtual row that satisfies the query condition, to obtain the query result customer group, comprises: obtaining the second position, in the preference usage data attribute column file, of the usage data of the user in that virtual row; obtaining the identity information of the user at the second position in the multidimensional data user identity attribute column file; and recording that identity information to obtain the query result customer group.
If the data attribute column files corresponding to the attribute conditions include both usage data attribute column files and preference usage data attribute column files, the processing module 22 traverses their corresponding lines as follows: in each data partition, it takes the same-numbered lines of the usage data attribute column files corresponding to the attribute conditions as a first virtual row, and the same-numbered lines of the preference usage data attribute column files corresponding to the attribute conditions as a second virtual row; if the identity information of the user corresponding to the first virtual row is the same as that of the user corresponding to the second virtual row, it takes the first and second virtual rows together as one virtual row; it then obtains the usage data of the user in each virtual row and determines whether that usage data satisfies the query condition; if so, it records the identity information corresponding to the usage data of the user in that virtual row, obtaining the query result customer group.
Optionally, before taking the first virtual row and the second virtual row together as one virtual row when their users' identity information is the same, the processing module 22 is further configured to: obtain, from the user identity information attribute column file, the identity information of the user corresponding to the first virtual row; obtain, from the multidimensional data user identity attribute column file, the identity information of the user corresponding to the second virtual row; and determine whether the two are the same.
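The pairing of first and second virtual rows can be sketched as a merge join. The sketch assumes, as the storage layout suggests but does not state outright, that preference rows were written user by user in identity-column order; under that assumption one forward pass over both sides finds every matching pair. If the ordering assumption did not hold, matching rows could be missed, and a hash lookup on user identity would be the safer choice. All names are hypothetical.

```python
# Sketch: combining a first virtual row (usage columns, aligned with the user
# identity column) with second virtual rows (preference columns, aligned with
# the multidimensional user identity column) when the user identities match.
# Assumption: preference rows were written user by user in identity-column
# order, so one forward pass over both sides (a merge join) is enough.


def join_virtual_rows(user_ids, usage_cols, pref_users, pref_cols):
    rows = []
    j = 0
    for i, uid in enumerate(user_ids):
        first = {name: col[i] for name, col in usage_cols.items()}
        # Consume every preference row that belongs to this same user.
        while j < len(pref_users) and pref_users[j] == uid:
            second = {name: col[j] for name, col in pref_cols.items()}
            rows.append({"user": uid, **first, **second})
            j += 1
    return rows


user_ids = ["C001", "C002", "C003"]
usage_cols = {"flow": [120, 900, 400]}
pref_users = ["C001", "C001", "C003"]
pref_cols = {"category": ["IT", "FIN", "FIN"], "sms": [5, 7, 3]}

rows = join_virtual_rows(user_ids, usage_cols, pref_users, pref_cols)
# Hypothetical query: more than 3 IT-industry SMS messages.
hits = [r["user"] for r in rows if r["category"] == "IT" and r["sms"] > 3]
print(len(rows), hits)
# 3 ['C001']
```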
Optionally, the first position or the second position is a number of offset cells.
Further, the processing module 22 can also be configured to: obtain an attribute analysis request for the query result customer group, the request including at least one piece of attribute information; obtain the data attribute column file corresponding to each requested attribute according to the attribute information in the request; obtain, from that data attribute column file, the usage data corresponding to the identity information of each user in the query result customer group; and perform statistical analysis on that usage data to obtain a user attribute analysis result.
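An attribute analysis over the query result customer group might be sketched as follows: each customer's identity is mapped to its line through a user index, the requested attribute column is read at that line, and the values are summarized. The dict-based index and the choice of count, mean, minimum, and maximum as the statistics are assumptions.

```python
# Sketch of an attribute analysis over a query result customer group:
# map each customer's identity to its line through a user index, read that
# line of the requested attribute column, and summarize the values.
# The index structure and the summary statistics chosen here are assumptions.
from statistics import mean


def analyze(group_ids, index, column):
    values = [column[index[uid] - 1] for uid in group_ids]
    values = [v for v in values if v is not None]   # skip lines with no data
    if not values:
        return {"count": 0}
    return {"count": len(values), "mean": mean(values),
            "min": min(values), "max": max(values)}


index = {"C001": 1, "C002": 2, "C003": 3}
flow = [120, 900, 400]
print(analyze(["C002", "C003"], index, flow))
```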
The device of this embodiment can be used to perform the technical solution of the method embodiment shown in Figure 5 or Figure 6; its implementation principle and technical effects are similar and are not repeated here.
A person of ordinary skill in the art will understand that all or some of the steps of the above method embodiments can be implemented by hardware under the control of program instructions. The program can be stored in a computer-readable storage medium and, when executed, performs the steps of the above method embodiments. The storage medium includes various media capable of storing program code, such as a ROM, a RAM, a magnetic disk, or an optical disc.
Finally, it should be noted that the above embodiments are merely intended to illustrate, not to limit, the technical solutions of the present invention. Although the present invention has been described in detail with reference to the foregoing embodiments, a person of ordinary skill in the art should understand that the technical solutions described in the foregoing embodiments can still be modified, or some or all of their technical features can be equivalently replaced, and such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the scope of the technical solutions of the embodiments of the present invention.
Claims (32)
1. A data storage method, characterized by comprising:
Storing identity information of multiple users in a user identity information attribute column file, wherein each line of the user identity information attribute column file stores the identity information of one user;
Storing usage data of the multiple users, by time period, in usage data attribute column files of different data partitions, wherein each line of a usage data attribute column file stores the usage data of one attribute of one user;
Wherein each data partition comprises at least one usage data attribute column file; each line of each usage data attribute column file occupies a fixed-length storage space; the usage data of different attributes of each user are stored in different usage data attribute column files of the data partition; and the storage order of the usage data in the usage data attribute column files is the same as the storage order of the user identity information in the user identity information attribute column file.
2. The method according to claim 1, characterized in that the storage order of the usage data in the usage data attribute column file being the same as the storage order of the user identity information in the user identity information attribute column file specifically comprises:
The number of offset cells corresponding to the line storing the usage data of a user is the same as the number of offset cells corresponding to the line storing the identity information of that user.
3. The method according to claim 1, characterized in that, if the usage data of at least one attribute of the multiple users comprise usage data of multiple different preference categories, the method further comprises:
Storing, in sequence, the usage data of the multiple different preference categories of each user in a preference usage data attribute column file of the attribute, wherein each line of the preference usage data attribute column file of the attribute stores the usage data of one preference category of one user;
Storing, in sequence, the preference category identifiers corresponding to the usage data of the multiple different preference categories of each user in a preference category identifier attribute column file, and storing the identity information of the users having the usage data of the multiple different preference categories in a multidimensional data user identity attribute column file;
Wherein the storage order of the preference-category usage data in the preference usage data attribute column file is the same as the storage order of the preference category identifiers in the preference category identifier attribute column file, and the same as the storage order of the user identity information in the multidimensional data user identity attribute column file;
Obtaining, according to the number of usage data items of the multiple different preference categories of each user, storage end position information of the usage data of the multiple different preference categories of that user, and storing the storage end position information of each user in a storage end position information attribute column file, wherein each line of the storage end position information attribute column file stores the storage end position information of one user;
Wherein the storage order of the storage end position information in the storage end position information attribute column file is the same as the storage order of the user identity information in the user identity information attribute column file.
4. The method according to claim 3, characterized in that the storage order of the storage end position information in the storage end position information attribute column file being the same as the storage order of the user identity information in the user identity information attribute column file specifically comprises:
The number of offset cells corresponding to the line storing the storage end position information of a user is the same as the number of offset cells corresponding to the line storing the identity information of that user.
5. The method according to any one of claims 1 to 4, characterized in that the method further comprises:
Storing newly added usage data of the multiple users, by time period, in a newly added data partition, wherein the newly added data partition comprises at least one newly added usage data attribute column file, the data of different attributes of the newly added usage data are stored in different newly added usage data attribute column files, and each line of a newly added usage data attribute column file stores the newly added usage data of one attribute of one user;
Wherein each line of each newly added usage data attribute column file occupies a fixed-length storage space, and the storage order of the newly added usage data in the newly added usage data attribute column files of the newly added data partition is the same as the storage order of the user identity information in the user identity information attribute column file.
6. The method according to claim 5, characterized in that, if usage data of a newly added user exist within the time period to which the newly added data partition belongs, the method further comprises:
Storing the identity information of the newly added user at the tail of the user identity information attribute column file to obtain a new user identity information attribute column file;
Storing the usage data of the newly added user in the newly added data partition, wherein the data of different attributes of the usage data of the newly added user are stored in the respective newly added usage data attribute column files;
Wherein the storage order of the newly added usage data in the newly added usage data attribute column files is the same as the storage order of the user identity information in the new user identity information attribute column file, and the newly added usage data comprise the newly added usage data of the multiple users and the usage data of the newly added user.
7. The method according to claim 1, characterized in that the method further comprises:
Establishing user indexes according to the identity information of the multiple users in the user identity information attribute column file, wherein each user index corresponds one-to-one to the identity information of one user in the user identity information attribute column file;
Obtaining the identity information of a newly added user, and determining, according to the user indexes, whether the identity information of the newly added user already exists in the user identity information attribute column file;
If not, storing the identity information of the newly added user at the tail of the user identity information attribute column file, and storing the newly added usage data of the newly added user in a newly added data partition.
8. A data query method implemented using the data storage method according to any one of claims 1 to 7, characterized by comprising:
Obtaining a query condition, the query condition comprising at least one attribute condition and at least one time condition;
Looking up the data partition corresponding to each time condition according to the at least one time condition, and obtaining, in each data partition, the data attribute column file corresponding to each attribute condition according to the at least one attribute condition, wherein the data attribute column files comprise usage data attribute column files and preference usage data attribute column files;
Traversing, in sequence, all corresponding lines of the data attribute column files corresponding to the attribute conditions to find the usage data of users satisfying the query condition.
9. The method according to claim 8, characterized in that, if the data attribute column files corresponding to the attribute conditions are usage data attribute column files, traversing, in sequence, all corresponding lines of the data attribute column files corresponding to the attribute conditions to find the usage data of users satisfying the query condition specifically comprises:
Taking the same-numbered lines of the usage data attribute column files corresponding to the attribute conditions in each data partition as a virtual row, obtaining the usage data of the user in each virtual row, and determining whether the usage data of the user in each virtual row satisfies the query condition;
If so, recording the identity information corresponding to the usage data of the user in the virtual row satisfying the query condition, to obtain a query result customer group.
10. The method according to claim 9, characterized in that recording the identity information corresponding to the usage data of the user in the virtual row satisfying the query condition, to obtain the query result customer group, comprises:
Obtaining a first position, in the usage data attribute column file, of the usage data of the user in the virtual row satisfying the query condition; obtaining, from the user identity information attribute column file, the identity information of the user at the first position; and recording the identity information of the user to obtain the query result customer group.
11. The method according to claim 8, characterized in that, if the data attribute column files corresponding to the attribute conditions are preference usage data attribute column files, traversing, in sequence, all corresponding lines of the data attribute column files corresponding to the attribute conditions to find the usage data of users satisfying the query condition specifically comprises:
Taking the same-numbered lines of the preference usage data attribute column files corresponding to the attribute conditions in each data partition as a virtual row, obtaining the usage data of the user in each virtual row, and determining whether the usage data of the user in each virtual row satisfies the query condition;
If so, recording the identity information corresponding to the usage data of the user in the virtual row satisfying the query condition, to obtain a query result customer group.
12. The method according to claim 11, characterized in that recording the identity information corresponding to the usage data of the user in the virtual row satisfying the query condition, to obtain the query result customer group, comprises:
Obtaining a second position, in the preference usage data attribute column file, of the usage data of the user in the virtual row satisfying the query condition; obtaining, from the multidimensional data user identity attribute column file, the identity information of the user at the second position; and recording the identity information of the user to obtain the query result customer group.
13. The method according to claim 8, characterized in that, if the data attribute column files corresponding to the attribute conditions comprise usage data attribute column files and preference usage data attribute column files, traversing, in sequence, all corresponding lines of the data attribute column files corresponding to the attribute conditions to find the usage data of users satisfying the query condition specifically comprises:
Taking the same-numbered lines of the usage data attribute column files corresponding to the attribute conditions in each data partition as a first virtual row, and taking the same-numbered lines of the preference usage data attribute column files corresponding to the attribute conditions in each data partition as a second virtual row;
If the identity information of the user corresponding to the first virtual row is the same as the identity information of the user corresponding to the second virtual row, taking the first virtual row and the second virtual row together as a virtual row;
Obtaining the usage data of the user in each virtual row, and determining whether the usage data of the user in each virtual row satisfies the query condition;
If so, recording the identity information corresponding to the usage data of the user in the virtual row satisfying the query condition, to obtain a query result customer group.
14. The method according to claim 13, characterized in that, before taking the first virtual row and the second virtual row together as a virtual row if the identity information of the user corresponding to the first virtual row is the same as the identity information of the user corresponding to the second virtual row, the method further comprises:
Obtaining, from the user identity information attribute column file, the identity information of the user corresponding to the first virtual row, and obtaining, from the multidimensional data user identity attribute column file, the identity information of the user corresponding to the second virtual row;
Determining whether the identity information of the user corresponding to the first virtual row is the same as the identity information of the user corresponding to the second virtual row.
15. The method according to claim 10 or 12, characterized in that the first position or the second position is a number of offset cells.
16. The method according to any one of claims 8 to 14, characterized in that the method further comprises:
Obtaining an attribute analysis request for the query result customer group, the attribute analysis request comprising at least one piece of attribute information;
Obtaining the data attribute column file corresponding to the attribute according to the attribute information in the attribute analysis request, obtaining, from the data attribute column file, the usage data corresponding to the identity information of each user in the query result customer group, and performing statistical analysis on the usage data to obtain a user attribute analysis result.
17. A data storage device, characterized by comprising:
An identity information storage module, configured to store identity information of multiple users in a user identity information attribute column file, where each row of the user identity information attribute column file stores the identity information of one user;
A data storage module, configured to store the usage data of the multiple users, by time period, in usage data attribute column files of different data partitions, where each row of a usage data attribute column file stores the usage data of one attribute of one of the users;
Wherein each data partition comprises at least one usage data attribute column file; each row of each usage data attribute column file occupies storage space of a fixed length; the usage data of the different attributes of each user are stored respectively in different usage data attribute column files within the data partition; and the storage order of the usage data in the usage data attribute column files is identical to the storage order of the users' identity information in the user identity information attribute column file.
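A minimal sketch of this layout, assuming every usage value fits in a 4-byte little-endian signed integer, identity information is ASCII padded to 16 bytes, each column file is modeled as a `bytes` object, and a missing value is stored as 0. The function names, widths, and the zero default are illustrative assumptions, not taken from the patent.

```python
import struct

ROW = struct.Struct("<i")   # assumption: each usage value is a 4-byte signed int
ID_WIDTH = 16               # assumption: identity info padded to 16 bytes

def build_identity_column(user_ids):
    """User identity information attribute column file: one fixed-width row per user."""
    return b"".join(uid.encode("ascii").ljust(ID_WIDTH, b"\0") for uid in user_ids)

def build_partition(user_ids, usage_by_user, attributes, missing=0):
    """One time-period data partition: one fixed-width column file per attribute.

    Row i of every column file holds the value for user_ids[i], so the storage
    order matches the identity column file. Absent values are stored as `missing`.
    """
    return {
        attr: b"".join(
            ROW.pack(usage_by_user.get(uid, {}).get(attr, missing)) for uid in user_ids
        )
        for attr in attributes
    }

users = ["u1", "u2", "u3"]
jan = {"u1": {"calls": 12, "sms": 3}, "u2": {"calls": 7}, "u3": {"sms": 9}}
ids_file = build_identity_column(users)
part_2015_01 = build_partition(users, jan, ["calls", "sms"])
print(len(ids_file), len(part_2015_01["calls"]))  # 48 12
```

Because row i of every file belongs to the same user, the usage files need no per-row user key, which is what keeps each stored unit small.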
18. The device according to claim 17, characterized in that the storage order of the usage data in the usage data attribute column file being identical to the storage order of the users' identity information in the user identity information attribute column file specifically comprises:
The number of offset cells corresponding to the row storing the usage data of a user is identical to the number of offset cells corresponding to the row storing the identity information of that user.
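With fixed-length rows, the number of offset cells can be read as the row index, and a byte offset is that index multiplied by the row width. The sketch below assumes hypothetical widths (4-byte integer usage cells, 16-byte ASCII identity cells) and models each file as `bytes`.

```python
import struct

ROW = struct.Struct("<i")   # assumption: 4-byte usage cells
ID_WIDTH = 16               # assumption: 16-byte identity cells

def read_usage(column_file: bytes, offset_cells: int) -> int:
    """Read the usage value stored `offset_cells` rows from the start of the file."""
    return ROW.unpack_from(column_file, offset_cells * ROW.size)[0]

def read_identity(identity_file: bytes, offset_cells: int) -> str:
    """Read the identity stored at the same offset in the identity column file."""
    start = offset_cells * ID_WIDTH
    return identity_file[start:start + ID_WIDTH].rstrip(b"\0").decode("ascii")

identity_file = b"".join(u.encode("ascii").ljust(ID_WIDTH, b"\0") for u in ["u1", "u2", "u3"])
calls_file = b"".join(ROW.pack(v) for v in [12, 7, 0])
print(read_identity(identity_file, 1), read_usage(calls_file, 1))  # u2 7
```

Either lookup is a constant-time seek, so translating a hit in a usage file back to a user costs no search.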
19. The device according to claim 17, characterized in that, if the usage data of at least one attribute of the multiple users comprises usage data of multiple different preference categories, the data storage module is further configured to:
Store the usage data of the multiple different preference categories of each user, in sequence, in the preference usage data attribute column file of the attribute, where each row of the preference usage data attribute column file stores the usage data of one preference category of one of the users;
Store the preference category identifiers corresponding to the usage data of the multiple different preference categories of each user, in sequence, in a preference category identifier attribute column file, and store the identity information of the user to whom the usage data of the multiple different preference categories belongs in a multidimensional data user identity attribute column file;
Wherein the storage order of the usage data of the preference categories in the preference usage data attribute column file is identical to the storage order of the preference category identifiers in the preference category identifier attribute column file, and identical to the storage order of the users' identity information in the multidimensional data user identity attribute column file;
Obtain storage end position information of the usage data of the multiple different preference categories of each user according to the number of such usage data entries of that user, and store the storage end position information of each user in a storage end position information attribute column file, where each row of the storage end position information attribute column file stores the storage end position information of one of the users;
Wherein the storage order of the storage end position information in the storage end position information attribute column file is identical to the storage order of the users' identity information in the user identity information attribute column file.
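This layout is similar in spirit to a compressed sparse row (CSR) structure: each user has a variable number of (category, value) entries, plus one end position per user. The sketch assumes the end position is exclusive (the index just past the user's last entry) and models each column file as a Python list; the patent fixes neither choice, and all names are hypothetical.

```python
def build_preference_files(user_ids, prefs_by_user):
    """Build the aligned preference columns for one attribute.

    prefs_by_user maps identity -> list of (category_id, value); hypothetical shape.
    Returns (pref_values, category_ids, multi_user_ids, end_positions): the first
    three have one row per (user, category) entry; end_positions has one row per
    user, in the same order as user_ids.
    """
    pref_values, category_ids, multi_user_ids, end_positions = [], [], [], []
    for uid in user_ids:
        for category, value in prefs_by_user.get(uid, []):
            pref_values.append(value)
            category_ids.append(category)
            multi_user_ids.append(uid)
        # Assumed exclusive end: this user's entries stop just before this index.
        end_positions.append(len(pref_values))
    return pref_values, category_ids, multi_user_ids, end_positions

def preferences_of(row, pref_values, category_ids, end_positions):
    """Return the (category, value) pairs of the user at `row` of the identity file."""
    start = end_positions[row - 1] if row > 0 else 0
    end = end_positions[row]
    return list(zip(category_ids[start:end], pref_values[start:end]))

users = ["u1", "u2", "u3"]
prefs = {"u1": [("music", 5), ("video", 2)], "u3": [("games", 8)]}
values, cats, multi_ids, ends = build_preference_files(users, prefs)
print(ends)                                    # [2, 2, 3]
print(preferences_of(2, values, cats, ends))   # [('games', 8)]
```

A user with no preference data (u2 here) simply repeats the previous end position, so the end-position file stays aligned with the identity file.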
20. The device according to claim 19, characterized in that the storage order of the storage end position information in the storage end position information attribute column file being identical to the storage order of the users' identity information in the user identity information attribute column file specifically comprises:
The number of offset cells corresponding to the row storing the storage end position information of a user is identical to the number of offset cells corresponding to the row storing the identity information of that user.
21. The device according to any one of claims 17 to 20, characterized in that the data storage module is further configured to:
Store newly added usage data of the multiple users, by time period, in a new data partition, the new data partition comprising at least one new usage data attribute column file, where the data of different attributes of the newly added usage data are stored respectively in the new usage data attribute column files, and each row of a new usage data attribute column file stores the newly added usage data of one attribute of one of the users;
Wherein each row of each new usage data attribute column file occupies storage space of a fixed length, and the storage order of the newly added usage data in the new usage data attribute column files of the new data partition is identical to the storage order of the users' identity information in the user identity information attribute column file.
22. The device according to claim 21, characterized in that, if usage data of a new user exists within the time period to which the new data partition belongs, the identity information storage module is further configured to append the identity information of the new user to the end of the user identity information attribute column file, obtaining a new user identity information attribute column file;
The data storage module is further configured to store the usage data of the new user in the new data partition, the data of different attributes of the new user's usage data being stored respectively in the new usage data attribute column files;
Wherein the storage order of the newly added usage data in the new usage data attribute column files is identical to the storage order of the users' identity information in the new user identity information attribute column file, the newly added usage data comprising the newly added usage data of the multiple users and the newly added usage data of the new user.
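A sketch of adding a time-period partition, assuming column files are in-memory lists and missing values are stored as 0. New users are appended to the tail of the identity column, so earlier (shorter) partitions stay aligned for the rows they already contain. `add_period` and its parameters are hypothetical.

```python
def add_period(user_ids, partitions, period, usage_by_user, attributes, missing=0):
    """Add a new time-period data partition.

    user_ids: the identity column file as a list (extended in place)
    partitions: {period: {attribute: [values aligned with user_ids]}}
    usage_by_user: {identity: {attribute: value}} for the new period
    """
    known = set(user_ids)
    for uid in usage_by_user:
        if uid not in known:
            # A new user goes to the tail, so existing row offsets never change.
            user_ids.append(uid)
            known.add(uid)
    # Every column of the new partition follows the (possibly extended) identity order.
    partitions[period] = {
        attr: [usage_by_user.get(uid, {}).get(attr, missing) for uid in user_ids]
        for attr in attributes
    }
    return partitions[period]

users = ["u1", "u2"]
parts = {}
add_period(users, parts, "2015-01", {"u1": {"calls": 3}, "u2": {"calls": 4}}, ["calls"])
add_period(users, parts, "2015-02", {"u2": {"calls": 1}, "u3": {"calls": 6}}, ["calls"])
print(users)                       # ['u1', 'u2', 'u3']
print(parts["2015-02"]["calls"])   # [0, 1, 6]
print(parts["2015-01"]["calls"])   # [3, 4]
```

Appending rather than inserting is what lets old partitions remain valid without rewriting: row i still means the same user everywhere it exists.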
23. The device according to claim 17, characterized in that the identity information storage module is further configured to:
Establish a user index for each user according to the identity information of the multiple users in the user identity information attribute column file, each user index corresponding one-to-one to the identity information of a user in the user identity information attribute column file;
Obtain the identity information of a new user, and determine, according to the user indexes, whether the identity information of the new user already exists in the user identity information attribute column file;
If not, append the identity information of the new user to the end of the user identity information attribute column file, and store the newly added usage data of the new user in a new data partition.
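The user index can be illustrated with a dictionary from identity information to row number. This is an assumption, since the claim does not specify the index structure; `IdentityColumn` and its methods are hypothetical names.

```python
class IdentityColumn:
    """Identity column file plus a hypothetical user index (identity -> row)."""

    def __init__(self, user_ids):
        self.rows = list(user_ids)
        self.index = {uid: row for row, uid in enumerate(self.rows)}

    def row_of(self, uid):
        """Row (offset in cells) of an existing user, or None if unknown."""
        return self.index.get(uid)

    def add_if_new(self, uid):
        """Append an unknown user to the tail; return (row, was_added)."""
        row = self.index.get(uid)
        if row is not None:
            return row, False
        row = len(self.rows)
        self.rows.append(uid)
        self.index[uid] = row
        return row, True

col = IdentityColumn(["u1", "u2"])
print(col.add_if_new("u2"))   # (1, False)
print(col.add_if_new("u9"))   # (2, True)
```

A hash lookup keeps the existence check constant-time on average, instead of scanning the whole identity file for every incoming user.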
24. A device for implementing data query using the data storage device according to any one of claims 17 to 23, characterized by comprising:
An acquisition module, configured to obtain a query condition, the query condition comprising at least one attribute condition and at least one time condition;
A processing module, configured to: search, according to the at least one time condition, for the data partition corresponding to each time condition; obtain, in each data partition and according to the at least one attribute condition, the data attribute column file corresponding to each attribute condition, the data attribute column files comprising usage data attribute column files and preference usage data attribute column files; and traverse, in sequence, the rows at the same position in the data attribute column files corresponding to the attribute conditions, to search for the usage data of users meeting the query condition.
25. The device according to claim 24, characterized in that, if the data attribute column files corresponding to the attribute conditions are usage data attribute column files, the processing module being configured to traverse, in sequence, the rows at the same position in those data attribute column files to search for the usage data of users meeting the query condition specifically comprises:
Taking the rows at the same position in the usage data attribute column files corresponding to the attribute conditions in each data partition as a dummy row, obtaining the usage data of the user in each dummy row, and determining whether the usage data of the user in each dummy row meets the query condition;
If so, recording the identity information corresponding to the usage data of the user in each dummy row that meets the query condition, to obtain a query result customer group.
26. The device according to claim 25, characterized in that recording the identity information corresponding to the usage data of the user in a dummy row meeting the query condition, to obtain the query result customer group, comprises:
Obtaining the first position, in the usage data attribute column file, of the usage data of the user in the dummy row meeting the query condition; obtaining, from the user identity information attribute column file, the identity information of the user at the first position; and recording the identity information of the user to obtain the query result customer group.
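A minimal scan over usage data attribute column files, assuming a query is encoded as a list of (period, attribute, predicate) triples, each partition maps attribute names to lists aligned with the identity column, and a row missing from a shorter, older partition counts as a non-match. Row i across the selected columns plays the role of a dummy row, and i itself is the first position used to look up the user's identity. All names are hypothetical.

```python
def query_usage(identity_column, partitions, conditions):
    """Return identities of users whose dummy row satisfies every condition.

    conditions: list of (period, attribute, predicate), a hypothetical encoding
    of the time conditions and attribute conditions.
    """
    # Locate the partition for each time condition, then the column for each attribute.
    columns = [(partitions[period][attr], pred) for period, attr, pred in conditions]
    result = []
    for row in range(len(identity_column)):
        # Row `row` across all selected columns is one dummy row; an older,
        # shorter partition has no entry for later users, so treat that as a miss.
        if all(row < len(col) and pred(col[row]) for col, pred in columns):
            # `row` is the first position: the same offset in the identity
            # column gives the matching user's identity information.
            result.append(identity_column[row])
    return result

ids = ["u1", "u2", "u3"]
parts = {
    "2015-01": {"calls": [12, 7, 30], "sms": [3, 0, 9]},
    "2015-02": {"calls": [5, 20, 1]},
}
print(query_usage(ids, parts, [
    ("2015-01", "calls", lambda v: v >= 10),
    ("2015-02", "calls", lambda v: v < 10),
]))  # ['u1', 'u3']
```

Only the columns named in the query are touched, which is the usual benefit of a column-per-attribute layout.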
27. The device according to claim 24, characterized in that, if the data attribute column files corresponding to the attribute conditions are preference usage data attribute column files, the processing module being configured to traverse, in sequence, the rows at the same position in those data attribute column files to search for the usage data of users meeting the query condition specifically comprises:
Taking the rows at the same position in the preference usage data attribute column files corresponding to the attribute conditions in each data partition as a dummy row, obtaining the usage data of the user in each dummy row, and determining whether the usage data of the user in each dummy row meets the query condition;
If so, recording the identity information corresponding to the usage data of the user in each dummy row that meets the query condition, to obtain a query result customer group.
28. The device according to claim 27, characterized in that recording the identity information corresponding to the usage data of the user in a dummy row meeting the query condition, to obtain the query result customer group, comprises:
Obtaining the second position, in the preference usage data attribute column file, of the usage data of the user in the dummy row meeting the query condition; obtaining, from the multidimensional data user identity attribute column file, the identity information of the user at the second position; and recording the identity information of the user to obtain the query result customer group.
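For preference columns, the hit position indexes the multidimensional data user identity column rather than the main identity column. The sketch assumes the attribute condition names one preference category plus a value predicate, and that the aligned columns are Python lists; the names are hypothetical. A user with several matching entries is recorded once.

```python
def query_preferences(multi_user_ids, category_ids, pref_values, category, predicate):
    """Scan one preference attribute's aligned columns for a category condition.

    The matching row index is the second position; it indexes the
    multidimensional data user identity column directly.
    """
    result = []
    for pos in range(len(pref_values)):
        if category_ids[pos] == category and predicate(pref_values[pos]):
            uid = multi_user_ids[pos]
            if uid not in result:   # record each user once
                result.append(uid)
    return result

multi_ids = ["u1", "u1", "u3", "u3"]
cats = ["music", "video", "music", "games"]
values = [5, 2, 8, 1]
print(query_preferences(multi_ids, cats, values, "music", lambda v: v >= 5))  # ['u1', 'u3']
```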
29. The device according to claim 24, characterized in that, if the data attribute column files corresponding to the attribute conditions comprise both usage data attribute column files and preference usage data attribute column files, the processing module being configured to traverse, in sequence, the rows at the same position in those data attribute column files to search for the usage data of users meeting the query condition specifically comprises:
Taking the rows at the same position in the usage data attribute column files corresponding to the attribute conditions in each data partition as a first dummy row, and taking the rows at the same position in the preference usage data attribute column files corresponding to the attribute conditions in each data partition as a second dummy row;
If the identity information of the user corresponding to the first dummy row is identical to the identity information of the user corresponding to the second dummy row, taking the first dummy row and the second dummy row together as a dummy row;
Obtaining the usage data of the user in each dummy row, and determining whether the usage data of the user in each dummy row meets the query condition;
If so, recording the identity information corresponding to the usage data of the user in each dummy row that meets the query condition, to obtain a query result customer group.
30. The device according to claim 29, characterized in that, before taking the first dummy row and the second dummy row together as a dummy row when the identity information of the user corresponding to the first dummy row is identical to the identity information of the user corresponding to the second dummy row, the processing module is further configured to:
Obtain, from the user identity information attribute column file, the identity information of the user corresponding to the first dummy row, and obtain, from the multidimensional data user identity attribute column file, the identity information of the user corresponding to the second dummy row;
Determine whether the identity information of the user corresponding to the first dummy row is identical to the identity information of the user corresponding to the second dummy row.
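One way to pair first and second dummy rows by identity is a single forward pass (a merge join). This works if both identity columns list users in the same relative order, which the storage-order requirements above suggest. The sketch assumes one usage column, one preference category condition, every preference user present in the identity column, and a usage column as long as the identity column; all names are hypothetical.

```python
def query_combined(identity_column, usage_column, usage_pred,
                   multi_user_ids, category_ids, pref_values, category, pref_pred):
    """Join first dummy rows (usage columns) with second dummy rows (preference
    columns) whose user identity information is identical, then apply the conditions.

    Assumes both identity columns list users in the same relative order and that
    usage_column is as long as identity_column.
    """
    result = []
    j = 0
    for i, uid in enumerate(identity_column):
        usage_ok = usage_pred(usage_column[i])
        pref_ok = False
        # This user's preference rows are contiguous; consume them in one pass.
        while j < len(multi_user_ids) and multi_user_ids[j] == uid:
            if category_ids[j] == category and pref_pred(pref_values[j]):
                pref_ok = True
            j += 1
        if usage_ok and pref_ok:
            result.append(uid)
    return result

ids = ["u1", "u2", "u3"]
calls = [12, 7, 30]
multi_ids = ["u1", "u1", "u3"]
cats = ["music", "video", "music"]
values = [5, 2, 4]
print(query_combined(ids, calls, lambda v: v >= 10,
                     multi_ids, cats, values, "music", lambda v: v >= 5))  # ['u1']
```

Since each pointer only moves forward, the join reads every row of both sides once, without building a hash table.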
31. The device according to claim 26 or 28, characterized in that the first position or the second position is a number of offset cells.
32. The device according to any one of claims 24 to 31, characterized in that the processing module is further configured to:
Obtain an attribute analysis request for the query result customer group, the attribute analysis request comprising at least one piece of attribute information;
Obtain, according to the attribute information in the attribute analysis request, the data attribute column file corresponding to the attribute; obtain, from the data attribute column file, the usage data corresponding to the identity information of each user in the query result customer group; and perform statistical analysis on the usage data to obtain a user attribute analysis result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510053228.1A CN104574159B (en) | 2015-01-30 | 2015-01-30 | Data storage, querying method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201510053228.1A CN104574159B (en) | 2015-01-30 | 2015-01-30 | Data storage, querying method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN104574159A (en) | 2015-04-29 |
CN104574159B (en) | 2018-01-23 |
Family ID: 53090156
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201510053228.1A Active CN104574159B (en) | 2015-01-30 | 2015-01-30 | Data storage, querying method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN104574159B (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102799634A (en) * | 2012-06-26 | 2012-11-28 | 中国农业银行股份有限公司 | Data storage method and device |
CN103761316A (en) * | 2014-01-26 | 2014-04-30 | 北京中电普华信息技术有限公司 | Data compression storage method and device based on sparse matrix |
CN103902544A (en) * | 2012-12-25 | 2014-07-02 | 中国移动通信集团公司 | Data processing method and system |
2015-01-30: CN201510053228.1A filed in CN, granted as CN104574159B (Active)
Cited By (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106471501A (en) * | 2016-03-24 | 2017-03-01 | 华为技术有限公司 | The method of data query, the storage method data system of data object |
WO2017161540A1 (en) * | 2016-03-24 | 2017-09-28 | 华为技术有限公司 | Data query method, data object storage method and data system |
CN106471501B (en) * | 2016-03-24 | 2020-04-14 | 华为技术有限公司 | Data query method, data object storage method and data system |
CN107391506A (en) * | 2016-05-16 | 2017-11-24 | 华为软件技术有限公司 | Method and apparatus for inquiring about data |
CN109447694A (en) * | 2018-10-11 | 2019-03-08 | 上海瀚之友信息技术服务有限公司 | A kind of user feature analysis method and its system |
CN109447694B (en) * | 2018-10-11 | 2022-04-12 | 上海瀚之友信息技术服务有限公司 | User characteristic analysis method and system |
CN111652433A (en) * | 2020-06-02 | 2020-09-11 | 泰康保险集团股份有限公司 | Endowment expense measuring and calculating device |
CN111652433B (en) * | 2020-06-02 | 2023-04-18 | 泰康保险集团股份有限公司 | Endowment expense measuring and calculating device |
CN116069260A (en) * | 2023-02-23 | 2023-05-05 | 摩尔线程智能科技(北京)有限责任公司 | Data processing apparatus, data processing method, computer device, and storage medium |
CN116069260B (en) * | 2023-02-23 | 2024-03-22 | 摩尔线程智能科技(北京)有限责任公司 | Data processing apparatus, data processing method, computer device, and storage medium |
Also Published As
Publication number | Publication date |
---|---|
CN104574159B (en) | 2018-01-23 |
Similar Documents
Publication | Title |
---|---|
CN100504866C | Integrative searching result sequencing system and method |
CN104574159A | Data storage and query method and device |
CN102867071B | Management method for massive network management historical data |
US20140101167A1 | Creation of Inverted Index System, and Data Processing Method and Apparatus |
CN107861989A | Partitioned storage method, apparatus, computer equipment and the storage medium of data |
CN107807932B | Hierarchical data management method and system based on path enumeration |
CN106649602B | Business object data processing method, device and server |
CN105956123A | Local updating software-based data processing method and apparatus |
CN105868421A | Data management method and data management device |
KR20140093535A | Method for parallel mining of temporal relations in large event file |
CN105989102A | Method and device for deleting backup data |
CN102081649B | Method and system for searching computer files |
CN102779138A | Hard disk access method of real time data |
CN111506569A | Data storage method and device and electronic device |
CN106933836A | A kind of date storage method and system based on point table |
CN107844271A | A kind of method, apparatus and computer-readable recording medium for being classified storage |
CN105630934A | Data statistic method and system |
CN103064908A | Method for rapidly removing repeated list through a memory |
CN106970856A | Data are backed up, recover and carry data management system and method |
CN102567528B | Method and device for reading mass data |
CN106033438A | Public sentiment data storage method and server |
CN101963993A | Method for fast searching database sheet table record |
CN104881475A | Method and system for randomly sampling big data |
CN106570005A | Database cleaning method and device |
CN107665116A | Page resource position information processing method and device |
Legal Events
Code | Title |
---|---|
C06 | Publication |
PB01 | Publication |
C10 | Entry into substantive examination |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |