Method and system for preprocessing buffer data
Technical Field
The present invention relates to a method and system for preprocessing data, and in particular to a preprocessing method and system applied to buffer data.
Background
Current data preprocessing approaches mostly rely on a single technique. E-commerce data, however, is highly bursty and places very heavy instantaneous load on the system, so relying on a single processing technique leads to a very large data processing load and cannot meet the demands of e-commerce.
A first-in, first-out (FIFO) queue is a traditional in-order execution scheme: when the buffer is full, the data or instruction that entered the buffer first is executed and leaves the buffer first, and only then is the second data or instruction executed. A FIFO is a first-in, first-out data buffer. It differs from ordinary memory in having no external read/write address lines, which makes it very simple to use. Its drawback is that data can only be written sequentially and read sequentially: the data address is advanced by internal read and write pointers that automatically increment by 1, so a specified address cannot be read or written through address lines as in ordinary memory. A FIFO also cannot accurately estimate a user's query time, dwell time, or query content in an e-commerce data system. Statistical methods use mathematical statistics to count how frequently the system is used and preferentially keep the information of active users in the buffer; through a color register, data is cached in the buffer region whose color corresponds to the color of the memory region containing the currently accessed physical address. This can improve cache utilization and system performance, but the method still cannot accommodate the characteristics of e-commerce data.
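The sequential-only behavior of a FIFO buffer described above can be illustrated with a minimal Python sketch. The class name FifoBuffer and its methods are hypothetical and introduced only for illustration; the sketch assumes a fixed capacity and refuses writes when full.

```python
from collections import deque


class FifoBuffer:
    """Bounded first-in, first-out buffer with no random access (illustrative)."""

    def __init__(self, capacity: int) -> None:
        self.capacity = capacity
        self._items = deque()

    def is_full(self) -> bool:
        return len(self._items) >= self.capacity

    def write(self, item) -> bool:
        """Append at the tail; refuse the write when the buffer is full."""
        if self.is_full():
            return False
        self._items.append(item)
        return True

    def read(self):
        """Remove and return the oldest item; no address can be chosen."""
        if not self._items:
            raise IndexError("read from empty FIFO buffer")
        return self._items.popleft()
```

There is deliberately no method for reading or writing at a chosen position, which mirrors the absence of external address lines.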
The present invention provides a method for preprocessing buffer data. The method uses machine learning to study user behavior patterns and to predict the user's query time, each working time, query content, and so on. The system arranges the buffer data in advance according to the predictions, thereby optimizing the user's query experience.
Summary of the Invention
Embodiments of the present invention provide a method for preprocessing buffer data. The method uses machine learning to study user behavior patterns and to predict the user's query time, each working time, query content, and so on. The system arranges the buffer data in advance according to the predictions, thereby optimizing the user's query experience.
To achieve the above object, embodiments of the present invention adopt the following technical solutions:
A first aspect of the present invention provides a buffer data preprocessing method, comprising:
recording and constructing basic data, and preprocessing the basic data;
building a least-squares model to fit user behavior, and predicting the data relationships between parameters such as the user's working time and query content;
storing the data received from the cache input into the buffer, and outputting it from the buffer in first-in, first-out order.
Preferably, according to the first aspect, recording and constructing the basic data specifically comprises:
The basic data refers to the user's query time TimeUserQuery, the user's dwell time TimeUserStand, and the user's query content ContentUserQuery. The interface functions TimeUserQuery, TimeUserStand, and ContentUserQuery are constructed to obtain the client user's query time, dwell time, and query content from the originating server. A timer, Timer, is preset in the TimeUserQuery and TimeUserStand functions, and cookie ActiveX techniques are used to obtain the user's query time and dwell time for the current behavior. The collected data is sent to the destination server asynchronously via GET or POST, and the basic data is presented to the destination server through the interface in JSON format.
Preferably, the user's query content ContentUserQuery specifically comprises:
The system presets all query contents the user can operate on as one of, or any combination of, Loading, Unloading, Cargo, Carrier, and Route (different industries and requirements may preset different query contents). The parameters of the ContentUserQuery interface function are Loading, Unloading, Cargo, Carrier, and Route. Depending on the user's operations, different parameter values are returned and displayed: the return value of a parameter whose query content was performed is set to 1, and the return value of a parameter whose query content was not performed is set to 0.
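The 0/1 return values and the JSON presentation described above can be sketched as follows. This is a minimal Python illustration, not the interface of the invention: the function names content_user_query and build_basic_data and the JSON field layout are assumptions made for the example.

```python
import json

# The five preset query contents named in the text.
QUERY_CONTENT_KEYS = ("Loading", "Unloading", "Cargo", "Carrier", "Route")


def content_user_query(performed):
    """Return 1 for each query content the user performed, 0 otherwise."""
    unknown = set(performed) - set(QUERY_CONTENT_KEYS)
    if unknown:
        raise ValueError(f"unknown query content: {sorted(unknown)}")
    return {key: int(key in performed) for key in QUERY_CONTENT_KEYS}


def build_basic_data(time_user_query, time_user_stand, performed):
    """Serialize one basic-data record as JSON (field layout is assumed)."""
    record = {
        "TimeUserQuery": time_user_query,
        "TimeUserStand": time_user_stand,
        "ContentUserQuery": content_user_query(performed),
    }
    return json.dumps(record)
```

A record built this way could be sent to the destination server by an asynchronous GET or POST request; the transport itself is outside the scope of this sketch.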
Preferably, according to the first aspect, preprocessing the basic data specifically comprises:
After the destination server receives the return values and returned content, the system uses the Parse method of JObject or JArray to convert the JSON string into a JSON object and extracts the basic data through that object. It then analyzes the association between query content and query time in the basic data, that is, it constructs a relationship graph of Loading, Unloading, Cargo, Carrier, and Route against TimeUserQuery and TimeUserStand.
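The parsing step above can be sketched in Python, using the standard json module as a stand-in for the JObject/JArray Parse method named in the text. The function names and the JSON field layout are assumptions made for illustration only.

```python
import json

# The five preset query contents named in the text.
QUERY_CONTENT_KEYS = ("Loading", "Unloading", "Cargo", "Carrier", "Route")


def parse_basic_data(json_text):
    """Parse one JSON record into (content flags, query time, dwell time)."""
    record = json.loads(json_text)
    content = record["ContentUserQuery"]
    # Missing contents are treated as not performed (flag 0).
    flags = [int(content.get(key, 0)) for key in QUERY_CONTENT_KEYS]
    return flags, float(record["TimeUserQuery"]), float(record["TimeUserStand"])


def build_relation_rows(json_records):
    """Collect rows pairing each content flag vector with both timings."""
    return [parse_basic_data(text) for text in json_records]
```

Each row pairs the five content flags with TimeUserQuery and TimeUserStand, which is the data a relationship graph of the kind described would be drawn from.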
Preferably, according to the first aspect, one possible implementation of constructing the relationship graph of Loading, Unloading, Cargo, Carrier, and Route against TimeUserQuery and TimeUserStand is:
Preferably, in the relationship graph, TimeUserQuery and TimeUserStand are each taken as the dependent variable and Loading, Unloading, Cargo, Carrier, and Route as the independent variables. The graph shows a roughly linear regression trend, so the least squares method is considered for prediction.
Preferably, the least squares method is a mathematical optimization technique that finds the best-fitting function for the data by minimizing the sum of squared errors. With it, unknown values can be estimated conveniently such that the sum of squared errors between the estimated data and the actual data is minimal, which yields the optimal value of the objective function.
Step 1: The destination server receives multiple query operations from a user, who has queried one or more of the query contents. If there are n query contents, the times the user spends querying each of them are denoted:

$$T = (t_1, t_2, t_3, \ldots, t_i, \ldots, t_n) \qquad (1)$$

where $t_i$ denotes the query time when the user queries the i-th query content.
Step 2: The query time over a user's m queries of the query contents is expressed as:

$$y(t_1, \ldots, t_n;\, x_0, x_1, \ldots, x_n) = x_0 + x_1 t_1 + \cdots + x_n t_n \qquad (2)$$

where $y$ represents the working time the user spends querying the query contents, and $x_0, x_1, \ldots, x_n$ are the model parameters, chosen so that the sum of squared differences between the actual values and the observed values is minimal; usually $x_0 = 1$ is taken. Written as a system of linear equations:
$$\begin{aligned}
y_1 &= x_0 + x_1 t_{11} + \cdots + x_j t_{1j} + \cdots + x_n t_{1n} \\
y_2 &= x_0 + x_1 t_{21} + \cdots + x_j t_{2j} + \cdots + x_n t_{2n} \\
&\;\;\vdots \\
y_i &= x_0 + x_1 t_{i1} + \cdots + x_j t_{ij} + \cdots + x_n t_{in} \\
&\;\;\vdots \\
y_m &= x_0 + x_1 t_{m1} + \cdots + x_j t_{mj} + \cdots + x_n t_{mn}
\end{aligned} \qquad (3)$$

where $y_i$ denotes the query time the user spends on the i-th query of the query contents, and $t_{ij}$ denotes the query time the user spends on the j-th query content in the i-th query.
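Writing the $t_{ij}$ as the data matrix $A$, the model parameters $x_i$ as the parameter vector $X$, and the user's query times $y_i$ as the vector $Y$, the system of linear equations can be expressed as:

$$AX = Y \qquad (4)$$

where, for (4) to reproduce (3) term by term, $A$ carries a leading column of ones for $x_0$:

$$A = \begin{pmatrix} 1 & t_{11} & \cdots & t_{1n} \\ 1 & t_{21} & \cdots & t_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & t_{m1} & \cdots & t_{mn} \end{pmatrix}, \quad X = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{pmatrix}, \quad Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}$$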
Step 3: The model parameter matrix X that fits the query time and query content of real user behavior is obtained as follows:
The least-squares fit defines the estimated observation for a user's query of one query content and the estimated values of the model parameters, where $i = 1, 2, \ldots, n$ and $k = 1, 2, \ldots, m$. (6)
This gives:
where
This yields the system of equations in the estimated values of the model parameters:
From (8) and (9), the relation between the observed values and the estimated values of the time the user spends querying the query content is:
According to the principle of least squares, the values of the model parameters are:
The estimated values of the model parameters are finally obtained as:
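The display equations of Step 3 are not present in the text. The standard least-squares derivation consistent with (3) and (4) can be sketched as follows, assuming that A includes a leading column of ones for $x_0$ and that $A^{\mathsf T}A$ is invertible. The residual vector $e$ and the objective $S$ are names introduced here for illustration, and the numbering of the original equations is not reproduced.

```latex
\begin{aligned}
\hat{Y} &= A\hat{X}, \qquad e = Y - A\hat{X} \\
S(\hat{X}) &= e^{\mathsf T} e
  = \sum_{k=1}^{m}\Bigl(y_k - \hat{x}_0 - \sum_{i=1}^{n}\hat{x}_i\, t_{ki}\Bigr)^{2} \\
\frac{\partial S}{\partial \hat{X}} &= -2A^{\mathsf T}\bigl(Y - A\hat{X}\bigr) = 0 \\
A^{\mathsf T}A\,\hat{X} &= A^{\mathsf T}Y \\
\hat{X} &= \bigl(A^{\mathsf T}A\bigr)^{-1}A^{\mathsf T}Y
\end{aligned}
```

Here $k$ indexes the m observations and $i$ indexes the n query contents, matching the index ranges stated in (6).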
Step 4: Predict the user's TimeUserQuery time:

$$y = x_0 + \sum_{i=1}^{n} t_i x_i$$

where $t_i$ denotes the query time when the user queries the i-th query content and $x_i$ denotes the model parameter corresponding to the i-th query content, with $x_0 = 1$. If the user performs only the Cargo operation, the predicted Cargo query time is:

$$y_3 = x_0 + t_3 x_3 \qquad (13)$$
A SessionId is set in the data table for each of the query contents Loading, Unloading, Cargo, Carrier, and Route. In step 4 above, the relevant parameter values are obtained directly through the SessionId, and the resulting data serves as the raw input data of the buffer.
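Steps 1 to 4 can be illustrated with a small pure-Python sketch that fits the model parameters through the normal equations and then predicts a query time. The function names are hypothetical. As an assumption of this sketch, the intercept $x_0$ is fitted from the data rather than fixed at 1, and the normal equations are solved by Gaussian elimination, which is one of several possible numerical methods.

```python
def solve_linear_system(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    # Work on an augmented copy so the caller's data is left untouched.
    aug = [row[:] + [rhs[i]] for i, row in enumerate(matrix)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("normal equations are singular; data lacks variety")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - tail) / aug[i][i]
    return x


def fit_least_squares(t_rows, y):
    """Estimate [x0, x1, ..., xn] from m observations (rows of t_ij) and y_i."""
    a = [[1.0] + list(row) for row in t_rows]  # leading 1 carries x0
    cols = len(a[0])
    ata = [[sum(r[i] * r[j] for r in a) for j in range(cols)] for i in range(cols)]
    aty = [sum(r[i] * yi for r, yi in zip(a, y)) for i in range(cols)]
    return solve_linear_system(ata, aty)


def predict(x, t):
    """Evaluate y = x0 + sum(x_i * t_i) for one new observation t."""
    return x[0] + sum(xi * ti for xi, ti in zip(x[1:], t))
```

With enough varied observations, fit_least_squares returns the parameter estimates and predict applies them to a new row of times; the predicted values could then be supplied to the buffer as input data.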
Preferably, in a second aspect, the buffer data preprocessing method further comprises:
The main cache is configured to store the data received from the cache input. The cache controller selectively routes the received data from the main cache to the standby cache, so that the data received from the cache input can be output from the standby cache to the cache output in FIFO order.
Preferably, the standby cache stores the received data from the cache input or the data received by the main cache, and outputs the received data to the cache output in the same order as in the main cache.
Preferably, the cache controller operates as follows: when the main cache is empty, the main cache transmits data from the cache input to the standby cache; or, when the standby cache is full, the standby cache transmits data from the cache input to the main cache; or, when the main cache is not empty, the received data is transmitted from the cache input to the main cache.
Preferably, the main cache and the standby cache are independent FIFO queues that can store different types of data, and the data storage space of the main cache is larger than that of the standby cache.
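One way to picture a main (master) cache, a standby cache, and a controller that preserves overall FIFO order is the Python sketch below. It does not implement the exact routing conditions listed above. It assumes one simple policy: input goes to the main cache while it has room, overflows into the standby cache, and the main cache is refilled from the standby cache on output. The class name DualFifoBuffer and its methods are hypothetical.

```python
from collections import deque


class DualFifoBuffer:
    """Main cache plus smaller standby cache, drained in arrival order."""

    def __init__(self, main_capacity: int, standby_capacity: int) -> None:
        if main_capacity <= standby_capacity:
            raise ValueError("main cache must be larger than the standby cache")
        self.main_capacity = main_capacity
        self.standby_capacity = standby_capacity
        self.main = deque()
        self.standby = deque()

    def put(self, item) -> bool:
        """Controller rule (assumed): route input to main, overflow to standby."""
        # Once anything waits in standby, newer items must queue behind it,
        # otherwise the overall first-in, first-out order would break.
        if not self.standby and len(self.main) < self.main_capacity:
            self.main.append(item)
            return True
        if len(self.standby) < self.standby_capacity:
            self.standby.append(item)
            return True
        return False  # both caches are full

    def get(self):
        """Output the oldest item, then refill main from standby."""
        if not self.main:
            raise IndexError("buffer is empty")
        item = self.main.popleft()
        if self.standby:
            self.main.append(self.standby.popleft())
        return item
```

Under this policy the standby cache is non-empty only while the main cache is full, so reading from the main cache alone always yields the oldest item.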
Preferably, in a third aspect, a buffer data preprocessing system is provided, comprising:
a transmitting device, for sending data to the buffer; a buffer, for receiving data from the transmitting device and sending the received data to a receiving device in first-in, first-out order; and a receiving device, for receiving the data from the buffer.
The system first trains on and processes the data; because the data volume is large, the data is first placed into the buffer by the transmitting device.
Preferably, one possible implementation of the third aspect is:
The buffer comprises a main cache and a standby cache. The main cache is configured mainly to store the data received from the cache input. The standby cache is mainly used to store the received data from the cache input or the data received by the main cache, and to output the received data to the cache output in the same order as in the main cache.
Preferably, the buffer further comprises a cache controller, which operates as follows: when the main cache is full, the main cache transmits data from the cache input to the standby cache; or, when the standby cache is full, the standby cache transmits data from the cache input to the main cache; or, when the standby cache is not full, the main cache transmits data from the cache input to the standby cache.
Preferably, a second possible implementation of the third aspect is:
To further improve the performance of the system, the least squares method is first used to continuously train on and preprocess the data; multiple buffers are then configured for the system; and finally the data storage space of the main cache is made larger than that of the standby cache.
Brief Description of the Drawings
To illustrate the embodiments of the present invention or the technical solutions of the prior art more clearly, the drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention.
Fig. 1 is a diagram of part of the function interfaces of a buffer data preprocessing method provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of obtaining the model parameters by the least squares method, provided by an embodiment of the present invention;
Fig. 3 is a schematic flowchart of a buffer data preprocessing method provided by an embodiment of the present invention;
Fig. 4 is a schematic structural diagram of a buffer data preprocessing system provided by an embodiment of the present invention.
Detailed Description
To make the technical problem to be solved, the technical solutions, and the advantages of the present invention clearer, a detailed description is given below with reference to the accompanying drawings and specific embodiments.
Embodiments of the present invention provide a buffer data preprocessing method and system. The invention can be used for buffer data preprocessing. First, parameters such as query content and working time are recorded from a user's behavior on the platform and taken as the basic data, which is recorded and preprocessed. A least-squares model is then built from the preprocessed basic data to fit user behavior and to predict the data relationships between parameters such as the user's query time and query content. The resulting data is assigned to the buffer as the data received from the cache input and is output from the buffer in first-in, first-out order.
Specifically, embodiments of the present invention provide a buffer data preprocessing method and system. Recording parameters such as query content and working time from a user's behavior on the platform, with reference to Fig. 1, comprises the following:
Recording the user's query content and working time from the user's behavior on the platform as the basic data specifically comprises:
The basic data refers to the user's query time TimeUserQuery, the user's dwell time TimeUserStand, and the user's query content ContentUserQuery. The interface functions TimeUserQuery, TimeUserStand, and ContentUserQuery are constructed to obtain the client user's query time, dwell time, and query content from the originating server. A timer, Timer, is preset in the TimeUserQuery and TimeUserStand functions, and cookie ActiveX techniques are used to obtain the user's query time and dwell time for the current behavior. The collected data is sent to the destination server asynchronously via GET or POST, and the basic data is presented to the destination server through the interface in JSON format.
The user's query content ContentUserQuery specifically comprises the following. The system presets all query contents the user can operate on as Loading, Unloading, Cargo, Carrier, and Route (different industries and requirements may preset different query contents). The parameters of the ContentUserQuery interface function are Loading, Unloading, Cargo, Carrier, and Route. Depending on the user's operations, different parameter values are returned and displayed: the return value of a parameter whose query content was performed is set to 1, and the return value of a parameter whose query content was not performed is set to 0.
After the basic data is recorded, it is preprocessed, which specifically comprises the following. After the destination server receives the return values and returned content, the system uses the Parse method of JObject or JArray to convert the JSON string into a JSON object and extracts the basic data through that object. It then analyzes the association between query content and query time in the basic data, that is, it constructs a relationship graph of Loading, Unloading, Cargo, Carrier, and Route against TimeUserQuery and TimeUserStand. One possible implementation of constructing this relationship graph is:
In the relationship graph, TimeUserQuery and TimeUserStand are each taken as the dependent variable and Loading, Unloading, Cargo, Carrier, and Route as the independent variables. The graph shows a roughly linear regression trend, so the least squares method is considered for modeling and prediction.
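The text identifies the linear trend by inspecting the graph. As a numerical complement, which is not part of the described method, the Pearson correlation coefficient between one independent variable and one dependent variable can indicate how close the relationship is to linear. The following Python sketch uses a hypothetical function name, pearson_r.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson correlation coefficient of two equally long samples."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two equally long samples with at least 2 points")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("a constant sample has no defined correlation")
    return cov / sqrt(var_x * var_y)
```

A value near +1 or -1 suggests that a linear least-squares model is a reasonable choice for that pair of variables; a value near 0 suggests it is not.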
The embodiments of the present invention provide a least-squares modeling and prediction flow and obtain the optimal solution of the model parameters. With reference to Fig. 2, the flow comprises the following steps:
The least squares method is a mathematical optimization technique that finds the best-fitting function for the data by minimizing the sum of squared errors. With it, unknown values can be estimated conveniently such that the sum of squared errors between the estimated data and the actual data is minimal, which yields the optimal value of the objective function.
Step 1: The destination server receives multiple query operations from a user, who has queried one or more of the query contents. If there are n query contents, the times the user spends querying each of them are denoted:

$$T = (t_1, t_2, t_3, \ldots, t_i, \ldots, t_n) \qquad (1)$$

where $t_i$ denotes the query time when the user queries the i-th query content.
Step 2: The query time over a user's m queries of the query contents is expressed as:

$$y(t_1, \ldots, t_n;\, x_0, x_1, \ldots, x_n) = x_0 + x_1 t_1 + \cdots + x_n t_n \qquad (2)$$

where $y$ represents the working time the user spends querying the query contents, and $x_0, x_1, \ldots, x_n$ are the model parameters, chosen so that the sum of squared differences between the actual values and the observed values is minimal; usually $x_0 = 1$ is taken. Written as a system of linear equations:
$$\begin{aligned}
y_1 &= x_0 + x_1 t_{11} + \cdots + x_j t_{1j} + \cdots + x_n t_{1n} \\
y_2 &= x_0 + x_1 t_{21} + \cdots + x_j t_{2j} + \cdots + x_n t_{2n} \\
&\;\;\vdots \\
y_i &= x_0 + x_1 t_{i1} + \cdots + x_j t_{ij} + \cdots + x_n t_{in} \\
&\;\;\vdots \\
y_m &= x_0 + x_1 t_{m1} + \cdots + x_j t_{mj} + \cdots + x_n t_{mn}
\end{aligned} \qquad (3)$$

where $y_i$ denotes the query time the user spends on the i-th query of the query contents, and $t_{ij}$ denotes the query time the user spends on the j-th query content in the i-th query.
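Writing the $t_{ij}$ as the data matrix $A$, the model parameters $x_i$ as the parameter vector $X$, and the user's query times $y_i$ as the vector $Y$, the system of linear equations can be expressed as:

$$AX = Y \qquad (4)$$

where, for (4) to reproduce (3) term by term, $A$ carries a leading column of ones for $x_0$:

$$A = \begin{pmatrix} 1 & t_{11} & \cdots & t_{1n} \\ 1 & t_{21} & \cdots & t_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ 1 & t_{m1} & \cdots & t_{mn} \end{pmatrix}, \quad X = \begin{pmatrix} x_0 \\ x_1 \\ \vdots \\ x_n \end{pmatrix}, \quad Y = \begin{pmatrix} y_1 \\ y_2 \\ \vdots \\ y_m \end{pmatrix}$$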
Step 3: The model parameter matrix X that fits the query time and query content of real user behavior is obtained as follows:
The least-squares fit defines the estimated observation for a user's query of one query content and the estimated values of the model parameters, where $i = 1, 2, \ldots, n$ and $k = 1, 2, \ldots, m$. (6)
This gives:
where
This yields the system of equations in the estimated values of the model parameters:
From (8) and (9), the relation between the observed values and the estimated values of the time the user spends querying the query content is:
According to the principle of least squares, the values of the model parameters are:
The estimated values of the model parameters are finally obtained as:
Step 4: Predict the user's TimeUserQuery time:

$$y = x_0 + \sum_{i=1}^{n} t_i x_i$$

where $t_i$ denotes the query time when the user queries the i-th query content and $x_i$ denotes the model parameter corresponding to the i-th query content, with $x_0 = 1$. If the user performs only the Cargo operation, the predicted Cargo query time is:

$$y_3 = x_0 + t_3 x_3$$
A SessionId is set in the data table for each of the query contents Loading, Unloading, Cargo, Carrier, and Route. In step 4 above, the relevant parameter values are obtained directly through the SessionId, and the resulting data serves as the raw input data of the buffer.
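As a worked illustration of the Cargo prediction in Step 4, suppose, purely hypothetically, that the fitted parameter for Cargo is $x_3 = 0.8$ and that the recorded Cargo query time is $t_3 = 5$ seconds, with $x_0 = 1$ as stated. These numbers are invented for the example and do not come from any measured data.

```latex
y_3 = x_0 + t_3 x_3 = 1 + 5 \times 0.8 = 5 \ \text{seconds}
```

The predicted value would then be looked up by the Cargo SessionId and supplied to the buffer as input data.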
The embodiments of the present invention provide a buffer data preprocessing method in which the preprocessed basic data serves as the data received by the cache input. Its operating flow on the platform, with reference to Fig. 3, comprises the following:
The main cache is configured to store the data received from the cache input. The cache controller selectively routes the received data from the main cache to the standby cache, so that the data received from the cache input can be output from the standby cache to the cache output in FIFO order.
The standby cache stores the received data from the cache input or the data received by the main cache, and outputs the received data to the cache output in the same order as in the main cache.
The cache controller operates as follows: when the main cache is empty, the main cache transmits data from the cache input to the standby cache;
or,
when the standby cache is full, the standby cache transmits data from the cache input to the main cache;
or,
when the main cache is not empty, the received data is transmitted from the cache input to the main cache.
The main cache and the standby cache are independent FIFO queues that can store different types of data, and the data storage space of the main cache is larger than that of the standby cache.
The storage state of the buffer is updated, and data requests are received;
The buffer data has now been arranged in advance.
The embodiments of the present invention provide a buffer data preprocessing system which, with reference to Fig. 4, comprises the following:
A transmitting device, for sending data to the buffer; a buffer, for receiving data from the transmitting device and sending the received data to a receiving device in first-in, first-out order; and a receiving device, for receiving the data from the buffer.
The buffer data preprocessing method first trains on and processes the data; because the data volume is large, the data is first placed into the buffer. The buffer receives data from the transmitting device and sends the received data to the receiving device in first-in, first-out order.
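The transmitting device, buffer, and receiving device described above can be sketched as a producer, a bounded FIFO queue, and a consumer. This Python illustration uses the standard queue and threading modules; the function names and the end-of-stream marker are assumptions made for the example.

```python
import queue
import threading

_DONE = object()  # hypothetical end-of-stream marker


def transmitter(buffer, items):
    """Send every item to the buffer; put() blocks while the buffer is full."""
    for item in items:
        buffer.put(item)
    buffer.put(_DONE)


def receiver(buffer, sink):
    """Take items from the buffer in first-in, first-out order."""
    while True:
        item = buffer.get()
        if item is _DONE:
            break
        sink.append(item)


def run_pipeline(items, capacity):
    """Run one transmitter and one receiver around a bounded FIFO buffer."""
    buffer = queue.Queue(maxsize=capacity)
    sink = []
    consumer = threading.Thread(target=receiver, args=(buffer, sink))
    consumer.start()
    transmitter(buffer, items)
    consumer.join()
    return sink
```

Because the buffer is bounded, a large data volume is passed through in pieces without holding everything in memory at once, and with a single transmitter and a single receiver the arrival order is preserved.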
The buffer comprises a main cache and a standby cache. The main cache is configured mainly to store the data received from the cache input. The standby cache is mainly used to store the received data from the cache input or the data received by the main cache, and to output the received data to the cache output in the same order as in the main cache.
The buffer further comprises a cache controller, which operates as follows: when the main cache is empty, the main cache transmits data from the cache input to the standby cache; or, when the standby cache is full, the standby cache transmits data from the cache input to the main cache; or, when the main cache is not empty, the received data is transmitted from the cache input to the main cache.
To further improve the performance of the system, the least squares method is first used to continuously train on and preprocess the data; multiple caches are then configured for the system; and finally the data storage space of the main cache is made larger than that of the standby cache.
The above are preferred embodiments of the present invention. It should be understood that those of ordinary skill in the art may make improvements and modifications without departing from the principles of the present invention. Such improvements and modifications follow naturally from the present invention and shall also be regarded as falling within its scope of protection.