CN106096638A - A kind of data processing method and device - Google Patents
A kind of data processing method and device
- Publication number
- CN106096638A CN201610394934.7A
- Authority
- CN
- China
- Prior art keywords
- social behaviors
- type
- data stream
- behaviors data
- observation
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/10—Pre-processing; Data cleansing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Artificial Intelligence (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
An embodiment of the present invention provides a data processing method and device. The method may include: obtaining a social behavior data stream to be processed; preprocessing the social behavior data stream to convert it from the data space into a low-dimensional feature space vector; and inputting the low-dimensional feature space vector into a multi-layer stack of restricted Boltzmann machines (RBMs) for computation, so as to extract the hidden features in the social behavior data stream. With this embodiment, the multi-layer RBM stack can automatically extract abstract hidden features from the social behavior data stream, which improves efficiency and reduces R&D costs.
Description
Technical field
The present invention relates to the field of communication technologies, and in particular to a data processing method and device.
Background art
In many scenarios, data must be classified or predicted by means of modeling. An important characteristic of modeling is that features have to be extracted from a large number of samples, while the mathematical model is mainly responsible for the classification or prediction itself. Provided that the model is applied properly, the quality of the extracted features becomes the bottleneck of the performance of the whole system. For this reason, a development team usually devotes most of its manpower to mining better features.
Traditional feature extraction usually relies on manually defining feature types and selecting features. This requires solid prior knowledge, and because human cognitive capacity at any given moment is limited, the resulting features tend to be one-sided, or deep latent features cannot be constructed at all. Social behavior data is large in scale and rich in dimensions, and existing methods cannot extract features from it effectively. Hand-engineering sample features is therefore not a scalable approach.
Summary of the invention
Embodiments of the present invention provide a data processing method and device that can automatically extract abstract hidden features from a social behavior data stream by means of a multi-layer stack of RBMs (Restricted Boltzmann Machines), which improves efficiency and reduces R&D costs.
A first aspect of the present invention provides a data processing method, comprising:
obtaining a social behavior data stream to be processed;
preprocessing the social behavior data stream to convert it from the data space into a low-dimensional feature space vector; and
inputting the low-dimensional feature space vector into a multi-layer restricted Boltzmann machine (RBM) stack for computation, so as to extract the hidden features in the social behavior data stream.
A second aspect of the present invention provides a data processing device, comprising:
an acquisition module, configured to obtain a social behavior data stream to be processed;
a preprocessing module, configured to preprocess the social behavior data stream to convert it from the data space into a low-dimensional feature space vector; and
a computing module, configured to input the low-dimensional feature space vector into a multi-layer restricted Boltzmann machine (RBM) stack for computation, so as to extract the hidden features in the social behavior data stream.
Implementing the embodiments of the present invention has the following beneficial effects:
In the embodiments of the present invention, a social behavior data stream to be processed is obtained and preprocessed, which converts it from the data space into a low-dimensional feature space vector. This vector is input into the multi-layer RBM stack for computation to extract the hidden features in the social behavior data stream. In this way, preprocessing converts the social behavior data stream into a low-dimensional feature space vector that the RBM can recognize, and the RBM stack then automatically extracts the abstract hidden features from the stream, which improves efficiency and reduces R&D costs.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the accompanying drawings needed for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention, and a person of ordinary skill in the art can derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a data processing method according to an embodiment of the present invention;
Fig. 2 is a diagram of a feature autoencoder device according to an embodiment of the present invention;
Fig. 3 is a diagram of a preprocessing device according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of an output matrix format according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a single RBM according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an RBM stack according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a data processing device according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a preprocessing module according to an embodiment of the present invention.
Detailed description of the invention
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
The data processing method provided by the embodiments of the present invention is described in detail below with reference to Figs. 1 to 6.
Refer to Fig. 1, which is a flowchart of a data processing method according to an embodiment of the present invention. The method may include the following steps S100 to S102.
S100: obtain a social behavior data stream to be processed.
In this embodiment, the social behavior data stream is streaming data: the social behavior data that the system collects from each user's activity on various clients. Examples include behavior data for sharing, joining interest groups, and interacting within network applications; and/or payment behavior data generated with various payment applications; and shopping behavior data generated in various shopping applications. The present invention places no limitation on this. The various social behavior data items form the data stream in order of the time at which each behavior occurred.
Note that the social behavior data stream contains social behavior data of multiple types, which can be divided according to the function of the data. For example, the stream may contain social behavior data of the interaction type, of the payment type, of the game type, and so on.
S101: preprocess the social behavior data stream to convert it from the data space into a low-dimensional feature space vector.
In this embodiment, the social behavior data stream is preprocessed to obtain its low-dimensional feature space vector. An RBM can only recognize low-dimensional feature space vectors and cannot recognize raw data, which is why this conversion is needed. Optionally, the low-dimensional feature space vector includes, but is not limited to, the number of times and the number of days on which social behavior data was generated.
Optionally, before the social behavior data stream is converted from the data space into a low-dimensional feature space vector, the method further includes the following step S10.
S10: classify the social behavior data stream to obtain social behavior data streams of the multiple types.
Specifically, the social behavior data stream contains social behavior data of multiple types, interleaved in time order to form the stream. Each social behavior data item in the stream carries the identifier of the application that produced it and its content. Based on the application identifier and the content, the system can assign each data item to a type. For example, if the social behavior data produced by a network application records a user posting a comment, the data item is determined to be of the interaction type. Arranging all social behavior data items of a given type in order of occurrence time yields the social behavior data stream of that type.
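The classification in step S10 can be pictured with a minimal sketch. It assumes each data item is a dictionary with hypothetical `time`, `app_id`, and `action` fields, and that a hypothetical rule table `TYPE_RULES` maps an application identifier and content to a type; none of these names come from the patent text.

```python
from collections import defaultdict

# Hypothetical rule table: (application identifier, content) -> behavior type.
TYPE_RULES = {
    ("forum_app", "comment"): "interaction",
    ("wallet_app", "pay"): "payment",
    ("game_app", "play"): "game",
}

def classify_stream(events):
    """Split a time-interleaved stream into one time-ordered stream per type."""
    per_type = defaultdict(list)
    # Sorting by occurrence time keeps each per-type stream in time order.
    for event in sorted(events, key=lambda e: e["time"]):
        behavior_type = TYPE_RULES.get((event["app_id"], event["action"]), "other")
        per_type[behavior_type].append(event)
    return dict(per_type)

events = [
    {"time": 3, "app_id": "wallet_app", "action": "pay"},
    {"time": 1, "app_id": "forum_app", "action": "comment"},
    {"time": 2, "app_id": "forum_app", "action": "comment"},
]
streams = classify_stream(events)
```

Here `streams["interaction"]` holds the two comment events in time order and `streams["payment"]` holds the single payment event.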
Further optionally, preprocessing the social behavior data stream may include the following step S11.
S11: preprocess the social behavior data stream of each type among the social behavior data streams of the multiple types to obtain a normalized matrix, thereby converting the social behavior data stream from the data space into a low-dimensional feature space vector.
Specifically, the social behavior data stream of each type among the multiple types in the stream to be processed is preprocessed to obtain a normalized matrix, which is the low-dimensional feature space vector. Optionally, the normalized matrix contains the normalized statistic of each type of social behavior data among the multiple types. The preprocessing may include statistical processing and normalization of the social behavior data stream of each type.
As shown in Fig. 3, the data preprocessing module consists of several submodules. Its input is the streaming social behavior data and its output is the transformed normalized matrix M. The data preprocessing module comprises an observer submodule, a normalizer submodule, and a reconstruction submodule.
Optionally, preprocessing the social behavior data stream of each type among the social behavior data streams of the multiple types may include the following steps.
Step 1: for the social behavior data stream of each type, apply a first observation function to the stream to perform first observation processing and obtain the first observation value corresponding to that type of social behavior data. The first observation function is a statistical function of that type of social behavior data.
Specifically, a first observation function is applied to the social behavior data stream of each type. Note that the stream of one type is processed by one first observation function. As shown in Fig. 3, the social behavior data stream enters the preprocessing module and is classified, after which the stream of each type corresponds to one first observation function f. As shown in the figure, there are n first observation functions: f1, f2, f3, …, fn. After first observation processing of the stream of one type, the first observation value for that type is obtained, that is, the value output by the first observation function f after it processes the stream.
As shown in Fig. 3, the observer submodule F comprises a group of observation functions (f1, f2, f3, …, fn), and F can be extended vertically. Here fn is the observation function applied to the streaming data at a specific function point (a particular type of social behavior data), and its output is the observation value at that observation point, including but not limited to the number of times or the number of days on which that type of social behavior occurred.
Optionally, the first observation function is a statistical function and the first observation value is the statistic it produces. The first observation function may differ from type to type. The first observation value obtained after the stream of a given type is processed by its first observation function is the statistic of that type of social behavior data.
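As a minimal sketch of step 1, the first observation functions can be written as plain statistical functions, one per type. The two statistics below (occurrence count and number of active days) are the examples named in the text; the `day` field and the `OBSERVERS` mapping are assumptions made only for illustration.

```python
def count_events(stream):
    """Statistic: number of times the behavior occurred."""
    return len(stream)

def count_active_days(stream):
    """Statistic: number of distinct days on which the behavior occurred."""
    return len({event["day"] for event in stream})

# One first observation function per behavior type (f1 ... fn).
# Which statistic belongs to which type is a hypothetical choice.
OBSERVERS = {"interaction": count_events, "payment": count_active_days}

def observe(streams):
    """Apply each type's first observation function to that type's stream."""
    return {t: OBSERVERS[t](stream) for t, stream in streams.items()}

streams = {
    "interaction": [{"day": 1}, {"day": 1}, {"day": 2}],
    "payment": [{"day": 1}, {"day": 1}, {"day": 3}],
}
first_observations = observe(streams)
```

Adding a new type then amounts to adding one entry to `OBSERVERS`, which is one way to read the statement that F can be extended.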
Step 2: for the first observation value of the social behavior data of each type, apply a second observation function to that first observation value to perform second observation processing and obtain the second observation value corresponding to that type of social behavior data. The second observation function is a normalization function of that type of social behavior data.
Specifically, for the first observation value of the social behavior data of each type, a second observation function can further be applied to the statistic of that type to obtain the corresponding second observation value. The social behavior data of one type corresponds to one second observation function. As shown in Fig. 3, after the social behavior data stream of one type is processed by its first observation function, the first observation value of that type is obtained; this value is then input into the second observation function to obtain the second observation value of that type. In other words, one second observation value is obtained for each type of social behavior data stream that is input.
Optionally, the second observation function may be a normalization function of that type of social behavior data, in which case the second observation value is the normalized statistic of that type of social behavior data.
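The text does not state here which normalization function is used, so the following sketch substitutes min-max scaling purely as a stand-in. The helper `make_min_max` and the per-type bounds are assumptions and are not taken from the patent.

```python
def make_min_max(lower, upper):
    """Build a min-max normalizer for one behavior type (stand-in technique)."""
    def normalize(value):
        if upper == lower:
            return 0.0
        clipped = min(max(value, lower), upper)  # keep the result inside [0, 1]
        return (clipped - lower) / (upper - lower)
    return normalize

# One second observation function per type (f1' ... fn'); bounds are hypothetical.
NORMALIZERS = {
    "interaction": make_min_max(0, 10),
    "payment": make_min_max(0, 4),
}

first_observations = {"interaction": 3, "payment": 2}
second_observations = {
    t: NORMALIZERS[t](value) for t, value in first_observations.items()
}
```

Any other normalization with a fixed output range could be dropped into `NORMALIZERS` without changing the surrounding steps.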
As shown in Fig. 3, the normalizer submodule F' comprises a group of observation functions (f1', f2', f3', …, fn'), and F' can be extended vertically. Here fn' is the normalization function for a specific observation point: it receives the first observation value output by the observer F and outputs the normalized second observation value. The conversion relationship between fn' and fn is as follows: [formula not reproduced in this text]
Step 3: reconstruct the second observation values of each type of social behavior data among the multiple types into a normalized matrix.
Specifically, the second observation values of each type of social behavior data among the multiple types form the normalized matrix. As shown in Fig. 3, after the social behavior data of all types has been processed by the first and second observation functions, the results are input into the reconstruction submodule, which forms and finally outputs the normalized matrix M.
Further optionally, reconstructing the second observation values of each type of social behavior data among the multiple types into a normalized matrix includes:
using the number of types as the number of matrix columns, reconstructing the second observation values of each type of social behavior data among the multiple types into the normalized matrix.
Specifically, the second observation values of each type of social behavior data can be reconstructed into a normalized matrix in several ways. Two optional implementations are given below as examples.
As one optional implementation, the number of types is used as the number of matrix columns and the number of different collection periods is used as the number of matrix rows, and the second observation values of each type of social behavior data are reconstructed into one normalized matrix.
Alternatively, as another optional implementation, the number of types is used as the number of matrix columns and as many normalized matrices are formed as there are different collection periods: the second observation values of each type of social behavior data within one collection period are reconstructed into one normalized matrix.
Specifically, as shown in Fig. 3, after the reconstruction submodule receives the n normalized second observation values output by the normalizer F', it serializes them into an n-dimensional vector in the order of the observation function numbering and converts this n-dimensional vector into the normalized matrix.
The reconstruction module usually generates the normalized matrix as follows. If the social behavior data stream contains data collected over multiple different collection periods, the stream is aggregated by period (for example second, minute, hour, day, week, or month) into several serialized vectors of normalized statistics, one per time dimension. Fig. 4 shows one format of the output normalized matrix: rows represent the different collection periods and columns represent the different types, so the number of rows equals the number of collection periods. For example, one row is the n-dimensional vector formed from data collected with a period of one week and another row is the n-dimensional vector formed from data collected with a period of one month. Every row has n columns, and each column represents one type.
Optionally, data from different collection periods can also be reconstructed into several separate matrices, for example serializing the vectors collected with a period of one day into one normalized matrix and the vectors collected with a period of one month into another.
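The row-per-period layout can be sketched as follows. The sketch assumes a fixed column order `TYPE_ORDER` standing in for the observation function numbering, and three illustrative collection periods; the values are made-up inputs, not data from the patent.

```python
# Column order, standing in for the numbering of the observation functions.
TYPE_ORDER = ["interaction", "payment", "game"]

def to_vector(second_observations):
    """Serialize one period's normalized statistics into an n-dimensional vector."""
    return [second_observations.get(t, 0.0) for t in TYPE_ORDER]

def build_matrix(per_period):
    """One row per collection period, one column per behavior type."""
    return [to_vector(observations) for _, observations in per_period]

# Illustrative normalized statistics for three hypothetical collection periods.
per_period = [
    ("day", {"interaction": 0.3, "payment": 0.5, "game": 0.0}),
    ("week", {"interaction": 0.6, "payment": 0.25, "game": 0.1}),
    ("month", {"interaction": 0.9, "payment": 0.75, "game": 0.2}),
]
M = build_matrix(per_period)
```

For the alternative implementation with one matrix per collection period, `build_matrix` would simply be called once per period with a single-entry list.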
S102: input the low-dimensional feature space vector into the multi-layer restricted Boltzmann machine (RBM) stack for computation, so as to extract the hidden features in the social behavior data stream.
Optionally, inputting the low-dimensional feature space vector into the multi-layer RBM stack for computation includes:
inputting one row of elements of the normalized matrix into the multi-layer RBM stack at a time, with each element corresponding to one input;
performing layer-by-layer computation on the normalized matrix through the multiple RBMs connected in series, so as to extract the hidden features in the social behavior data stream.
In this embodiment, the hidden features may include regularities implicit in the social behavior data stream. These regularities reflect the essence of the social behavior data, and extracting them helps improve the accuracy of data modeling.
In this embodiment, the multi-layer RBM stack is formed by connecting multiple RBMs in series. As shown in Fig. 6, the RBM stack consists of multiple RBMs (RBM1, RBM2, RBM3, RBM4, ..., RBMn) connected in series.
The data preprocessing module transforms the raw data into the low-dimensional feature space that the RBM can recognize, and the multi-layer RBM stack receives the low-dimensional feature space vector that the module outputs. Fig. 5 shows the structure of a single RBM. An RBM is a stochastic neural network model with a two-layer structure, symmetric connections, and no self-feedback; the two layers are fully connected to each other, and there are no connections within a layer. One row of elements of the normalized matrix is input into the multi-layer RBM stack at a time, with each element of the row corresponding to one input terminal, shown as i1 to in in Fig. 5. The RBMs connected in series then process each row of the normalized matrix layer by layer to extract the hidden features in the social behavior data stream; in other words, every row of the normalized matrix passes through the multi-layer RBM stack layer by layer.
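A minimal sketch of this layer-by-layer computation is given below. It assumes the common sigmoid activation for the hidden units and uses small hand-written, untrained parameters; the function names and all numeric values are hypothetical.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def hidden_activations(visible, weights, hidden_bias):
    """Hidden-unit activation probabilities of one RBM for one visible vector.

    weights[i][j] connects visible unit i to hidden unit j; there are no
    connections inside a layer.
    """
    return [
        sigmoid(hidden_bias[j] + sum(v * weights[i][j] for i, v in enumerate(visible)))
        for j in range(len(hidden_bias))
    ]

def run_stack(stack, row):
    """Pass one matrix row through the RBMs in series, layer by layer."""
    activations = row
    for weights, hidden_bias in stack:
        activations = hidden_activations(activations, weights, hidden_bias)
    return activations

# Hypothetical, untrained parameters: RBM1 maps 3 inputs to 2 hidden units,
# RBM2 maps those 2 units to 1 hidden unit.
rbm1 = ([[0.2, -0.1], [0.4, 0.3], [-0.5, 0.1]], [0.0, 0.0])
rbm2 = ([[0.7], [-0.6]], [0.1])
stack = [rbm1, rbm2]

M = [[0.3, 0.5, 0.0], [0.6, 0.25, 0.1]]  # rows of a normalized matrix
hidden_features = [run_stack(stack, row) for row in M]
```

Each row of `M` enters the stack on its own, and each element of the row feeds one input terminal of the first RBM.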
In this embodiment, the characteristic-variable representation uses a multi-layer stacked RBM stack built from deep learning hierarchical components. Social behavior data is converted into a data sequence that a deep learning network can recognize, and the feature autoencoder (the preprocessing module and multi-layer RBM stack described above) maps the social behavior data from the data space to the feature space, automatically discovers the patterns and regularities hidden in the data, extracts abstract hidden features, and completes the feature representation task automatically.
Deep learning is a new field of machine learning research. Its motivation is to build neural networks that simulate the way the human brain analyzes and learns, imitating the mechanisms of the human brain to interpret data such as images, sound, and text. As used here, deep learning is a form of unsupervised learning. The concept of deep learning originates from research on artificial neural networks; a multilayer perceptron with several hidden layers is one deep learning structure.
Deep learning combines low-level features into more abstract high-level representations of attribute categories or features, in order to discover distributed feature representations of the data. The value contained in big data is what drives people to process it, and big data technology makes it possible to collect more data dimensions and so strengthen the descriptive power of weakly correlated data. By building machine learning models with many hidden layers and training on massive data, deep learning learns more useful features and thereby ultimately improves the accuracy of classification or prediction. In other words, the "deep model" is the means and "feature learning" is the end.
The RBM is a basic building block of deep learning models. A network composed of RBMs uses unsupervised learning to fit the input data as closely as possible. Through layer-by-layer feature transformation, the feature representation of a sample in the original space is transformed into a new feature space, which makes classification or prediction easier. Compared with manually constructed features, learning the features of big data with deep learning hierarchical components better captures the rich intrinsic information of the data.
In this embodiment, each RBM layer is trained with the fast learning algorithm based on contrastive divergence. The training process is as follows. First, fully train the first RBM and fix its weights and offsets. Second, use the states of its hidden neurons as the input vector of the second RBM. Third, after fully training the second RBM, stack it on top of the first RBM. As shown in Fig. 6, the output of the RBM stack formed by stacking several RBMs in this way, obtained through unsupervised learning, is the hidden features.
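The three training steps can be sketched as a greedy layer-wise loop. This is a minimal illustration under stated assumptions, not the patent's implementation: it uses single-step contrastive divergence (CD-1), treats the normalized inputs as probabilities, passes hidden activation probabilities (rather than sampled states) up to the next layer, and picks the learning rate, epoch count, and layer sizes arbitrarily.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class RBM:
    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.w = [[rng.gauss(0.0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible offsets
        self.c = [0.0] * n_hidden   # hidden offsets

    def hidden_probs(self, v):
        return [sigmoid(self.c[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        return [sigmoid(self.b[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_step(self, v0, lr=0.1):
        """One contrastive-divergence (CD-1) update for a single input vector."""
        h0 = self.hidden_probs(v0)
        v1 = self.visible_probs(self.sample(h0))  # one-step reconstruction
        h1 = self.hidden_probs(v1)
        for i in range(len(v0)):
            for j in range(len(h0)):
                self.w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
        for i in range(len(v0)):
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(h0)):
            self.c[j] += lr * (h0[j] - h1[j])

def train_stack(data, layer_sizes, epochs=20, seed=0):
    """Train one RBM, freeze it, then train the next RBM on its hidden outputs."""
    rng = random.Random(seed)
    stack, inputs = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(inputs[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in inputs:
                rbm.cd1_step(v)
        stack.append(rbm)  # weights and offsets are not updated again
        inputs = [rbm.hidden_probs(v) for v in inputs]
    return stack, inputs

M = [[0.3, 0.5, 0.0], [0.6, 0.25, 0.1], [0.9, 0.75, 0.2]]
stack, hidden_features = train_stack(M, layer_sizes=[4, 2])
```

After training, `hidden_features` holds one output vector of the top RBM per input row, which corresponds to the hidden features the text describes.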
In this embodiment of the present invention, a social behavior data stream to be processed is obtained and preprocessed, which converts it from the data space into a low-dimensional feature space vector. This vector is input into the multi-layer RBM stack for computation to extract the hidden features in the social behavior data stream. In this way, preprocessing converts the social behavior data stream into a low-dimensional feature space vector that the RBM can recognize, and the RBM stack then automatically extracts the abstract hidden features from the stream, which improves efficiency and reduces R&D costs.
The feature autoencoder that implements the technical solution of the present invention is shown in Fig. 2. It mainly consists of a data preprocessor and a multi-layer RBM stack. The preprocessor mainly comprises the preprocessing module, which preprocesses the social behavior data stream and may use the structure shown in Fig. 3. The RBM stack consists of several stacked RBMs; it mainly uses a deep learning algorithm to process the data output by the preprocessor and obtain the hidden features of the social behavior data stream, which are then used for data modeling.
A data processing device provided by the embodiments of the present invention is described in detail below with reference to Figs. 7 and 8.
Refer to Fig. 7, which is a schematic structural diagram of a data processing device according to an embodiment of the present invention. As shown in the figure, the data processing device includes an acquisition module 100, a preprocessing module 101, and a computing module 102.
The acquisition module 100 is configured to obtain a social behavior data stream to be processed.
In this embodiment, the social behavior data stream is streaming data: the social behavior data that the system collects from each user's activity on various clients. Examples include behavior data for sharing, joining interest groups, and interacting within network applications; and/or payment behavior data generated with various payment applications; and shopping behavior data generated in various shopping applications. The present invention places no limitation on this. The various social behavior data items form the data stream in order of the time at which each behavior occurred.
Note that the social behavior data stream contains social behavior data of multiple types, which can be divided according to the function of the data. For example, the stream may contain social behavior data of the interaction type, of the payment type, of the game type, and so on.
The preprocessing module 101 is configured to preprocess the social behavior data stream to convert it from the data space into a low-dimensional feature space vector.
In this embodiment, the social behavior data stream is preprocessed to obtain its low-dimensional feature space vector. An RBM can only recognize low-dimensional feature space vectors and cannot recognize raw data, which is why this conversion is needed. Optionally, the low-dimensional feature space vector includes, but is not limited to, the number of times and the number of days on which social behavior data was generated.
Optionally, the data processing device of this embodiment further includes a classification module 103.
The classification module 103 is configured to classify the social behavior data stream to obtain social behavior data streams of multiple types.
Specifically, the social behavior data stream contains social behavior data of multiple types, interleaved in time order to form the stream. Each social behavior data item in the stream carries the identifier of the application that produced it and its content. Based on the application identifier and the content, the system can assign each data item to a type. For example, if the social behavior data produced by a network application records a user posting a comment, the data item is determined to be of the interaction type. Arranging all social behavior data items of a given type in order of occurrence time yields the social behavior data stream of that type.
Described pretreatment module 101 is specifically for the society of each type in the Social behaviors data stream by the plurality of type
Behavioral data stream is handed over to pre-process, it is thus achieved that normalization matrix, to realize turning described Social behaviors data stream from data space
It is changed to low dimensional feature space vector.
Concrete, by each type in the Social behaviors data stream of multiple types in pending Social behaviors data stream
Social behaviors data stream pre-processes, it is thus achieved that normalization matrix, and optionally, this normalization matrix includes the plurality of type society
Handing over the normalization statistic of each type Social behaviors data in behavioral data, this preprocessing process can include to each type
Social behaviors data stream carry out statistical disposition and normalized.
As shown in Fig. 3, the data preprocessing module is composed of several submodules. Its input is the streaming social behavior data and its output is the converted normalization matrix. The data preprocessing module comprises an observer submodule, a normalizer submodule, and a reconstruction submodule.
Further optionally, as shown in Fig. 8, the preprocessing module 101 may include an observer submodule 1010, a normalizer submodule 1011, and a reconstruction submodule 1012.
The observer submodule 1010 is configured to, for the social behavior data stream of each type, apply a first observation function to perform first observation processing on the social behavior data stream and obtain a first observation value corresponding to that type of social behavior data, where the first observation function is a statistical function of that type of social behavior data.
Specifically, for the social behavior data stream of each type, a first observation function is applied to perform first observation processing on the stream. It should be noted that the stream of one type is processed by one first observation function. As shown in Fig. 3, after the social behavior data stream enters the preprocessing module and is classified, the stream of each type corresponds to one first observation function f; as shown in the figure, there are n first observation functions, namely f1, f2, f3, …, fn. After the stream of one type undergoes first observation processing, the first observation value corresponding to that type of social behavior data is obtained; that is, the output of the first observation function f after processing the stream is the first observation value.
As shown in Fig. 3, the observer submodule F contains a set of observation functions (f1, f2, f3, …, fn), and F supports vertical extension. Here fn is the observation function applied to the streaming data at a specific function point (a specific type of social behavior data), and the output of fn is the observation value at that observation point, including but not limited to observation values such as the number of times or the number of days of a specific type of social behavior.
Optionally, the first observation function is a statistical function, and the first observation value is the statistic it produces. The first observation function may differ from type to type; the first observation value obtained after the stream of a given type is processed by its first observation function is the statistic of that type of social behavior data.
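As an illustration of the first observation step, the sketch below applies one statistical function per type. The two statistics (an event count and a count of distinct days) follow the examples named in the description, while the function names, the `OBSERVERS` mapping, and the type names are assumptions introduced here.

```python
from datetime import datetime, timezone


def count_events(timestamps):
    """First observation: number of behaviors of this type."""
    return len(timestamps)


def active_days(timestamps):
    """First observation: number of distinct UTC days on which the behavior occurred."""
    return len({datetime.fromtimestamp(t, tz=timezone.utc).date() for t in timestamps})


# Hypothetical mapping: behavior type -> its own first observation function f_i.
OBSERVERS = {"interaction": count_events, "login": active_days}


def observe(typed_timestamps):
    """Apply each type's observation function to that type's stream."""
    return {t: OBSERVERS[t](ts) for t, ts in typed_timestamps.items()}


DAY = 86_400  # seconds per day
first_obs = observe({
    "interaction": [10, 20, DAY + 5],
    "login": [0, 100, DAY, 3 * DAY],
})
```

Supporting a new type only requires adding another entry to the mapping, which is one way to read the statement that F supports extension.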
The normalizer submodule 1011 is configured to, for the first observation value of the social behavior data of each type, apply a second observation function to perform second observation processing on the first observation value and obtain a second observation value corresponding to that type of social behavior data, where the second observation function is a normalization function of that type of social behavior data.
Specifically, for the first observation value of the social behavior data of each type, a second observation function may further be applied to perform second observation processing on the statistic of that type, yielding the second observation value corresponding to that type of social behavior data. The social behavior data of one type corresponds to one second observation function. As shown in Fig. 3, after the stream of one type is processed by its first observation function, the first observation value of that type is obtained; this first observation value is then input to the corresponding second observation function to obtain the second observation value of that type. That is, inputting the social behavior data stream of one type yields one second observation value.
Optionally, the second observation function may be a normalization function of that type of social behavior data, and the second observation value is the normalized statistic of that type of social behavior data.
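A per-type normalization function can be sketched as below. The min-max rule and the bounds are assumptions chosen purely for illustration and are not the transformation defined by the invention; any function that maps a type's statistic onto a common scale would fit the same role.

```python
def make_min_max(lo, hi):
    """Return a function mapping [lo, hi] onto [0, 1], clamping values outside the range.

    Min-max scaling is an assumed example of a normalization function.
    """
    if hi <= lo:
        raise ValueError("hi must be greater than lo")

    def normalize(x):
        clamped = min(max(x, lo), hi)
        return (clamped - lo) / (hi - lo)

    return normalize


# Hypothetical bounds per behavior type; each type gets its own function.
NORMALIZERS = {
    "interaction": make_min_max(0, 50),
    "login": make_min_max(0, 7),
}

first_obs = {"interaction": 25, "login": 7}
second_obs = {t: NORMALIZERS[t](v) for t, v in first_obs.items()}
```

Clamping keeps every second observation value inside [0, 1], which suits the real-valued inputs of an RBM.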
As shown in Fig. 3, the normalizer submodule F' contains a set of observation functions (f1', f2', f3', …, fn'), and F' supports vertical extension. Here fn' is the normalization function for a specific observation point; it receives the first observation value output by the observer F and outputs the normalized second observation value. The conversion relation between fn' and fn is as follows:
The reconstruction submodule 1012 is configured to reconstruct the second observation values of each type of social behavior data among the multiple types into a normalization matrix.
Specifically, the second observation values of each type of social behavior data among the multiple types are reconstructed into a normalization matrix. As shown in Fig. 3, after the social behavior data of every type has been processed by its first observation function and second observation function, the results are input to the reconstruction submodule, which forms the normalization matrix and finally outputs the normalization matrix M.
The reconstruction submodule 1012 is specifically configured to take the number of types among the multiple types as the number of columns of the matrix and reconstruct the second observation values of each type of social behavior data among the multiple types into the normalization matrix.
That is, with the number of types as the number of matrix columns, the second observation values of each type of social behavior data among the multiple types are reconstructed into the normalization matrix.
Specifically, the second observation values of the social behavior data of each type among the multiple types can be reconstructed into a normalization matrix in several ways. Two optional embodiments are given below as examples:
In one optional embodiment, the number of types among the multiple types is the number of columns of the matrix and the number of different collection periods is the number of rows of the matrix, and the second observation values of each type of social behavior data among the multiple types are reconstructed into one normalization matrix; or,
In another optional embodiment, the number of types among the multiple types is the number of columns of the matrix, and as many normalization matrices are formed as there are different collection periods: the second observation values of each type of social behavior data among the multiple types within one collection period are reconstructed into one normalization matrix.
Specifically, as shown in Fig. 3, after the reconstruction submodule receives the n normalized second observation values output by the normalizer F', it serializes these second observation values into an n-dimensional vector in the order of the observation function numbers and converts this n-dimensional vector into the normalization matrix.
The reconstruction module usually generates the normalization matrix as follows. If the social behavior data stream contains social behavior data collected over multiple different collection periods, the stream is aggregated by period (for example second, minute, hour, day, week, or month) into multiple groups of serialized vectors of normalized statistics in different time dimensions. Fig. 4 shows one form of the output normalization matrix: rows represent the different collection-period dimensions and columns represent the different types, so the number of rows of the normalization matrix equals the number of different collection periods. For example, the first row is the n-dimensional vector formed from data collected with a week as the collection period, the second row is the n-dimensional vector formed from data collected with a month as the collection period, and the third row is the n-dimensional vector formed from data collected with a week as the collection period. Every row has n columns, and each column represents one type.
Optionally, the data of different collection periods may also be reconstructed into multiple different matrices; for example, the vectors with a day as the collection period are serialized into one normalization matrix, and the vectors with a month as the collection period are serialized into another normalization matrix.
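The two reconstruction embodiments can be compared in a short sketch. The period names, type names, and values below are hypothetical; the only structure taken from the description is that columns correspond to types and rows correspond to collection periods.

```python
def reconstruct(second_obs, type_order, period_order):
    """First embodiment: one matrix, one row per collection period, one column per type."""
    return [[second_obs[p][t] for t in type_order] for p in period_order]


def reconstruct_per_period(second_obs, type_order, period_order):
    """Second embodiment: one 1 x n matrix for each collection period."""
    return {p: [[second_obs[p][t] for t in type_order]] for p in period_order}


# Hypothetical normalized statistics: period -> type -> second observation value.
second_obs = {
    "day": {"interaction": 0.2, "login": 1.0, "payment": 0.0},
    "week": {"interaction": 0.5, "login": 0.8, "payment": 0.1},
}
types = ["interaction", "login", "payment"]  # fixed column order (observation function numbering)
periods = ["day", "week"]                    # fixed row order

M = reconstruct(second_obs, types, periods)
per_period = reconstruct_per_period(second_obs, types, periods)
```

Fixing the column order once matters because each column is later wired to one input terminal of the RBM stack.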
The calculation module 102 is specifically configured to input the low-dimensional feature space vector into the multi-level restricted Boltzmann machine (RBM) stack for calculation processing, to complete the extraction of hidden features in the social behavior data stream.
Optionally, inputting the low-dimensional feature space vector into the multi-level restricted Boltzmann machine (RBM) stack for calculation processing comprises:
inputting one row of elements of the normalization matrix into the multi-level RBM stack at a time, each element corresponding to one input terminal;
performing calculation processing on the normalization matrix layer by layer through the plurality of RBMs connected in series, to extract the hidden features in the social behavior data stream.
In embodiments of the present invention, the hidden features may include regularities implicit in the social behavior data stream. These regularities can reflect the essence of the social behavior data, and extracting them can benefit the accuracy of data modeling.
In embodiments of the present invention, the multi-level restricted Boltzmann machine (Restricted Boltzmann Machines, RBM) stack is formed by connecting multiple RBMs in series. As shown in Fig. 6, the RBM stack consists of multiple RBMs (RBM1, RBM2, RBM3, RBM4, ..., RBMn) connected in series.
The data preprocessing module transforms the raw data into a low-dimensional feature space that the RBM can recognize, and the multi-level RBM stack receives the low-dimensional feature space vector output by the data preprocessing module. Fig. 5 shows the structure of a single RBM. An RBM is a stochastic neural network model with a two-layer structure, symmetric connections, and no self-feedback; the two layers are fully connected to each other, and there are no connections within a layer. One row of elements of the normalization matrix is input into the multi-level RBM stack at a time, with each element of the row corresponding to one input terminal, i1 to in in Fig. 5. Each row of the normalization matrix is processed layer by layer by the plurality of RBMs connected in series to extract the hidden features in the social behavior data stream; that is, every row of the normalization matrix passes through the multi-level RBM stack layer by layer.
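The row-by-row, layer-by-layer computation can be sketched as a forward pass through a small stack. Everything concrete here is an assumption: the layer sizes, the random untrained weights, and the use of hidden-unit activation probabilities as each layer's output. The sketch shows only the data flow, with each row element mapped to one input terminal.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_probs(v, weights, hidden_bias):
    """P(h_j = 1 | v) for one RBM; weights[i][j] links visible unit i to hidden unit j."""
    return [
        sigmoid(hidden_bias[j] + sum(v[i] * weights[i][j] for i in range(len(v))))
        for j in range(len(hidden_bias))
    ]


def init_rbm(n_visible, n_hidden, rng):
    """Untrained RBM parameters (small random weights, zero hidden offsets)."""
    weights = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
    return weights, [0.0] * n_hidden


def extract_features(row, stack):
    """Pass one matrix row through the RBMs in series; each layer's output feeds the next."""
    activation = row
    for weights, hidden_bias in stack:
        activation = hidden_probs(activation, weights, hidden_bias)
    return activation


rng = random.Random(0)
layer_sizes = [4, 3, 2]  # hypothetical: n = 4 input terminals, then 3 and 2 hidden units
stack = [init_rbm(a, b, rng) for a, b in zip(layer_sizes, layer_sizes[1:])]

matrix = [[0.2, 0.9, 0.0, 0.5], [0.1, 0.4, 1.0, 0.3]]  # rows of a normalization matrix
features = [extract_features(row, stack) for row in matrix]
```

With trained parameters in place of the random ones, `features` would hold one hidden-feature vector per row of the matrix.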
The embodiment of the present invention applies a multi-layer stack of RBMs, a deep-learning hierarchical component, to the expression of feature variables. Social behavior data is converted into a data sequence that a deep learning network can recognize, and a feature encoder (the preprocessing module and the multi-level RBM stack described above) maps the social behavior data from the data space to the feature space, automatically discovering the patterns and regularities hidden in the data, extracting abstract hidden features, and automatically completing the feature representation task.
Deep learning is a new field of machine learning research. Its motivation is to build neural networks that simulate the way the human brain analyzes and learns, imitating the mechanisms of the human brain to interpret data such as images, sound, and text. In this context, deep learning is applied as a form of unsupervised learning. The concept of deep learning originates from research on artificial neural networks; a multilayer perceptron with multiple hidden layers is one deep learning structure.
Deep learning combines low-level features to form more abstract high-level representations of attribute categories or features, in order to discover distributed feature representations of data. The value contained in big data is what drives people to process it, and big data technology makes it possible to collect more data dimensions and strengthen the descriptive power of weakly correlated data. Deep learning builds machine learning models with many hidden layers and trains them on massive data to learn more useful features, ultimately improving the accuracy of classification or prediction. In other words, the "deep model" is the means and "feature learning" is the goal.
The RBM is a basic building block of deep learning models. A network composed of RBMs uses unsupervised learning to fit the input data as closely as possible. Through layer-by-layer feature transformation, the feature representation of a sample in the original space is transformed into a new feature space, making classification or prediction easier. Compared with manually constructed features, learning features from big data with deep-learning hierarchical components better captures the rich internal information of the data.
The embodiment of the present invention trains each RBM layer with the contrastive divergence fast learning algorithm. The training process is as follows: in the first step, the first RBM is fully trained and its weights and offsets are fixed; in the second step, the states of its hidden neurons are used as the input vector of the second RBM; in the third step, after the second RBM is fully trained, it is stacked on top of the first RBM. As shown in Fig. 6, the RBM stack formed by stacking several RBMs outputs the hidden features obtained through unsupervised learning.
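The three training steps can be sketched as greedy layer-wise training with one-step contrastive divergence (CD-1). This is a minimal sketch under stated assumptions, not the invention's implementation: the learning rate, epoch count, layer sizes, and toy data are hypothetical, and hidden activation probabilities are passed to the next RBM in place of sampled hidden states, which is a common variant.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class RBM:
    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.w = [[rng.gauss(0, 0.1) for _ in range(n_hidden)] for _ in range(n_visible)]
        self.b = [0.0] * n_visible  # visible offsets
        self.c = [0.0] * n_hidden   # hidden offsets

    def hidden_probs(self, v):
        return [sigmoid(self.c[j] + sum(v[i] * self.w[i][j] for i in range(len(v))))
                for j in range(len(self.c))]

    def visible_probs(self, h):
        return [sigmoid(self.b[i] + sum(h[j] * self.w[i][j] for j in range(len(h))))
                for i in range(len(self.b))]

    def cd1_step(self, v0, lr):
        """One CD-1 update for a single sample."""
        h0 = self.hidden_probs(v0)
        h0_sample = [1.0 if self.rng.random() < p else 0.0 for p in h0]
        v1 = self.visible_probs(h0_sample)  # reconstruction of the input
        h1 = self.hidden_probs(v1)
        for i in range(len(self.b)):
            for j in range(len(self.c)):
                self.w[i][j] += lr * (v0[i] * h0[j] - v1[i] * h1[j])
            self.b[i] += lr * (v0[i] - v1[i])
        for j in range(len(self.c)):
            self.c[j] += lr * (h0[j] - h1[j])


def train_stack(rows, layer_sizes, epochs=50, lr=0.1, seed=0):
    """Train each RBM in turn, freeze it, and feed its hidden activations upward."""
    rng = random.Random(seed)
    stack, data = [], rows
    for n_visible, n_hidden in zip(layer_sizes, layer_sizes[1:]):
        rbm = RBM(n_visible, n_hidden, rng)
        for _ in range(epochs):
            for v in data:
                rbm.cd1_step(v, lr)
        # This RBM's weights and offsets are now fixed; its hidden layer feeds the next RBM.
        data = [rbm.hidden_probs(v) for v in data]
        stack.append(rbm)
    return stack, data


rows = [[1.0, 1.0, 0.0, 0.0], [0.0, 0.0, 1.0, 1.0], [1.0, 1.0, 0.0, 0.0]]
stack, hidden_features = train_stack(rows, layer_sizes=[4, 3, 2])
```

After training, `hidden_features` holds the top-layer output for each input row, corresponding to the hidden features produced by the stack.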
In the embodiment of the present invention, a social behavior data stream to be processed is obtained and preprocessed so that it is converted from the data space into a low-dimensional feature space vector, and this vector is input into the multi-level RBM stack for calculation processing to complete the extraction of hidden features in the social behavior data stream. In this way, preprocessing converts the social behavior data stream into a low-dimensional feature space vector that the RBM can recognize, and the RBM stack then automatically extracts the abstract hidden features in the social behavior data stream, which improves efficiency and reduces research and development costs.
A person of ordinary skill in the art will understand that all or part of the flows in the above method embodiments can be completed by a computer program instructing related hardware, and the program can be stored in a computer-readable storage medium. The program corresponding to the data processing apparatus shown in Figs. 7 and 8 may be stored in a readable storage medium of a device and executed by at least one processor in the device to implement the above data processing method, which includes the flow described in the method embodiment of Fig. 1. The storage medium may be a magnetic disk, an optical disc, a read-only memory (Read-Only Memory, ROM), a random access memory (Random Access Memory, RAM), or the like.
The above discloses only preferred embodiments of the present invention and certainly cannot limit the scope of rights of the present invention; equivalent changes made according to the claims of the present invention therefore still fall within the scope covered by the present invention.
Claims (12)
1. A data processing method, characterized by comprising:
obtaining a social behavior data stream to be processed;
preprocessing the social behavior data stream to convert the social behavior data stream from a data space into a low-dimensional feature space vector;
inputting the low-dimensional feature space vector into a multi-level restricted Boltzmann machine (RBM) stack for calculation processing, to complete extraction of hidden features in the social behavior data stream.
2. The method of claim 1, characterized in that, before the preprocessing of the social behavior data stream to convert the social behavior data stream from the data space into the low-dimensional feature space vector, the method further comprises:
classifying the social behavior data stream to obtain social behavior data streams of multiple types;
and the preprocessing of the social behavior data stream to convert the social behavior data stream from the data space into the low-dimensional feature space vector comprises:
preprocessing the social behavior data stream of each type among the social behavior data streams of the multiple types to obtain a normalization matrix, thereby converting the social behavior data stream from the data space into the low-dimensional feature space vector.
3. The method of claim 2, characterized in that the preprocessing of the social behavior data stream of each type among the social behavior data streams of the multiple types to obtain the normalization matrix comprises:
for the social behavior data stream of each type, applying a first observation function to perform first observation processing on the social behavior data stream and obtain a first observation value corresponding to that type of social behavior data, the first observation function being a statistical function of that type of social behavior data;
for the first observation value of the social behavior data of each type, applying a second observation function to perform second observation processing on the first observation value and obtain a second observation value corresponding to that type of social behavior data, the second observation function being a normalization function of that type of social behavior data;
reconstructing the second observation values of each type of social behavior data among the multiple types into the normalization matrix.
4. The method of claim 3, characterized in that the reconstructing of the second observation values of each type of social behavior data among the multiple types into the normalization matrix comprises:
taking the number of types among the multiple types as the number of columns of the matrix, and reconstructing the second observation values of each type of social behavior data among the multiple types into the normalization matrix.
5. The method of claim 4, characterized in that, if the social behavior data stream contains social behavior data collected over multiple different collection periods,
the taking of the number of types among the multiple types as the number of columns of the matrix and reconstructing the second observation values of each type of social behavior data among the multiple types into the normalization matrix comprises:
taking the number of types among the multiple types as the number of columns of the matrix and the number of the different collection periods as the number of rows of the matrix, and reconstructing the second observation values of each type of social behavior data among the multiple types into the normalization matrix; or,
taking the number of types among the multiple types as the number of columns of the matrix, and forming as many normalization matrices as there are different collection periods, the second observation values of each type of social behavior data among the multiple types within one collection period being reconstructed into one normalization matrix.
6. The method of claim 4 or 5, characterized in that the multi-level RBM stack is formed by a plurality of RBMs connected in series;
the inputting of the low-dimensional feature space vector into the multi-level restricted Boltzmann machine (RBM) stack for calculation processing comprises:
inputting one row of elements of the normalization matrix into the multi-level RBM stack at a time, each element corresponding to one input terminal;
performing calculation processing on the normalization matrix layer by layer through the plurality of RBMs connected in series, to extract the hidden features in the social behavior data stream.
7. A data processing apparatus, characterized by comprising:
an acquisition module, configured to obtain a social behavior data stream to be processed;
a preprocessing module, configured to preprocess the social behavior data stream to convert the social behavior data stream from a data space into a low-dimensional feature space vector;
a calculation module, configured to input the low-dimensional feature space vector into a multi-level restricted Boltzmann machine (RBM) stack for calculation processing, to complete extraction of hidden features in the social behavior data stream.
8. The apparatus of claim 7, characterized in that the apparatus further comprises:
a classification module, configured to classify the social behavior data stream to obtain social behavior data streams of multiple types;
the preprocessing module being specifically configured to preprocess the social behavior data stream of each type among the social behavior data streams of the multiple types to obtain a normalization matrix, thereby converting the social behavior data stream from the data space into the low-dimensional feature space vector.
9. The apparatus of claim 8, characterized in that the preprocessing module comprises:
an observer submodule, configured to, for the social behavior data stream of each type, apply a first observation function to perform first observation processing on the social behavior data stream and obtain a first observation value corresponding to that type of social behavior data, the first observation function being a statistical function of that type of social behavior data;
a normalizer submodule, configured to, for the first observation value of the social behavior data of each type, apply a second observation function to perform second observation processing on the first observation value and obtain a second observation value corresponding to that type of social behavior data, the second observation function being a normalization function of that type of social behavior data;
a reconstruction submodule, configured to reconstruct the second observation values of each type of social behavior data among the multiple types into the normalization matrix.
10. The apparatus of claim 9, characterized in that the reconstruction submodule is specifically configured to take the number of types among the multiple types as the number of columns of the matrix and reconstruct the second observation values of each type of social behavior data among the multiple types into the normalization matrix.
11. The apparatus of claim 10, characterized in that, if the social behavior data stream contains social behavior data collected over multiple different collection periods,
the reconstruction submodule is specifically configured to take the number of types among the multiple types as the number of columns of the matrix and the number of the different collection periods as the number of rows of the matrix, and reconstruct the second observation values of each type of social behavior data among the multiple types into the normalization matrix; or,
to take the number of types among the multiple types as the number of columns of the matrix and form as many normalization matrices as there are different collection periods, the second observation values of each type of social behavior data among the multiple types within one collection period being reconstructed into one normalization matrix.
12. The apparatus of claim 10 or 11, characterized in that the multi-level RBM stack is formed by a plurality of RBMs connected in series;
the calculation module is specifically configured to input one row of elements of the normalization matrix into the multi-level RBM stack at a time, each element corresponding to one input terminal;
and to perform calculation processing on the normalization matrix layer by layer through the plurality of RBMs connected in series, to extract the hidden features in the social behavior data stream.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610394934.7A CN106096638B (en) | 2016-06-03 | 2016-06-03 | A kind of data processing method and device |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610394934.7A CN106096638B (en) | 2016-06-03 | 2016-06-03 | A kind of data processing method and device |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106096638A true CN106096638A (en) | 2016-11-09 |
CN106096638B CN106096638B (en) | 2018-08-07 |
Family
ID=57448313
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610394934.7A Active CN106096638B (en) | 2016-06-03 | 2016-06-03 | A kind of data processing method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106096638B (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108510071A (en) * | 2017-05-10 | 2018-09-07 | 腾讯科技(深圳)有限公司 | Feature extracting method, device and the computer readable storage medium of data |
CN111414384A (en) * | 2020-02-26 | 2020-07-14 | 有米科技股份有限公司 | Mass streaming data processing method and device |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103345656A (en) * | 2013-07-17 | 2013-10-09 | 中国科学院自动化研究所 | Method and device for data identification based on multitask deep neural network |
CN103440352A (en) * | 2013-09-24 | 2013-12-11 | 中国科学院自动化研究所 | Method and device for analyzing correlation among objects based on deep learning |
CN105045857A (en) * | 2015-07-09 | 2015-11-11 | 中国科学院计算技术研究所 | Social network rumor recognition method and system |
Non-Patent Citations (3)
Title |
---|
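Hugo Larochelle et al.: "Learning Algorithms for the Classification Restricted Boltzmann Machine", Journal of Machine Learning Research * |
Wu Zheng et al. (吴证等): "A dimensionality reduction method for restricted Boltzmann machine neural networks combined with principal component analysis" (结合主元成分分析的受限玻耳兹曼机神经网络的降维方法), Journal of Shanghai Jiao Tong University (上海交通大学学报) * |
Zhang Chunxia et al. (张春霞等): "Restricted Boltzmann Machines" (受限波尔兹曼机), Chinese Journal of Engineering Mathematics (工程数学学报) * |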
Cited By (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108510071A (en) * | 2017-05-10 | 2018-09-07 | 腾讯科技(深圳)有限公司 | Feature extracting method, device and the computer readable storage medium of data |
CN108510071B (en) * | 2017-05-10 | 2020-01-10 | 腾讯科技(深圳)有限公司 | Data feature extraction method and device and computer readable storage medium |
CN111414384A (en) * | 2020-02-26 | 2020-07-14 | 有米科技股份有限公司 | Mass streaming data processing method and device |
Also Published As
Publication number | Publication date |
---|---|
CN106096638B (en) | 2018-08-07 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106156003B (en) | A kind of question sentence understanding method in question answering system | |
CN110032635B (en) | Problem pair matching method and device based on depth feature fusion neural network | |
CN107516110A (en) | A kind of medical question and answer Semantic Clustering method based on integrated convolutional encoding | |
CN102929942B (en) | The overlapping community discovery method of a kind of community network based on integrated study | |
CN106934352A (en) | A kind of video presentation method based on two-way fractal net work and LSTM | |
CN108596039A (en) | A kind of bimodal emotion recognition method and system based on 3D convolutional neural networks | |
CN110532436A (en) | Across social network user personal identification method based on community structure | |
CN106202489A (en) | A kind of agricultural pest intelligent diagnosis system based on big data | |
CN106991374A (en) | Handwritten Digit Recognition method based on convolutional neural networks and random forest | |
CN105931116A (en) | Automated credit scoring system and method based on depth learning mechanism | |
CN106295799A (en) | A kind of implementation method of degree of depth study multilayer neural network | |
CN109272332B (en) | Client loss prediction method based on recurrent neural network | |
CN105183841A (en) | Recommendation method in combination with frequent item set and deep learning under big data environment | |
CN107240136A (en) | A kind of Still Image Compression Methods based on deep learning model | |
CN110807122A (en) | Image-text cross-modal feature disentanglement method based on depth mutual information constraint | |
CN110377689A (en) | Paper intelligent generation method, device, computer equipment and storage medium | |
CN106355210B (en) | Insulator Infrared Image feature representation method based on depth neuron response modes | |
CN106959946A (en) | A kind of text semantic feature generation optimization method based on deep learning | |
CN104036242B (en) | The object identification method of Boltzmann machine is limited based on Centering Trick convolution | |
CN104156464A (en) | Micro-video retrieval method and device based on micro-video feature database | |
CN109960732A (en) | A kind of discrete Hash cross-module state search method of depth and system based on robust supervision | |
CN106096638A (en) | A kind of data processing method and device | |
CN108595527A (en) | A kind of personalized recommendation method and system of the multi-source heterogeneous information of fusion | |
Li et al. | Regional network education information collection platform for smart classrooms based on big data technology | |
CN106126578B (en) | A kind of web service recommendation method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||