CN106096638A - Data processing method and device - Google Patents

Data processing method and device

Info

Publication number
CN106096638A
Authority
CN
China
Prior art keywords
social behavior
type
data stream
behavior data
observation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201610394934.7A
Other languages
Chinese (zh)
Other versions
CN106096638B (en)
Inventor
段培
陈谦
刘志斌
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd
Priority to CN201610394934.7A
Publication of CN106096638A
Application granted
Publication of CN106096638B
Status: Active
Anticipated expiration


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/10 Pre-processing; Data cleansing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

Embodiments of the present invention provide a data processing method and device. The method may include: obtaining a social behavior data stream to be processed; preprocessing the social behavior data stream to convert it from the data space into a low-dimensional feature-space vector; and inputting the low-dimensional feature-space vector into a multi-layer stack of restricted Boltzmann machines (RBMs) for computation, so as to extract the hidden features in the social behavior data stream. With the embodiments of the present invention, the multi-layer RBM stack automatically extracts abstract hidden features from the social behavior data stream, improving efficiency and reducing R&D costs.

Description

Data processing method and device
Technical field
The present invention relates to the field of communication technologies, and in particular to a data processing method and device.
Background
In many scenarios, data must be classified or predicted through modeling. An important characteristic of modeling is that features must be extracted from a large amount of sample data, while the mathematical model itself is mainly responsible for the classification or prediction. Provided the model is used correctly, the quality of the extracted features becomes the bottleneck of overall system performance; consequently, a development team typically devotes much of its manpower to mining better features.
Traditional feature extraction methods typically rely on manually specifying feature types and selecting features, which requires solid prior knowledge. Moreover, human cognition at any given moment is limited, so the resulting features tend to be one-sided or fail to capture deep latent features. Social behavior data is large in scale and rich in dimensions, and existing methods cannot effectively extract features from it; manually designing sample features is therefore not a scalable approach.
Summary of the invention
Embodiments of the present invention provide a data processing method and device that use a multi-layer stack of RBMs (Restricted Boltzmann Machines) to automatically extract abstract hidden features from a social behavior data stream, improving efficiency and reducing R&D costs.
A first aspect of the present invention provides a data processing method, comprising:
obtaining a social behavior data stream to be processed;
preprocessing the social behavior data stream to convert it from the data space into a low-dimensional feature-space vector; and
inputting the low-dimensional feature-space vector into a multi-layer restricted Boltzmann machine (RBM) stack for computation, so as to extract the hidden features in the social behavior data stream.
A second aspect of the present invention provides a data processing device, comprising:
an acquisition module, configured to obtain a social behavior data stream to be processed;
a preprocessing module, configured to preprocess the social behavior data stream and convert it from the data space into a low-dimensional feature-space vector; and
a computing module, configured to input the low-dimensional feature-space vector into a multi-layer restricted Boltzmann machine (RBM) stack for computation, so as to extract the hidden features in the social behavior data stream.
Implementing the embodiments of the present invention has the following beneficial effects:
In the embodiments of the present invention, a social behavior data stream to be processed is obtained and preprocessed, converting it from the data space into a low-dimensional feature-space vector, and this vector is input into a multi-layer RBM stack for computation to extract the hidden features in the stream. Preprocessing turns the social behavior data stream into a low-dimensional feature-space vector that the RBMs can recognize, and the RBMs then automatically extract the abstract hidden features in the stream, improving efficiency and reducing R&D costs.
Brief description of the drawings
To describe the technical solutions in the embodiments of the present invention or in the prior art more clearly, the drawings required for describing the embodiments or the prior art are briefly introduced below. The drawings described below show only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without creative effort.
Fig. 1 is a flowchart of a data processing method according to an embodiment of the present invention;
Fig. 2 is a diagram of a feature autoencoder device according to an embodiment of the present invention;
Fig. 3 is a diagram of a preprocessing device according to an embodiment of the present invention;
Fig. 4 is a schematic diagram of the format of an output matrix according to an embodiment of the present invention;
Fig. 5 is a schematic structural diagram of a single RBM according to an embodiment of the present invention;
Fig. 6 is a schematic structural diagram of an RBM stack according to an embodiment of the present invention;
Fig. 7 is a schematic structural diagram of a data processing device according to an embodiment of the present invention;
Fig. 8 is a schematic structural diagram of a preprocessing module according to an embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on these embodiments without creative effort fall within the protection scope of the present invention.
The data processing method provided by the embodiments of the present invention is described in detail below with reference to Figs. 1 to 6.
Refer to Fig. 1, a flowchart of a data processing method according to an embodiment of the present invention. The method may include the following steps S100 to S102.
S100: Obtain a social behavior data stream to be processed.
In this embodiment, the social behavior data stream is streaming data made up of the social behavior data that the system collects from each user on various clients: for example, behavior data such as "splitting glass opaque", joining interest groups, and interactions that each user performs in network applications; and/or payment behavior data generated by each user through various payment applications; and shopping behavior data generated by each user in various shopping applications. The present invention imposes no limitation on this. The various pieces of social behavior data form a data stream ordered by the time at which each behavior occurs.
Note that the social behavior data stream contains social behavior data of multiple types, which can be divided according to the function of the data. For example, the stream may include interaction-type, payment-type, and game-type social behavior data.
S101: Preprocess the social behavior data stream to convert it from the data space into a low-dimensional feature-space vector.
In this embodiment, the social behavior data stream is preprocessed to obtain its low-dimensional feature-space vector. This conversion is necessary because the RBMs can recognize only low-dimensional feature-space vectors, not raw data. Optionally, the low-dimensional feature-space vector includes, but is not limited to, the number of times and the number of days on which social behaviors occur.
Optionally, before the social behavior data stream is converted from the data space into a low-dimensional feature-space vector, the method further includes the following step S10:
S10: Classify the social behavior data stream to obtain social behavior data streams of the multiple types.
Specifically, the social behavior data stream contains social behavior data of multiple types, interleaved in time order to form the stream. Each piece of social behavior data in the stream is tagged with the identifier of the application that produced it and with its content. Based on the application identifier and the content, the system assigns each piece of social behavior data to a type. For example, if the social behavior data produced by a network application is a user posting a comment, that data is classified as interaction-type social behavior data. Arranging all social behavior data of a given type by time of occurrence forms the social behavior data stream of that type.
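The classification step S10 can be sketched as follows. This is a minimal illustration, not the patented implementation: the `Event` record, the `TYPE_RULES` table, and the application identifiers in it are hypothetical stand-ins for whatever mapping from (application identifier, content) to behavior type a real system would use.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Event:
    timestamp: int  # seconds since the epoch (assumed representation)
    user_id: str
    app_id: str     # identifier of the application that produced the event
    content: str    # behavior content, e.g. "comment", "pay"


# Hypothetical routing rules: (app_id, content) -> behavior type.
TYPE_RULES = {
    ("chat_app", "comment"): "interaction",
    ("chat_app", "join_group"): "interaction",
    ("pay_app", "pay"): "payment",
    ("game_app", "play"): "game",
}


def classify_stream(events):
    """Split an interleaved event stream into one time-ordered stream per type."""
    streams = defaultdict(list)
    for ev in events:
        behavior_type = TYPE_RULES.get((ev.app_id, ev.content))
        if behavior_type is None:
            continue  # events matching no rule are skipped in this sketch
        streams[behavior_type].append(ev)
    for stream in streams.values():
        stream.sort(key=lambda e: e.timestamp)  # order each typed stream by time
    return dict(streams)
```

A lookup table keeps the sketch short; a production classifier might instead inspect the content more richly, which the text leaves open.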
Further optionally, preprocessing the social behavior data stream may include the following step S11:
S11: Preprocess the social behavior data stream of each of the multiple types to obtain a normalization matrix, thereby converting the social behavior data stream from the data space into a low-dimensional feature-space vector.
Specifically, the social behavior data stream of each of the multiple types in the stream to be processed is preprocessed to obtain a normalization matrix, which serves as the low-dimensional feature-space vector. Optionally, the normalization matrix contains the normalized statistic of the social behavior data of each of the multiple types, and the preprocessing may consist of statistical processing followed by normalization of each type's social behavior data stream.
As shown in Fig. 3, the data preprocessing module consists of several submodules. Its input is the streaming social behavior data and its output is the transformed normalization matrix M. The module comprises an observer submodule, a normalizer submodule, and a reconstruction submodule.
Optionally, preprocessing the social behavior data stream of each of the multiple types may include the following steps:
Step 1: For the social behavior data stream of each type, apply a first observation function to perform a first observation process on the stream, obtaining the first observation value of that type's social behavior data. The first observation function is a statistical function of that type's social behavior data.
Specifically, each type's social behavior data stream is processed by its own first observation function. As shown in Fig. 3, the social behavior data stream enters the preprocessing module and, after classification, the stream of each type corresponds to one first observation function f. There are n first observation functions, f_1, f_2, f_3, …, f_n. Applying the first observation process to a type's stream yields the first observation value of that type's social behavior data; that is, the first observation value is the output of the first observation function f applied to the stream.
As shown in Fig. 3, the observer submodule F contains a set of observation functions (f_1, f_2, f_3, …, f_n) and supports vertical extension. Here f_n is the observation function of the streaming data at a specific functional point (a particular type of social behavior data), and the output of f_n is the observation value at that observation point, including but not limited to the number of times and the number of days on which that type of social behavior occurs.
Optionally, the first observation function is a statistical function and the first observation value is the resulting statistic. The first observation function may differ from type to type; the first observation value obtained by applying it to the social behavior data stream of a given type is the statistic of that type's social behavior data.
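A minimal sketch of the observer submodule F follows, assuming each typed stream has been reduced to a list of event timestamps for one user and that the statistics are the two examples the text names (occurrence count and number of active days). The function names and the assignment of a statistic to each type are hypothetical choices.

```python
from typing import Callable, Dict, List

# One typed stream for one user: a list of event timestamps in seconds (assumed).
Stream = List[int]

SECONDS_PER_DAY = 86_400


def count_events(stream: Stream) -> float:
    """Statistic: number of times the behavior occurred."""
    return float(len(stream))


def active_days(stream: Stream) -> float:
    """Statistic: number of distinct days on which the behavior occurred."""
    return float(len({ts // SECONDS_PER_DAY for ts in stream}))


# Observer submodule F: one first observation function f_k per behavior type.
# Which statistic each type uses is a hypothetical choice for illustration.
OBSERVERS: Dict[str, Callable[[Stream], float]] = {
    "interaction": count_events,
    "payment": active_days,
    "game": count_events,
}


def first_observations(streams: Dict[str, Stream]) -> Dict[str, float]:
    """Apply f_k to the stream of type k; a missing type observes an empty stream."""
    return {t: f(streams.get(t, [])) for t, f in OBSERVERS.items()}
```

Adding a new behavior type only adds an entry to `OBSERVERS`, which mirrors the "vertical extension" the text attributes to F.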
Step 2: For the first observation value of each type of social behavior, apply a second observation function to perform a second observation process on it, obtaining the second observation value of that type's social behavior data. The second observation function is a normalization function of that type's social behavior data.
Specifically, for the first observation value of each type of social behavior data, a second observation function is further applied to the statistic of that type, yielding the second observation value of that type. Each type corresponds to one second observation function. As shown in Fig. 3, after a type's social behavior data stream has been processed by its first observation function to produce the first observation value, that value is input into the corresponding second observation function to produce the second observation value. In other words, each type's social behavior data stream, once processed, yields one second observation value.
Optionally, the second observation function may be a normalization function of that type's social behavior data, in which case the second observation value is the normalized statistic of that type.
As shown in Fig. 3, the normalizer submodule F' contains a set of observation functions (f_1', f_2', f_3', …, f_n') and supports vertical extension. Here f_n' is the normalization function for a specific observation point: it receives the first observation value output by the observer F and outputs the normalized second observation value. The relationship between f_n' and f_n is:
$$f_n' = \frac{f_n - \min_i(f_n)}{\max_i(f_n) - \min_i(f_n)}$$
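Read as min-max scaling, the formula can be sketched as below. The text does not say what the index i ranges over; this sketch assumes it ranges over the samples (for example, users) observed at the same observation point. When all values are equal the denominator is zero, a case the formula leaves unspecified; mapping such values to 0.0 is an assumption of this sketch.

```python
from typing import List


def min_max_normalize(values: List[float]) -> List[float]:
    """f'_n = (f_n - min_i f_n) / (max_i f_n - min_i f_n), taken over samples i.

    A constant input (zero denominator) is mapped to 0.0 by assumption.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]
```

For instance, first observation values [2.0, 4.0, 6.0] become [0.0, 0.5, 1.0], so every second observation value lies in [0, 1].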
Step 3: Reconstruct the second observation values of the social behavior data of each of the multiple types into a normalization matrix.
Specifically, the second observation values of every type are assembled into a normalization matrix. As shown in Fig. 3, after every type of social behavior data has been processed by its first and second observation functions, the results are input into the reconstruction submodule, which forms and finally outputs the normalization matrix M.
Further optionally, reconstructing the second observation values of the social behavior data of each of the multiple types into a normalization matrix includes:
using the number of types as the number of matrix columns, reconstructing the second observation values of the social behavior data of each of the multiple types into the normalization matrix.
Specifically, the second observation values of the multiple types can be reconstructed into a normalization matrix in several ways; two optional implementations are given below as examples:
In one optional implementation, the number of types is used as the number of matrix columns and the number of different collection periods as the number of rows, and the second observation values of the social behavior data of each of the multiple types are reconstructed into a single normalization matrix.
Alternatively, in another optional implementation, the number of types is used as the number of matrix columns, and as many normalization matrices are formed as there are different collection periods: the second observation values of the multiple types for one collection period are reconstructed into one normalization matrix.
Specifically, as shown in Fig. 3, the reconstruction submodule receives the n normalized second observation values output by the normalizer F', serializes them into an n-dimensional vector ordered by observation function number, and converts this vector into the normalization matrix.
The reconstruction module usually generates the normalization matrix as follows. If the social behavior data stream contains social behavior data collected over several different collection periods, the stream is aggregated over certain periods (for example, second, minute, hour, day, week, or month) into several serialized vectors of normalized statistics at different time granularities. Fig. 4 shows one format of the output normalization matrix: rows correspond to different collection periods and columns to different types, so the number of rows equals the number of different collection periods. For example, the first row is the n-dimensional vector formed from data collected with a period of one week, the second row is the n-dimensional vector formed from data collected with a period of one month, and the third row is the n-dimensional vector formed from data collected with a period of one week. Each row has n columns, and each column represents one type.
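The row-per-period layout of Fig. 4 can be sketched as follows, assuming the second observation values have already been grouped by collection period. The dictionary layout, the period labels, and filling a missing entry with 0.0 are assumptions of this sketch.

```python
from typing import Dict, List


def build_normalization_matrix(
    second_obs: Dict[str, Dict[str, float]],  # period -> {type -> normalized value}
    periods: List[str],                        # row order, e.g. ["week", "month"]
    types: List[str],                          # column order = observation function index
) -> List[List[float]]:
    """One row per collection period, one column per behavior type (n columns)."""
    matrix = []
    for period in periods:
        row = second_obs.get(period, {})
        # A type with no observation in this period is filled with 0.0 (assumed).
        matrix.append([row.get(t, 0.0) for t in types])
    return matrix
```

Fixing the column order by the observation function index keeps every row aligned, so the same input position of the RBM stack always receives the same behavior type.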
Optionally, data from different collection periods can also be reconstructed into several different matrices; for example, the vector sequence with a collection period of one day goes into one normalization matrix, and the vector sequence with a collection period of one month goes into another.
S102: Input the low-dimensional feature-space vector into a multi-layer restricted Boltzmann machine (RBM) stack for computation, so as to extract the hidden features in the social behavior data stream.
Optionally, inputting the low-dimensional feature-space vector into the multi-layer RBM stack for computation includes:
inputting one row of the normalization matrix into the multi-layer RBM stack at a time, with each element corresponding to one input; and
processing the normalization matrix layer by layer through the multiple serially connected RBMs to extract the hidden features in the social behavior data stream.
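Feeding the matrix row by row through the serial RBMs can be sketched with NumPy as a deterministic upward pass. This assumes the common binary-RBM convention in which each layer outputs the sigmoid activation probabilities of its hidden units; the `(W, b)` layer format is hypothetical, and the text does not say whether probabilities or sampled states are passed upward.

```python
import numpy as np


def sigmoid(x: np.ndarray) -> np.ndarray:
    return 1.0 / (1.0 + np.exp(-x))


def stack_forward(row: np.ndarray, layers) -> np.ndarray:
    """Propagate one matrix row (one element per visible input) through the stack.

    `layers` is a list of (W, b) pairs: W has shape (n_visible, n_hidden) and
    b is the hidden bias. The hidden probabilities of RBM k become the visible
    input of RBM k+1; the last layer's output is the extracted feature vector.
    """
    x = row
    for W, b in layers:
        x = sigmoid(x @ W + b)
    return x


def extract_features(matrix: np.ndarray, layers) -> np.ndarray:
    """Feed the normalization matrix one row at a time and collect top-layer features."""
    return np.vstack([stack_forward(row, layers) for row in matrix])
```

Each layer's weight matrix must have as many rows as the previous layer has hidden units, which is what makes the RBMs composable in series.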
In this embodiment, the hidden features may include regularities implicit in the social behavior data stream. These regularities reflect the essence of the social behavior data, and extracting them helps improve the accuracy of data modeling.
In this embodiment, the multi-layer RBM stack consists of multiple RBMs connected in series. As shown in Fig. 6, the RBM stack is formed by connecting RBM1, RBM2, RBM3, RBM4, …, RBMn in series.
The data preprocessing module transforms the raw data into the low-dimensional feature space that the RBMs can recognize, and the multi-layer RBM stack receives the low-dimensional feature-space vector output by the data preprocessing module. Fig. 5 shows the structure of a single RBM. An RBM is a stochastic neural network model with a two-layer structure, symmetric connections, and no self-feedback: units in different layers are fully connected, and there are no connections within a layer. One row of the normalization matrix is input into the multi-layer RBM stack at a time, with each element of the row corresponding to one input terminal (i_1 to i_n in Fig. 5). The serially connected RBMs process each row of the normalization matrix layer by layer to extract the hidden features in the social behavior data stream; that is, every row of the normalization matrix passes through every layer of the multi-layer RBM stack.
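For reference, a standard binary RBM (a textbook formulation; the text itself gives no formulas) with visible units v_i, hidden units h_j, visible biases a_i, hidden biases b_j, and symmetric weights W_ij has the energy and conditional distributions below. Because there are no connections within a layer, the hidden units are conditionally independent given the visible layer, and vice versa. Feeding values in [0, 1] from the normalization matrix into such a model treats them as probabilities, which is an assumption rather than something the text states.

$$E(\mathbf{v}, \mathbf{h}) = -\sum_i a_i v_i - \sum_j b_j h_j - \sum_{i,j} v_i W_{ij} h_j$$

$$P(h_j = 1 \mid \mathbf{v}) = \sigma\Big(b_j + \sum_i v_i W_{ij}\Big), \qquad P(v_i = 1 \mid \mathbf{h}) = \sigma\Big(a_i + \sum_j W_{ij} h_j\Big), \qquad \sigma(x) = \frac{1}{1 + e^{-x}}$$

The same W_ij appears in both conditionals, which is what "symmetric connections" means in the description of Fig. 5.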
This embodiment applies a multi-layer stack of RBMs, a deep learning building block, to feature representation. It converts social behavior data into data sequences that a deep learning network can recognize, and uses a feature encoder (the above preprocessing module plus the multi-layer RBM stack) to map the social behavior data from the data space into a feature space, automatically discovering patterns and regularities hidden in the data, extracting abstract hidden features, and completing the feature representation task automatically.
Deep learning is a relatively new field of machine learning research. Its motivation is to build neural networks that simulate how the human brain analyzes and learns, imitating the brain's mechanisms to interpret data such as images, sounds, and text. Deep learning can be performed as unsupervised learning. The concept of deep learning originates from research on artificial neural networks; a multilayer perceptron with many hidden layers is one deep learning structure.
Deep learning combines low-level features into more abstract high-level representations (attribute categories or features) in order to discover distributed feature representations of data. The value contained in big data is what drives people to process it, and big data technology makes it possible to collect more data dimensions, strengthening the descriptive power of weakly correlated data. By building machine learning models with many hidden layers and training them on massive data, deep learning learns more useful features and ultimately improves the accuracy of classification or prediction. In other words, the "deep model" is the means and "feature learning" is the goal.
The RBM is a basic building block of deep learning models. Networks composed of RBMs use unsupervised learning to fit the input data as closely as possible. Through layer-by-layer feature transformation, the representation of a sample in the original space is transformed into a new feature space, making classification or prediction easier. Compared with constructing features manually, learning the features of big data with deep learning building blocks better captures the rich internal information of the data.
This embodiment trains each RBM layer with the fast contrastive divergence learning algorithm. Training proceeds as follows: first, fully train the first RBM and fix its weights and biases; second, use the states of its hidden neurons as the input vector of the second RBM; third, after fully training the second RBM, stack it on top of the first RBM. As shown in Fig. 6, the output of the stack of several RBMs, trained through unsupervised learning, gives the hidden features.
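Greedy layer-wise training with contrastive divergence can be sketched as below, using CD-1 (a single Gibbs step), a common default. The learning rate, epoch count, initialization scale, and the choice to pass hidden activation probabilities (rather than sampled hidden states) to the next layer are assumptions of this sketch, not values from the text.

```python
import numpy as np


def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))


def train_rbm(data, n_hidden, epochs=50, lr=0.1, seed=0):
    """Train one binary RBM with CD-1. data: (n_samples, n_visible), values in [0, 1]."""
    rng = np.random.default_rng(seed)
    n_samples, n_visible = data.shape
    W = rng.normal(0.0, 0.01, size=(n_visible, n_hidden))
    a = np.zeros(n_visible)  # visible bias
    b = np.zeros(n_hidden)   # hidden bias
    for _ in range(epochs):
        # Positive phase: hidden probabilities driven by the data, then a binary sample.
        ph0 = sigmoid(data @ W + b)
        h0 = (rng.random(ph0.shape) < ph0).astype(float)
        # Negative phase: one Gibbs step back to the visible layer and up again.
        pv1 = sigmoid(h0 @ W.T + a)
        ph1 = sigmoid(pv1 @ W + b)
        # CD-1 updates: data-driven statistics minus reconstruction-driven statistics.
        W += lr * (data.T @ ph0 - pv1.T @ ph1) / n_samples
        a += lr * (data - pv1).mean(axis=0)
        b += lr * (ph0 - ph1).mean(axis=0)
    return W, a, b


def train_stack(data, hidden_sizes, **kw):
    """Greedy stacking: train RBM k, freeze it, feed its hidden probabilities to RBM k+1."""
    layers, x = [], data
    for k, n_hidden in enumerate(hidden_sizes):
        W, _, b = train_rbm(x, n_hidden, seed=k, **kw)
        layers.append((W, b))   # frozen weights and hidden bias of this layer
        x = sigmoid(x @ W + b)  # hidden activations become the next layer's input
    return layers, x
```

For example, `train_stack(M, [64, 16])` would train two stacked RBMs on a normalization matrix `M` (one row per input) and return the frozen layers together with a 16-dimensional feature vector per row; the layer sizes here are illustrative only.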
In this embodiment, a social behavior data stream to be processed is obtained and preprocessed, converting it from the data space into a low-dimensional feature-space vector, and this vector is input into a multi-layer RBM stack for computation to extract the hidden features in the stream. Preprocessing turns the social behavior data stream into a low-dimensional feature-space vector that the RBMs can recognize, and the RBMs then automatically extract the abstract hidden features in the stream, improving efficiency and reducing R&D costs.
Fig. 2 shows a feature autoencoder that implements the technical solution of the present invention. It mainly consists of a data preprocessor and a multi-layer RBM stack. The preprocessor mainly contains the preprocessing module, which preprocesses the social behavior data stream and may adopt the structure shown in Fig. 3. The RBM stack consists of several stacked RBMs; it mainly applies a deep learning algorithm to the data output by the preprocessor to obtain the hidden features of the social behavior data stream, which are then used for data modeling.
The data processing device provided by the embodiments of the present invention is described in detail below with reference to Figs. 7 and 8.
Refer to Fig. 7, a schematic structural diagram of a data processing device according to an embodiment of the present invention. As shown, the device includes an acquisition module 100, a preprocessing module 101, and a computing module 102.
Acquisition module 100, configured to obtain the social behavior data stream to be processed.
In this embodiment, the social behavior data stream is streaming data made up of the social behavior data that the system collects from each user on various clients: for example, behavior data such as "splitting glass opaque", joining interest groups, and interactions that each user performs in network applications; and/or payment behavior data generated by each user through various payment applications; and shopping behavior data generated by each user in various shopping applications. The present invention imposes no limitation on this. The various pieces of social behavior data form a data stream ordered by the time at which each behavior occurs.
Note that the social behavior data stream contains social behavior data of multiple types, which can be divided according to the function of the data. For example, the stream may include interaction-type, payment-type, and game-type social behavior data.
Preprocessing module 101, configured to preprocess the social behavior data stream and convert it from the data space into a low-dimensional feature-space vector.
In this embodiment, the social behavior data stream is preprocessed to obtain its low-dimensional feature-space vector. This conversion is necessary because the RBMs can recognize only low-dimensional feature-space vectors, not raw data. Optionally, the low-dimensional feature-space vector includes, but is not limited to, the number of times and the number of days on which social behaviors occur.
Optionally, the data processing device of this embodiment further includes a classification module 103.
Classification module 103, configured to classify the social behavior data stream to obtain social behavior data streams of multiple types.
Specifically, the social behavior data stream contains social behavior data of multiple types, interleaved in time order to form the stream. Each piece of social behavior data in the stream is tagged with the identifier of the application that produced it and with its content. Based on the application identifier and the content, the system assigns each piece of social behavior data to a type. For example, if the social behavior data produced by a network application is a user posting a comment, that data is classified as interaction-type social behavior data. Arranging all social behavior data of a given type by time of occurrence forms the social behavior data stream of that type.
Preprocessing module 101 is specifically configured to preprocess the social behavior data stream of each type among the social behavior data streams of the multiple types to obtain a normalization matrix, thereby converting the social behavior data stream from the data space into low-dimensional feature space vectors.
Specifically, the social behavior data stream of each type, among the multiple-type streams in the social behavior data stream to be processed, is preprocessed to obtain a normalization matrix. Optionally, the normalization matrix contains the normalized statistics of the social behavior data of each of the multiple types. The preprocessing may include statistical processing and normalization of each type's social behavior data stream.
As shown in Fig. 3, the data preprocessing module consists of several submodules. Its input is the streaming social behavior data and its output is the converted normalization matrix. The data preprocessing module comprises an observer submodule, a normalizer submodule, and a reconstruction submodule.
Further optionally, as shown in Fig. 8, preprocessing module 101 may include observer submodule 1010, normalizer submodule 1011, and reconstruction submodule 1012.
Observer submodule 1010 is configured to apply, for the social behavior data stream of each type, a first observation function that performs first observation processing on the stream, obtaining the first observed value corresponding to that type of social behavior data. The first observation function is a statistical function of that type of social behavior data.
Specifically, for the social behavior data stream of each type, a first observation function is used to perform first observation processing on the stream; the stream of one type is processed by exactly one first observation function. As shown in Fig. 3, the social behavior data stream enters the preprocessing module and, after classification, the stream of each type corresponds to one first observation function f. As shown in the figure, there are n first observation functions, f_1, f_2, f_3, …, f_n. After first observation processing of a type's stream, the first observed value corresponding to that type is obtained, i.e., the value output by the first observation function f after processing the stream.
As shown in Fig. 3, observer submodule F comprises a set of observation functions (f_1, f_2, f_3, …, f_n) and supports longitudinal extension. Here f_n is the observation function of the streaming data at a specific function point (a specific type of social behavior data), and the output of f_n is the observed value at that observation point, including but not limited to the number of times and the number of days on which that type of social behavior occurs.
Optionally, the first observation function is a statistical function and the first observed value is the statistic it produces. The first observation function may differ from type to type; the first observed value obtained by applying a type's first observation function to that type's stream is the statistic of that type of social behavior data.
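As a sketch of observer F, the two statistics named in the text (number of occurrences and number of days) can be written as first observation functions. The function names, the type-to-function mapping `OBSERVER`, and the use of integer Unix timestamps bucketed into UTC days are assumptions made for illustration only.

```python
SECONDS_PER_DAY = 86400


def count_events(timestamps):
    """Hypothetical first observation function: number of occurrences."""
    return len(timestamps)


def count_active_days(timestamps):
    """Hypothetical first observation function: distinct UTC days with >= 1 event."""
    return len({ts // SECONDS_PER_DAY for ts in timestamps})


# Observer F: one first observation function per behavior type (f_1 ... f_n).
# Supporting another type means adding another entry to this mapping.
OBSERVER = {
    "interaction": count_events,
    "payment": count_active_days,
}


def observe(typed_streams):
    """Apply each type's first observation function to that type's timestamps.

    `typed_streams` maps a behavior type to a list of integer Unix timestamps.
    """
    return {t: f(typed_streams.get(t, [])) for t, f in OBSERVER.items()}
```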
Normalizer submodule 1011 is configured to apply, for the first observed value of the social behavior data of each type, a second observation function that performs second observation processing on that first observed value, obtaining the second observed value corresponding to that type of social behavior data. The second observation function is a normalization function of that type of social behavior data.
Specifically, for the first observed value of each type of social behavior data, a second observation function can further be used to perform second observation processing on the statistic of that type, obtaining the corresponding second observed value. Each type of social behavior data corresponds to one second observation function. As shown in Fig. 3, after the stream of one type is processed by its first observation function, the first observed value of that type is obtained; this value is then fed into a second observation function, yielding the second observed value of that type. In other words, each type's stream, once processed, yields one second observed value.
Optionally, the second observation function may be a normalization function of that type of social behavior data, in which case the second observed value is the normalized statistic of that type.
As shown in Fig. 3, normalizer submodule F' comprises a set of observation functions (f_1', f_2', f_3', …, f_n') and supports longitudinal extension. Here f_n' is the normalization function for a specific observation point: it receives the first observed value output by observer F and outputs the normalized second observed value. The relation between f_n' and f_n is:
$$f_n' = \frac{f_n - \min_i(f_n)}{\max_i(f_n) - \min_i(f_n)}$$
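The formula above is min-max normalization, which maps the values of f_n into [0, 1]. The source does not say what the index i ranges over; the sketch below assumes it ranges over a batch of first observed values of the same observation function (for example, across users or time windows). It also assumes that a constant batch maps to 0.0, a case the formula leaves undefined because the denominator is zero.

```python
def min_max_normalize(values):
    """Map first observed values of one observation function f_n into [0, 1].

    `values` holds f_n evaluated over the index i (assumed here to range over
    users or collection windows). If all values are equal the range is zero;
    this sketch then returns 0.0 for every entry, which is an assumption.
    """
    lo, hi = min(values), max(values)
    span = hi - lo
    if span == 0:
        return [0.0 for _ in values]
    return [(v - lo) / span for v in values]
```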
Reconstruction submodule 1012 is configured to reconstruct the second observed values of each of the multiple types of social behavior data into a normalization matrix.
Specifically, the second observed values of each type are reconstructed to form the normalization matrix. As shown in Fig. 3, after the social behavior data of all types has been processed by the first and second observation functions, the results are fed into the reconstruction submodule, which forms and finally outputs the normalization matrix M.
Reconstruction submodule 1012 is specifically configured to take the number of types as the number of matrix columns and reconstruct the second observed values of each of the multiple types of social behavior data into a normalization matrix.
That is, taking the number of types as the number of matrix columns, the second observed values of each of the multiple types of social behavior data are reconstructed into a normalization matrix.
Specifically, the second observed values of the social behavior data of each of the multiple types can be reconstructed into a normalization matrix in several ways; two optional implementations are given below as examples:
In one optional implementation, the number of types is taken as the number of matrix columns and the number of different collection periods as the number of matrix rows, and the second observed values of each of the multiple types of social behavior data are reconstructed into a normalization matrix; or,
In another optional implementation, the number of types is taken as the number of matrix columns and one normalization matrix is formed for each of the different collection periods: the second observed values of each of the multiple types of social behavior data collected under one collection period are reconstructed into one normalization matrix.
Specifically, as shown in Fig. 3, the reconstruction submodule receives the n normalized second observed values output by normalizer F', serializes them into an n-dimensional vector ordered by observation-function number, and converts the n-dimensional vector into the normalization matrix.
When generating the normalization matrix, the reconstruction module generally proceeds as follows. If the social behavior data stream contains social behavior data collected under multiple different collection periods, the stream is aggregated over fixed periods (e.g., second, minute, hour, day, week, month) into several serialized vectors of normalized statistics along different time dimensions. Fig. 4 shows one form of the output normalization matrix: rows represent the different collection-period dimensions and columns represent the different types, so the number of rows equals the number of collection periods. For example, the first row is the n-dimensional vector formed from data collected with a one-week period, the second row is formed from data collected with a one-month period, and the third row is formed from data collected with a one-week period. Each row has n columns, and each column represents one type.
Optionally, data of different collection periods may also be reconstructed into several different matrices; for example, the vector sequence with a one-day collection period is placed in one normalization matrix and the vector sequence with a one-month collection period in another.
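The two matrix layouts described above can be sketched as follows, assuming the second observed values for each period or window arrive as a dictionary keyed by type. The column order `TYPES` and all function names are hypothetical. Serialization simply orders the normalized values by a fixed type (observation-function) order before they become a matrix row.

```python
# Hypothetical column order: one column per behavior type.
TYPES = ["interaction", "payment", "game"]


def to_row(second_values):
    """Serialize one window's normalized values into a vector ordered by TYPES."""
    return [second_values[t] for t in TYPES]


def stacked_matrix(per_period):
    """Layout 1: a single matrix, one row per collection period, one column per type.

    `per_period` maps a period label (e.g. "week", "month") to
    {type: second observed value}; rows follow the dict's insertion order.
    """
    return [to_row(values) for values in per_period.values()]


def matrix_per_period(windows_by_period):
    """Layout 2: a separate normalization matrix for each collection period.

    `windows_by_period` maps a period label to a time-ordered list of
    {type: second observed value} dicts, one per window (day 1, day 2, ...);
    each window becomes one row of that period's matrix.
    """
    return {period: [to_row(w) for w in windows]
            for period, windows in windows_by_period.items()}
```

Layout 1 yields one matrix whose rows differ by time granularity, while layout 2 keeps each granularity in its own matrix, so every matrix row covers a window of the same length.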
Computing module 102 is specifically configured to input the low-dimensional feature space vectors into the multi-level restricted Boltzmann machine (RBM) stack for computation, so as to extract the hidden features in the social behavior data stream.
Optionally, inputting the low-dimensional feature space vectors into the multi-level RBM stack for computation comprises:
inputting one row of the normalization matrix into the multi-level RBM stack at a time, with each element corresponding to one input;
processing the normalization matrix layer by layer through the plurality of series-connected RBMs to extract the hidden features in the social behavior data stream.
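A minimal forward-pass sketch of this row-by-row, layer-by-layer computation in pure Python. It assumes binary-unit RBMs and passes hidden-unit probabilities (rather than sampled binary states) from one layer to the next. The class name `RBM`, the layer sizes, and the random initialization are hypothetical, and the weights here are untrained.

```python
import math
import random


def sigmoid(x):
    """Logistic function used for binary-unit activation probabilities."""
    return 1.0 / (1.0 + math.exp(-x))


class RBM:
    """Minimal binary RBM: weights W[i][j] connect visible unit i to hidden unit j."""

    def __init__(self, n_visible, n_hidden, rng):
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)]
                  for _ in range(n_visible)]
        self.b = [0.0] * n_hidden  # hidden biases

    def hidden_probs(self, v):
        """Return p(h_j = 1 | v) = sigmoid(b_j + sum_i v_i * W[i][j]) for every j."""
        if len(v) != len(self.W):
            raise ValueError("input length must equal the number of visible units")
        return [sigmoid(self.b[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b))]


def extract_features(matrix, stack):
    """Feed each matrix row through the RBM stack, one element per visible unit.

    The hidden probabilities of RBM k become the visible input of RBM k+1.
    """
    features = []
    for row in matrix:
        x = row
        for rbm in stack:
            x = rbm.hidden_probs(x)
        features.append(x)
    return features


# Hypothetical stack: 3 inputs (one per behavior type) -> 4 hidden -> 2 hidden.
rng = random.Random(0)
stack = [RBM(3, 4, rng), RBM(4, 2, rng)]
features = extract_features([[0.2, 0.0, 1.0], [0.5, 1.0, 0.3]], stack)
```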
In this embodiment, the hidden features may include regularities implicit in the social behavior data stream. These regularities reflect the essential characteristics of the social behavior data, and extracting them helps improve the accuracy of data modeling.
In this embodiment, the multi-level restricted Boltzmann machine (RBM) stack consists of multiple RBMs connected in series. As shown in Fig. 6, the RBM stack consists of RBM1, RBM2, RBM3, RBM4, …, RBMn in series.
The data preprocessing module transforms the raw data into the low-dimensional feature space that the RBM can recognize, and the multi-level RBM stack receives the low-dimensional feature space vectors output by the data preprocessing module. Fig. 5 shows the structure of a single RBM. An RBM is a stochastic neural network model with a two-layer structure, symmetric connections, and no self-feedback: the two layers are fully connected to each other, and there are no connections within a layer. One row of the normalization matrix is input to the multi-level RBM stack at a time, each element of the row corresponding to one input terminal (i_1 to i_n in Fig. 5). The series-connected RBMs process each row of the normalization matrix layer by layer to extract the hidden features in the social behavior data stream; that is, every row of the normalization matrix passes through the multi-level RBM stack layer by layer.
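For reference, the standard binary RBM is usually written as below. The source does not state its unit types or energy function, so this is the textbook formulation assumed by the other sketches here. v_i are the visible units (the inputs i_1 … i_n), h_j the hidden units, a_i and b_j their biases, W_ij the symmetric weights, and σ the logistic sigmoid. Because there are no connections within a layer, the conditional distributions factorize over individual units.

```latex
E(\mathbf{v}, \mathbf{h}) = -\sum_{i} a_i v_i - \sum_{j} b_j h_j - \sum_{i}\sum_{j} v_i W_{ij} h_j

P(h_j = 1 \mid \mathbf{v}) = \sigma\Big(b_j + \sum_{i} v_i W_{ij}\Big), \qquad
P(v_i = 1 \mid \mathbf{h}) = \sigma\Big(a_i + \sum_{j} W_{ij} h_j\Big), \qquad
\sigma(x) = \frac{1}{1 + e^{-x}}
```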
This embodiment applies a feature-variable representation based on hierarchical deep-learning components, namely a multi-level stacked RBM network. The social behavior data is converted into data sequences that a deep-learning network can recognize, and a feature encoder (the preprocessing module together with the multi-level RBM stack) maps the social behavior data from the data space to the feature space. This automatically discovers the patterns and regularities hidden in the data, extracts abstract hidden features, and completes the feature-representation task automatically.
Deep learning is a relatively new field of machine learning research. Its motivation is to build neural networks that simulate how the human brain analyzes and learns, imitating the brain's mechanisms to interpret data such as images, sound, and text. Deep learning can be carried out as unsupervised learning. The concept of deep learning originates from research on artificial neural networks; a multilayer perceptron with many hidden layers is one deep-learning structure.
Deep learning combines low-level features to form more abstract high-level representations (attribute categories or features), so as to discover distributed feature representations of the data. The value contained in big data is what drives people to process it, and big-data technology can collect more data dimensions to strengthen the descriptive power of weakly correlated data. By building machine learning models with many hidden layers and training them on massive data, deep learning learns more useful features and ultimately improves the accuracy of classification or prediction. In short, the "deep model" is the means and "feature learning" is the goal.
The RBM is a basic building block of deep-learning models. A network composed of RBMs uses unsupervised learning to fit the input data as closely as possible. Layer-by-layer feature transformation maps the representation of samples in the original space into a new feature space, making classification or prediction easier. Compared with constructing features manually, learning big-data features with hierarchical deep-learning components better captures the rich internal information of the data.
This embodiment trains each RBM layer with contrastive divergence, a fast learning algorithm. The training process is as follows: first, fully train the first RBM and fix its weights and biases; second, use the states of its hidden neurons as the input vector of the second RBM; third, after fully training the second RBM, stack it on top of the first. As shown in Fig. 6, the RBM stack formed by stacking several RBMs outputs the hidden features obtained through unsupervised learning.
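The three-step procedure can be sketched with CD-1 (contrastive divergence with a single Gibbs step) and greedy layer-wise stacking. This is an illustrative pure-Python version under stated assumptions: binary units, mean-field (probability) reconstruction of the visible layer, hidden probabilities passed on as the next layer's input, and hypothetical defaults for `epochs`, `lr`, and `seed`. The source does not give these hyperparameters.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


class RBM:
    def __init__(self, n_visible, n_hidden, rng):
        self.rng = rng
        self.W = [[rng.gauss(0.0, 0.01) for _ in range(n_hidden)] for _ in range(n_visible)]
        self.a = [0.0] * n_visible  # visible biases
        self.b = [0.0] * n_hidden   # hidden biases

    def p_h(self, v):
        """p(h_j = 1 | v) for every hidden unit j."""
        return [sigmoid(self.b[j] + sum(v[i] * self.W[i][j] for i in range(len(v))))
                for j in range(len(self.b))]

    def p_v(self, h):
        """p(v_i = 1 | h) for every visible unit i."""
        return [sigmoid(self.a[i] + sum(self.W[i][j] * h[j] for j in range(len(h))))
                for i in range(len(self.a))]

    def sample(self, probs):
        return [1.0 if self.rng.random() < p else 0.0 for p in probs]

    def cd1_update(self, v0, lr):
        """One CD-1 step on a single visible vector v0."""
        ph0 = self.p_h(v0)
        h0 = self.sample(ph0)   # sampled binary hidden states
        v1 = self.p_v(h0)       # mean-field reconstruction of the visible layer
        ph1 = self.p_h(v1)
        for i in range(len(self.a)):
            for j in range(len(self.b)):
                # Positive (data) statistics minus negative (reconstruction) statistics.
                self.W[i][j] += lr * (v0[i] * ph0[j] - v1[i] * ph1[j])
            self.a[i] += lr * (v0[i] - v1[i])
        for j in range(len(self.b)):
            self.b[j] += lr * (ph0[j] - ph1[j])


def train_stack(data, layer_sizes, epochs=10, lr=0.1, seed=0):
    """Greedy layer-wise training: train RBM k, freeze it, feed its hidden output to RBM k+1."""
    rng = random.Random(seed)
    stack, inputs = [], data
    for n_hidden in layer_sizes:
        rbm = RBM(len(inputs[0]), n_hidden, rng)
        for _ in range(epochs):
            for v in inputs:
                rbm.cd1_update(v, lr)
        stack.append(rbm)                      # parameters are not updated after this
        inputs = [rbm.p_h(v) for v in inputs]  # hidden states become next layer's input
    return stack, inputs
```

Each layer is trained to completion before its parameters are frozen, following the "train, fix, stack" order described above; the returned `inputs` are the hidden features produced by the top RBM.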
In this embodiment of the present invention, the social behavior data stream to be processed is obtained and preprocessed, converting it from the data space into low-dimensional feature space vectors, which are then input to the multi-level RBM stack for computation to extract the hidden features in the stream. Preprocessing turns the social behavior data stream into low-dimensional feature space vectors that the RBM can recognize, and the RBM then automatically extracts the abstract hidden features in the stream, which improves efficiency and reduces development costs.
A person of ordinary skill in the art will understand that all or part of the flow of the methods in the above embodiments can be implemented by a computer program instructing the relevant hardware. The program may be stored in a computer-readable storage medium: the program corresponding to the data processing device shown in Figs. 7 and 8 may be stored in a readable storage medium of the device and executed by at least one processor in the device to implement the above data processing method, which includes the flow described in the method embodiment of Fig. 1. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM), a random access memory (RAM), or the like.
The above disclosure is only a preferred embodiment of the present invention and cannot be used to limit the scope of its claims; equivalent variations made according to the claims of the present invention still fall within the scope covered by the present invention.

Claims (12)

1. A data processing method, characterized by comprising:
obtaining a social behavior data stream to be processed;
preprocessing the social behavior data stream to convert it from a data space into low-dimensional feature space vectors;
inputting the low-dimensional feature space vectors into a multi-level restricted Boltzmann machine (RBM) stack for computation, so as to extract hidden features in the social behavior data stream.
2. The method according to claim 1, characterized in that, before preprocessing the social behavior data stream to convert it from the data space into low-dimensional feature space vectors, the method further comprises:
classifying the social behavior data stream to obtain social behavior data streams of multiple types;
and preprocessing the social behavior data stream to convert it from the data space into low-dimensional feature space vectors comprises:
preprocessing the social behavior data stream of each type among the social behavior data streams of the multiple types to obtain a normalization matrix, thereby converting the social behavior data stream from the data space into low-dimensional feature space vectors.
3. The method according to claim 2, characterized in that preprocessing the social behavior data stream of each type among the social behavior data streams of the multiple types to obtain a normalization matrix comprises:
for the social behavior data stream of each type, performing first observation processing on the social behavior data stream with a first observation function to obtain a first observed value corresponding to that type of social behavior data, the first observation function being a statistical function of that type of social behavior data;
for the first observed value of the social behavior data of each type, performing second observation processing on the first observed value with a second observation function to obtain a second observed value corresponding to that type of social behavior data, the second observation function being a normalization function of that type of social behavior data;
reconstructing the second observed values of each of the multiple types of social behavior data into a normalization matrix.
4. The method according to claim 3, characterized in that reconstructing the second observed values of each of the multiple types of social behavior data into a normalization matrix comprises:
taking the number of types as the number of matrix columns, reconstructing the second observed values of each of the multiple types of social behavior data into a normalization matrix.
5. The method according to claim 4, characterized in that, if the social behavior data stream contains social behavior data collected under multiple different collection periods,
taking the number of types as the number of matrix columns and reconstructing the second observed values of each of the multiple types of social behavior data into a normalization matrix comprises:
taking the number of types as the number of matrix columns and the number of different collection periods as the number of matrix rows, reconstructing the second observed values of each of the multiple types of social behavior data into a normalization matrix; or,
taking the number of types as the number of matrix columns and forming as many normalization matrices as there are different collection periods, the second observed values of each of the multiple types of social behavior data under one collection period being reconstructed into one normalization matrix.
6. The method according to claim 4 or 5, characterized in that the multi-level RBM stack comprises multiple RBMs connected in series;
and inputting the low-dimensional feature space vectors into the multi-level restricted Boltzmann machine (RBM) stack for computation comprises:
inputting one row of the normalization matrix into the multi-level RBM stack at a time, each element corresponding to one input terminal;
processing the normalization matrix layer by layer through the plurality of series-connected RBMs to extract the hidden features in the social behavior data stream.
7. A data processing device, characterized by comprising:
an acquisition module, configured to obtain a social behavior data stream to be processed;
a preprocessing module, configured to preprocess the social behavior data stream to convert it from a data space into low-dimensional feature space vectors;
a computing module, configured to input the low-dimensional feature space vectors into a multi-level restricted Boltzmann machine (RBM) stack for computation, so as to extract hidden features in the social behavior data stream.
8. The device according to claim 7, characterized in that the device further comprises:
a classification module, configured to classify the social behavior data stream to obtain social behavior data streams of multiple types;
the preprocessing module being specifically configured to preprocess the social behavior data stream of each type among the social behavior data streams of the multiple types to obtain a normalization matrix, thereby converting the social behavior data stream from the data space into low-dimensional feature space vectors.
9. The device according to claim 8, characterized in that the preprocessing module comprises:
an observer submodule, configured to perform, for the social behavior data stream of each type, first observation processing on the social behavior data stream with a first observation function to obtain a first observed value corresponding to that type of social behavior data, the first observation function being a statistical function of that type of social behavior data;
a normalizer submodule, configured to perform, for the first observed value of the social behavior data of each type, second observation processing on the first observed value with a second observation function to obtain a second observed value corresponding to that type of social behavior data, the second observation function being a normalization function of that type of social behavior data;
a reconstruction submodule, configured to reconstruct the second observed values of each of the multiple types of social behavior data into a normalization matrix.
10. The device according to claim 9, characterized in that the reconstruction submodule is specifically configured to take the number of types as the number of matrix columns and reconstruct the second observed values of each of the multiple types of social behavior data into a normalization matrix.
11. The device according to claim 10, characterized in that, if the social behavior data stream contains social behavior data collected under multiple different collection periods,
the reconstruction submodule is specifically configured to take the number of types as the number of matrix columns and the number of different collection periods as the number of matrix rows, and reconstruct the second observed values of each of the multiple types of social behavior data into a normalization matrix; or,
to take the number of types as the number of matrix columns and form as many normalization matrices as there are different collection periods, the second observed values of each of the multiple types of social behavior data under one collection period being reconstructed into one normalization matrix.
12. The device according to claim 10 or 11, characterized in that the multi-level RBM stack comprises multiple RBMs connected in series;
the computing module is specifically configured to input one row of the normalization matrix into the multi-level RBM stack at a time, each element corresponding to one input terminal;
and to process the normalization matrix layer by layer through the plurality of series-connected RBMs to extract the hidden features in the social behavior data stream.
CN201610394934.7A 2016-06-03 2016-06-03 A data processing method and device Active CN106096638B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610394934.7A CN106096638B (en) 2016-06-03 2016-06-03 A data processing method and device

Publications (2)

Publication Number Publication Date
CN106096638A true CN106096638A (en) 2016-11-09
CN106096638B CN106096638B (en) 2018-08-07

Family

ID=57448313

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103345656A (en) * 2013-07-17 2013-10-09 中国科学院自动化研究所 Method and device for data identification based on multitask deep neural network
CN103440352A (en) * 2013-09-24 2013-12-11 中国科学院自动化研究所 Method and device for analyzing correlation among objects based on deep learning
CN105045857A (en) * 2015-07-09 2015-11-11 中国科学院计算技术研究所 Social network rumor recognition method and system

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
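Hugo Larochelle et al.: "Learning Algorithms for the Classification Restricted Boltzmann Machine", Journal of Machine Learning Research *
Wu Zheng (吴证) et al.: "A dimensionality reduction method for restricted Boltzmann machine neural networks combined with principal component analysis", Journal of Shanghai Jiao Tong University *
Zhang Chunxia (张春霞) et al.: "Restricted Boltzmann machines", Chinese Journal of Engineering Mathematics *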

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108510071A (en) * 2017-05-10 2018-09-07 腾讯科技(深圳)有限公司 Feature extracting method, device and the computer readable storage medium of data
CN108510071B (en) * 2017-05-10 2020-01-10 腾讯科技(深圳)有限公司 Data feature extraction method and device and computer readable storage medium
CN111414384A (en) * 2020-02-26 2020-07-14 有米科技股份有限公司 Mass streaming data processing method and device

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant