US20190370651A1 - Deep Co-Clustering - Google Patents
- Publication number
- US20190370651A1 (application Ser. No. 16/429,425)
- Authority
- US
- United States
- Prior art keywords
- features
- instances
- loss
- cross
- mutual information
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Abandoned
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/21—Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
- G06F18/213—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods
- G06F18/2134—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on separation criteria, e.g. independent component analysis
- G06F18/21342—Feature extraction, e.g. by transforming the feature space; Summarisation; Mappings, e.g. subspace methods based on separation criteria, e.g. independent component analysis using statistical independence, i.e. minimising mutual information or maximising non-gaussianity
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
- G06N3/084—Backpropagation, e.g. using gradient descent
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N5/00—Computing arrangements using knowledge-based models
- G06N5/04—Inference or reasoning models
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N7/00—Computing arrangements based on specific mathematical models
- G06N7/01—Probabilistic graphical models, e.g. probabilistic networks
Definitions
- the present invention relates to co-clustering data and, more particularly, to co-clustering that uses neural networks.
- a method for co-clustering data includes reducing dimensionality for instances and features of an input dataset independently of one another.
- a mutual information loss is determined for the instances and the features independently of one another.
- the instances and the features are cross-correlated, based on the mutual information loss, to determine a cross-correlation loss.
- Co-clusters in the input data are determined based on the cross-correlation loss.
- a data co-clustering system includes an instance autoencoder configured to reduce a dimensionality for instances of an input dataset.
- a feature autoencoder is configured to reduce a dimensionality for features of an input dataset.
- An instance mutual information loss branch is configured to determine a mutual information loss for the instances.
- a feature mutual information loss branch is configured to determine a mutual information loss for the features.
- a processor is configured to cross-correlate the instances and the features based on the mutual information loss, to determine a cross-correlation loss and to determine co-clusters in the input data based on the cross-correlation loss.
- FIG. 1 is a block/flow diagram of a method/system for co-clustering data in accordance with an embodiment of the present invention
- FIG. 2 is a diagram of an exemplary neural network in accordance with an embodiment of the present invention.
- FIG. 3 is a block/flow diagram of a method for classifying documents based on co-clustering in accordance with an embodiment of the present invention
- FIG. 4 is a block diagram of a data co-clustering system in accordance with an embodiment of the present invention.
- FIG. 5 is a block diagram of a processing system in accordance with an embodiment of the present invention.
- systems and methods that perform co-clustering using deep neural networks.
- the present embodiments use a deep autoencoder to generate low-dimensional representations for instances and features, which are then used as input to respective inference paths, each including an inference network and a Gaussian mixture model (GMM).
- the GMM outputs are cross-correlated using mutual information loss.
- the present embodiments can optimize the parameters of the deep autoencoder, the inference neural network, and the GMM jointly.
- Co-clustering is particularly advantageous for its identification of feature clusters based on instance clusters.
- One exemplary application for co-clustering is in text document classification, particularly when training labels are not used.
- Co-clustering identifies word clusters for each document cluster, making it easy to know the category of each document cluster from the words in the corresponding word cluster. Thus, once the major words in a document have been identified, co-clustering makes it possible to identify the category that a new document belongs to.
- In FIG. 1 , a block diagram is shown that illustrates the steps performed by the present embodiments.
- the instances and features are provided to separate paths.
- the duality between instances and features indicates that instances can be grouped based on features and that features can be grouped based on instances.
- the raw input is provided to a deep autoencoder 102 that reduces the dimensionality of the input.
- the deep autoencoder 102 performs an encoding from the original high-dimensional space to a low-dimensional space.
- the deep autoencoder 102 then decodes the low-dimensional encoding to reproduce the high-dimensional input to verify that the low-dimensional encoding maintains the information of the original input data.
- the encoded instances and features are then output by their respective autoencoders 102 .
- An inference network 104 and a GMM 106 provide cluster assignments for the instances and the features, along with a mutual information loss.
- Cross-correlation block 108 uses the mutual information loss to correlate the instances with the features, providing the co-clustered output.
- text document data can represent the documents as instances and the words within the documents as features. Similar documents usually share similar word distributions, so the instances of text document data can be grouped into clusters based on the features; likewise, similar words often occur in similar documents, so the features can be clustered based on the instances.
- the instances and features can be represented as a data matrix.
- the instances and features can be reorganized into homogeneous blocks referred to herein as co-clusters.
- Co-clusters are subsets of an original data matrix and are characterized as a set of instances and a set of features, with values in a given subset being similar.
- Co-clusters reflect the structural information in the original data and can indicate relationships between instances and features.
- the present embodiments can be of particular use in fields relating to bioinformatics, recommendation systems, and image segmentation. Co-clustering is superior to traditional clustering in these fields because of its ability to use the relationships between instances and features.
- These instances and features are clustered into g instance clusters and m feature clusters. Co-clustering in the present embodiments therefore finds maps C_r and C_c:
- r and c designate rows (instances) and columns (features).
- the instances can be reordered such that instances that are grouped into the same cluster are arranged to be adjacent. Similar arrangements can be applied to features.
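This reordering can be sketched in a few lines; the matrix, labels, and cluster counts below are invented purely for illustration:

```python
import numpy as np

# Toy 4x4 data matrix with two row clusters and two column clusters
# (values are invented purely to illustrate the reordering).
X = np.array([[9, 1, 9, 1],
              [1, 9, 1, 9],
              [9, 1, 9, 1],
              [1, 9, 1, 9]])

row_labels = np.array([0, 1, 0, 1])   # C_r: instance -> instance cluster
col_labels = np.array([0, 1, 0, 1])   # C_c: feature -> feature cluster

# Sort indices so members of the same cluster become adjacent.
row_order = np.argsort(row_labels, kind="stable")
col_order = np.argsort(col_labels, kind="stable")

blocked = X[row_order][:, col_order]
print(blocked)
```

After the reordering the homogeneous co-cluster blocks sit on the diagonal; stable sorting keeps the within-cluster order of instances unchanged.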
- the first step in performing co-clustering is to reduce the dimension of input data in block 102 .
- Some embodiments of the present invention use deep stacked autoencoders that perform unsupervised representation learning.
- the autoencoders 102 reduce both instances and features separately. Given the i-th instance x_i and the j-th feature y_j, the lower-dimensional representations are denoted herein as z_i = f_r(x_i; θ_r) and w_j = f_c(y_j; θ_c), where:
- f_r and f_c denote encoding functions for instances and features, respectively, and θ_r and θ_c denote parameters of the autoencoders 102 .
- the encoding functions can be linear or nonlinear, depending on the domain data.
- the reconstruction losses of x_i and y_j are denoted as l(x_i, g_r(z_i; θ_r)) and l(y_j, g_c(w_j; θ_c)), respectively, where g_r and g_c are decoding functions for instances and features.
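A minimal sketch of the encoding, decoding, and reconstruction-loss step, assuming single-layer tanh/linear maps in place of the deep stacked autoencoder (the weight shapes, nonlinearity, and squared-error loss here are illustrative choices, not the prescribed architecture):

```python
import numpy as np

rng = np.random.default_rng(0)

def f_r(x, W_enc):
    """Encoder: project an instance into the low-dimensional space."""
    return np.tanh(W_enc @ x)

def g_r(z, W_dec):
    """Decoder: map the low-dimensional code back to the input space."""
    return W_dec @ z

def recon_loss(x, x_hat):
    """Squared-error reconstruction loss l(x, g_r(f_r(x)))."""
    return float(np.sum((x - x_hat) ** 2))

d_high, d_low = 8, 3                      # original vs. reduced dimensionality
W_enc = rng.normal(size=(d_low, d_high)) * 0.1
W_dec = rng.normal(size=(d_high, d_low)) * 0.1

x_i = rng.normal(size=d_high)             # one instance
z_i = f_r(x_i, W_enc)                     # low-dimensional representation z_i
loss = recon_loss(x_i, g_r(z_i, W_dec))
print(z_i.shape, loss >= 0.0)
```

The feature-side path (f_c, g_c) is symmetric, operating on columns instead of rows.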
- Deep neural networks are used as the inference neural networks 104 , using the low-dimensional representations as inputs.
- the outputs of the inference networks 104 are new representations of the instances x_i and the features y_j, denoted as:
- h_i = (h_i1, . . . , h_ig)^T and v_j = (v_j1, . . . , v_jm)^T
- g and m are the cluster numbers of instances and features, respectively. These representations can also be considered as clustering assignment probabilities when a softmax function is deployed as the last layer of the inference network.
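When the last layer is a softmax, the inference network's output is a valid probability vector over the g clusters; a brief sketch with invented logits:

```python
import numpy as np

def softmax(logits):
    """Numerically stable softmax over the last axis."""
    shifted = logits - logits.max(axis=-1, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=-1, keepdims=True)

# Hypothetical inference-network output for one instance with g = 3 clusters.
logits = np.array([2.0, 0.5, -1.0])
h_i = softmax(logits)      # h_i = (h_i1, ..., h_ig)^T, entries sum to 1
print(h_i, h_i.sum())
```

Each entry h_ik can then be read as the probability of assigning the i-th instance to the k-th cluster.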
- the posterior clustering assignment probability distributions of h_i and v_j are denoted as P_Θr(k | h_i) and P_Θc(k | v_j), respectively.
- the clustering assignment distributions of instances and features, based on the inference neural network 104 , are denoted as Q_Θr(k | h_i) and Q_Θc(k | v_j), respectively.
- the present embodiments jointly train the inference neural network 104 and GMM 106 in an end-to-end fashion. Similar training can be performed for both instances and features. Given the output of the autoencoders 102 , new representations based on the inference neural network 104 can be expressed as:
- Inf indicates the inference neural network 104 .
- n is the number of instances
- h_ik is the value on the k-th dimensionality of h_i. If π_r^k, μ_r^k, and Σ_r^k (the mixture probability, mean, and covariance of the k-th component in the GMM for the instances) are given, the clustering probability of the i-th instance belonging to the k-th cluster is:
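The probability described here is the usual GMM posterior (responsibility). A sketch assuming diagonal covariances, with all mixture parameters invented for illustration:

```python
import numpy as np

def gaussian_pdf_diag(h, mu, var):
    """Density of a diagonal-covariance Gaussian evaluated at h."""
    return np.exp(-0.5 * np.sum((h - mu) ** 2 / var)) / np.sqrt(np.prod(2 * np.pi * var))

def gmm_posterior(h, pis, mus, vars_):
    """Q(k | h): posterior probability of each mixture component given h."""
    weighted = np.array([pi * gaussian_pdf_diag(h, mu, var)
                         for pi, mu, var in zip(pis, mus, vars_)])
    return weighted / weighted.sum()

# Two components in a 2-D representation space (all numbers invented).
pis = np.array([0.5, 0.5])
mus = [np.array([0.0, 0.0]), np.array([5.0, 5.0])]
vars_ = [np.ones(2), np.ones(2)]

h_i = np.array([4.8, 5.1])                # lies close to the second component
q = gmm_posterior(h_i, pis, mus, vars_)
print(q)                                  # mass concentrates on the nearer component
```

The feature-side posterior for v_j has the same form with the feature-GMM parameters.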
- the present embodiments maximize the variational lower bound on the log-likelihood.
- the benefits are two-fold, making the distribution Q ⁇ r a better approximation to the distribution P ⁇ r by minimizing the KL divergence between them, and tightening the bound of the log-likelihood function to make the training process more effective.
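Both benefits follow from the standard variational decomposition of the log-likelihood; written per instance as a sketch (with k the latent cluster assignment and the subscripts as above):

```latex
\log P_{\Theta_r}(h_i)
  = \underbrace{\mathbb{E}_{Q_{\Theta_r}}\!\big[\log P_{\Theta_r}(k, h_i) - \log Q_{\Theta_r}(k \mid h_i)\big]}_{\text{variational lower bound}}
  + \mathrm{KL}\big(Q_{\Theta_r}(k \mid h_i)\,\big\|\,P_{\Theta_r}(k \mid h_i)\big)
```

Since the KL term is nonnegative, raising the lower bound simultaneously tightens the bound on the log-likelihood and drives Q toward the posterior P.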
- the variational lower bound on the log-likelihood, L_r, is defined as:
- L_r = Σ_{i=1}^{n} [ E_Q(log(P(k, h_i))) − E_Q(log(Q(k | h_i))) ]
- the clustering assignment probability for the j th feature belonging to the k th cluster is expressed as:
- π_c^k, μ_c^k, and Σ_c^k are the mixture probability, mean, and covariance of the k-th component in the GMM for the features, and m is the number of feature clusters.
- the variational lower bound on log-likelihood for features is:
- d is the number of features
- P_Θc and Q_Θc are denoted as P and Q for brevity.
- the present embodiments take L_r and L_c as the losses for the clustering assignments of instances and features.
- the cross-loss block 108 uses mutual information to correlate the trainings of instances and features. Based on the clustering assignments, the present embodiments construct a joint probability distribution between instances and features, p(X, Y), and a joint probability distribution between instance clusters and feature clusters, p(X̂, Ŷ). Block 108 penalizes the mutual information loss between the two joint probability distributions.
- the joint probability between the s-th instance cluster, x̂_s, and the t-th feature cluster, ŷ_t, is calculated as:
- the dot product can be used for (•) because many use cases have equal numbers of instances and features and because there is a corresponding relationship between instance clusters and feature clusters, where similar instances share similar features.
- the function can be any appropriate function according to the needs of the application.
- KL(•) is the Kullback-Leibler divergence
- q(x_i, y_j) = p(x̂_s, ŷ_t) (p(x_i)/p(x̂_s)) (p(y_j)/p(ŷ_t)), for x_i ∈ x̂_s and y_j ∈ ŷ_t.
- each joint probability distribution is also greater than or equal to zero, leaving the instance-feature cross loss as:
- the cross loss term shows that the difference between the joint probability distributions should not be significant for an optimal co-clustering.
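A sketch of the cross loss for hard cluster assignments: the joint p(X, Y) comes from normalizing an invented nonnegative data matrix, the cluster-level joint p(X̂, Ŷ) aggregates it over the labeled blocks, and the loss compares the two mutual informations (the normalization scheme here is an assumption for illustration):

```python
import numpy as np

def mutual_info(joint):
    """Mutual information from a joint probability table (zeros masked out)."""
    px = joint.sum(axis=1, keepdims=True)
    py = joint.sum(axis=0, keepdims=True)
    mask = joint > 0
    return float(np.sum(joint[mask] * np.log(joint[mask] / (px @ py)[mask])))

# Invented nonnegative data matrix, normalized into a joint p(X, Y).
M = np.array([[8., 8., 1., 1.],
              [8., 8., 1., 1.],
              [1., 1., 8., 8.],
              [1., 1., 8., 8.]])
p_xy = M / M.sum()

row_labels = np.array([0, 0, 1, 1])       # instance clusters X-hat
col_labels = np.array([0, 0, 1, 1])       # feature clusters Y-hat

# Aggregate p(X, Y) into the cluster-level joint p(X-hat, Y-hat).
p_cluster = np.zeros((2, 2))
for s in range(2):
    for t in range(2):
        p_cluster[s, t] = p_xy[np.ix_(row_labels == s, col_labels == t)].sum()

cross_loss = 1.0 - mutual_info(p_cluster) / mutual_info(p_xy)
print(cross_loss)   # near zero for this perfectly blocked matrix
```

Because the blocks here are perfectly homogeneous, clustering preserves the mutual information and the loss is numerically about zero; noisier assignments lose information and push the loss toward one.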
- Co-clustering is then performed in block 110 using the cross loss. Co-clustering optimizes an objective function,
- J = Σ_i l(x_i, g_r(z_i)) + Σ_j l(y_j, g_c(w_j)) + λ_1 (P_ae(θ_r) + P_ae(θ_c)) − λ_2 (L_r + L_c) + λ_3 (1 − I(X̂; Ŷ)/I(X; Y)) + λ_4 (P_inf(Θ_r) + P_inf(Θ_c))
- l(x_i, g_r(z_i)) and l(y_j, g_c(w_j)) are reconstruction losses for the autoencoders 102
- P_ae(θ_r) and P_ae(θ_c) are the penalties for the parameters of the autoencoders 102
- the λ factors are parameters used to balance the different parts of the loss function
- L_r and L_c are the variational lower bounds.
- the λ parameters are optimized by cross-validation.
- P_inf(Θ_r) and P_inf(Θ_c) are the sums of the inverses of the diagonal entries of the covariance matrices:
- d_r and d_c are the data dimensionalities of the outputs of the autoencoders 102 .
- the P inf terms are used to avoid trivial solutions where diagonal entries in covariance matrices degenerate to zero.
- the output of the optimization is the clustering assignments of both samples and features.
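How the pieces combine can be sketched as follows; the individual loss values, covariance matrices, and λ settings are invented stand-ins, and only the P_inf penalty is computed from scratch:

```python
import numpy as np

def p_inf(sigmas):
    """Covariance penalty: sum of inverse diagonal entries over all GMM
    components, discouraging degenerate (near-zero) variances."""
    return float(sum(np.sum(1.0 / np.diag(S)) for S in sigmas))

# Invented loss parts standing in for the terms described above.
recon_r, recon_c = 0.42, 0.37             # autoencoder reconstruction losses
lb_r, lb_c = -3.1, -2.8                   # variational lower bounds L_r, L_c
cross = 0.15                              # instance-feature cross loss
sigmas_r = [np.eye(3), 2.0 * np.eye(3)]   # instance-GMM covariances
sigmas_c = [np.eye(3)]                    # feature-GMM covariances
lam_lb, lam_inf = 1.0, 1e-4               # balance factors (chosen arbitrarily)

J = (recon_r + recon_c
     - lam_lb * (lb_r + lb_c)             # the lower bounds are maximized
     + cross
     + lam_inf * (p_inf(sigmas_r) + p_inf(sigmas_c)))
print(J > 0)
```

Minimizing J jointly trades off reconstruction fidelity, clustering-assignment likelihood, cross-branch agreement, and the non-degeneracy penalty.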
- an artificial neural network (ANN) architecture 200 is shown. It should be understood that the present architecture is purely exemplary and that other architectures or types of neural network may be used instead. In the context of the present embodiments, it should be understood that additional layers will be used for the autoencoders 102 , inference networks 104 , and GMM networks 106 .
- the ANN embodiment described herein is included with the intent of illustrating general principles of neural network computation at a high level of generality and should not be construed as limiting in any way.
- layers of neurons described below and the weights connecting them are described in a general manner and can be replaced by any type of neural network layers with any appropriate degree or type of interconnectivity.
- layers can include convolutional layers, pooling layers, fully connected layers, softmax layers, or any other appropriate type of neural network layer.
- layers can be added or removed as needed and the weights can be omitted for more complicated forms of interconnection.
- a set of input neurons 202 each provide an input signal in parallel to a respective row of weights 204 .
- the weights 204 each have a respective settable value, such that a weight output passes from the weight 204 to a respective hidden neuron 206 to represent the weighted input to the hidden neuron 206 .
- the weights 204 may simply be represented as coefficient values that are multiplied against the relevant signals. The signals from each weight add column-wise and flow to a hidden neuron 206 .
- the hidden neurons 206 use the signals from the array of weights 204 to perform some calculation.
- the hidden neurons 206 then output a signal of their own to another array of weights 204 .
- This array performs in the same way, with a column of weights 204 receiving a signal from their respective hidden neuron 206 to produce a weighted signal output that adds row-wise and is provided to the output neuron 208 .
- any number of these stages may be implemented by interposing additional layers of arrays and hidden neurons 206 . It should also be noted that some neurons may be constant neurons 209 , which provide a constant output to the array. The constant neurons 209 can be present among the input neurons 202 and/or hidden neurons 206 and are only used during feed-forward operation.
- the output neurons 208 provide a signal back across the array of weights 204 .
- the output layer compares the generated network response to training data and computes an error.
- the error signal can be made proportional to the error value.
- a row of weights 204 receives a signal from a respective output neuron 208 in parallel and produces an output which adds column-wise to provide an input to hidden neurons 206 .
- the hidden neurons 206 combine the weighted feedback signal with a derivative of their feed-forward calculation and store an error value before outputting a feedback signal to their respective column of weights 204 . This back propagation travels through the entire network 200 until all hidden neurons 206 and the input neurons 202 have stored an error value.
- the stored error values are used to update the settable values of the weights 204 .
- the weights 204 can be trained to adapt the neural network 200 to errors in its processing. It should be noted that the three modes of operation, feed forward, back propagation, and weight update, do not overlap with one another.
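The three modes can be sketched for a tiny two-layer network; the layer sizes, learning rate, training target, and squared-error objective are illustrative assumptions:

```python
import numpy as np

rng = np.random.default_rng(1)

# Tiny two-layer network matching the description: input neurons feed an
# array of weights, hidden neurons apply a nonlinearity, and a second
# weight array feeds the output neuron. (Sizes and data are invented.)
W1 = rng.normal(size=(4, 3)) * 0.5        # input -> hidden weights
W2 = rng.normal(size=(1, 4)) * 0.5        # hidden -> output weights
x = rng.normal(size=(3, 1))
target = np.array([[0.3]])
lr = 0.1

for step in range(500):
    # Feed forward.
    a1 = np.tanh(W1 @ x)                  # hidden activations
    y = W2 @ a1                           # network output

    # Back propagation: the error flows back across the weight arrays,
    # combined with the derivative of each feed-forward calculation.
    err = y - target
    delta1 = (W2.T @ err) * (1.0 - a1 ** 2)  # stored hidden-layer error

    # Weight update from the stored error values.
    W2 -= lr * err @ a1.T
    W1 -= lr * delta1 @ x.T

print(abs(y.item() - 0.3) < 1e-2)         # error shrinks toward zero
```

The three phases (feed forward, back propagation, weight update) run strictly in sequence within each step, mirroring the non-overlapping modes described above.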
- Block 302 trains a co-clustering network in an end-to-end fashion.
- the network is described above, with separate branches being trained for the respective instances and features using an autoencoder 102 , an inference network 104 , and a GMM network 106 .
- the two branches are then cross-correlated in block 108 , and the cross-correlation loss information is used in co-clustering to generate an output.
- the training process uses training data that includes a set of known inputs and their corresponding known co-clustered outputs, which can be supplied by any appropriate means.
- the training 302 uses discrepancies between the network's generated output and the expected output to provide adjustments to the weights 204 of the network.
- the entire co-clustering process is trained end-to-end, rather than training each segment in a piecewise fashion. This advantageously prevents the training process from stopping in local optima in the autoencoders 102 , helping improve overall co-clustering performance.
- Block 304 uses the trained network to perform clustering on input data that has dependencies between its rows and columns. As noted above, block 304 reduces the dimensionality of the data and then performs inferences on the rows and the columns before identifying a mutual information loss between the rows and the columns that can be used to co-cluster them.
- the output can be, for example, a matrix having one or more co-clusters within it, with the co-clusters representing groupings of data that have relationships between their column and row information.
- Block 306 uses the trained co-clustering network to identify clustered features of a new document.
- the new document can represent textual data, but it should be understood that other embodiments can include documents that represent any kind of data, such as graphical data, audio data, binary data, executable data, etc.
- Block 308 uses the network to identify document clusters based on how the identified features of the new document align with known feature clusters.
- the words in a text document can be mapped to word clusters for known documents. The word clusters thereby identify corresponding co-clustered document clusters, such that block 308 finds a classification for the new document.
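A toy sketch of this classification step, with a hypothetical word-to-word-cluster map and word-cluster-to-category map standing in for a trained network's output:

```python
from collections import Counter

# Hypothetical co-clustering output: each known word maps to a word
# cluster, and each word cluster is tied to a document category.
word_to_cluster = {"gene": 0, "protein": 0, "cell": 0,
                   "goal": 1, "match": 1, "league": 1}
cluster_to_category = {0: "biology", 1: "sports"}

def classify(document_words):
    """Majority vote over the word clusters of the document's known words."""
    votes = Counter(word_to_cluster[w] for w in document_words if w in word_to_cluster)
    if not votes:
        return None                       # no known words: no classification
    best_cluster, _ = votes.most_common(1)[0]
    return cluster_to_category[best_cluster]

print(classify(["the", "match", "goal", "gene"]))
```

In this invented example two of the three known words fall into the "sports" word cluster, so the new document is assigned to the sports document cluster.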
- Embodiments described herein may be entirely hardware, entirely software or including both hardware and software elements.
- the present invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
- Embodiments may include a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system.
- a computer-usable or computer readable medium may include any apparatus that stores, communicates, propagates, or transports the program for use by or in connection with the instruction execution system, apparatus, or device.
- the medium can be magnetic, optical, electronic, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium.
- the medium may include a computer-readable storage medium such as a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk, etc.
- Each computer program may be tangibly stored in a machine-readable storage media or device (e.g., program memory or magnetic disk) readable by a general or special purpose programmable computer, for configuring and controlling operation of a computer when the storage media or device is read by the computer to perform the procedures described herein.
- the inventive system may also be considered to be embodied in a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.
- a data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus.
- the memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code to reduce the number of times code is retrieved from bulk storage during execution.
- I/O devices including but not limited to keyboards, displays, pointing devices, etc. may be coupled to the system either directly or through intervening I/O controllers.
- Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks.
- Modems, cable modem and Ethernet cards are just a few of the currently available types of network adapters.
- the system 400 includes a hardware processor 402 and memory 404 .
- a co-clustering neural network 406 is implemented as described above, with autoencoders 102 , inference networks 104 , and GMM networks 106 .
- the co-clustering neural network 406 also includes static functions, such as the cross-loss block 108 and the joint optimization performed by co-clustering 110 .
- a training module 408 can be implemented as software that is stored in the memory 404 and that is executed by the hardware processor. In other embodiments, the training module 408 can be implemented in one or more discrete hardware components such as, e.g., an application-specific integrated chip or a field programmable gate array. The training module 408 trains the neural network 406 in an end-to-end fashion using a provided set of training data.
- the processing system 500 includes at least one processor (CPU) 504 operatively coupled to other components via a system bus 502 .
- a cache 506 , a Read Only Memory (ROM) 508 , a Random Access Memory (RAM) 510 , an input/output (I/O) adapter 520 , a sound adapter 530 , a network adapter 540 , a user interface adapter 550 , and a display adapter 560 are operatively coupled to the system bus 502 .
- a first storage device 522 is operatively coupled to system bus 502 by the I/O adapter 520 .
- the storage device 522 can be any of a disk storage device (e.g., a magnetic or optical disk storage device), a solid state magnetic device, and so forth.
- the storage device 522 can be the same type of storage device or different types of storage devices.
- a speaker 532 is operatively coupled to system bus 502 by the sound adapter 530 .
- a transceiver 542 is operatively coupled to system bus 502 by network adapter 540 .
- a display device 562 is operatively coupled to system bus 502 by display adapter 560 .
- a first user input device 552 is operatively coupled to system bus 502 by user interface adapter 550 .
- the user input device 552 can be any of a keyboard, a mouse, a keypad, an image capture device, a motion sensing device, a microphone, a device incorporating the functionality of at least two of the preceding devices, and so forth. Of course, other types of input devices can also be used, while maintaining the spirit of the present principles.
- the user input devices 552 can be the same type of user input device or different types of user input devices.
- the user input device 552 is used to input and output information to and from system 500 .
- processing system 500 may also include other elements (not shown), as readily contemplated by one of skill in the art, as well as omit certain elements.
- various other input devices and/or output devices can be included in processing system 500 , depending upon the particular implementation of the same, as readily understood by one of ordinary skill in the art.
- various types of wireless and/or wired input and/or output devices can be used.
- additional processors, controllers, memories, and so forth, in various configurations can also be utilized as readily appreciated by one of ordinary skill in the art.
Abstract
Methods and systems for co-clustering data include reducing dimensionality for instances and features of an input dataset independently of one another. A mutual information loss is determined for the instances and the features independently of one another. The instances and the features are cross-correlated, based on the mutual information loss, to determine a cross-correlation loss. Co-clusters in the input data are determined based on the cross-correlation loss.
Description
- This application claims priority to U.S. Provisional Patent Application No. 62/679,749, filed on Jun. 1, 2018, incorporated herein by reference in its entirety.
- Co-clustering clusters both instances and features simultaneously. For example, when rating movies, people and their rating values can be considered as instances and features, respectively. Seen another way, data expressed in the rows and columns of a matrix can represent respective instances and features. The duality between instances and features indicates that instances can be grouped based on features, and that features can be grouped based on instances.
- These and other features and advantages will become apparent from the following detailed description of illustrative embodiments thereof, which is to be read in connection with the accompanying drawings.
- The disclosure will provide details in the following description of preferred embodiments with reference to the following figures wherein:
-
FIG. 1 is a block/flow diagram of a method/system for co-clustering data in accordance with an embodiment of the present invention; -
FIG. 2 is a diagram of an exemplary neural network in accordance with an embodiment of the present invention; -
FIG. 3 is a block/flow diagram of a method for classifying documents based on co-clustering in accordance with an embodiment of the present invention; -
FIG. 4 is a block diagram of a data co-clustering system in accordance with an embodiment of the present invention; and -
FIG. 5 is a block diagram of a processing system in accordance with an embodiment of the present invention. - In accordance with the present invention, systems and methods are provided that perform co-clustering using deep neural networks. The present embodiments use a deep autoencoder to generate low-dimensional representations for instances and features, which are then used as input to respective inference paths, each including an inference network and a Gaussian mixture model (GMM). The GMM outputs are cross-correlated using mutual information loss. The present embodiments can optimize the parameters of the deep-autoencoder, the inference neural network, and the GMM jointly.
- Co-clustering, as described herein, is particularly advantageous for its identification of feature clusters based on instance clusters. One exemplary application for co-clustering is in text document classification, particularly when training labels are not used. Co-clustering identifies word clusters for each document cluster, making it easy to know the category of each document cluster from the words in the corresponding word cluster. Thus, once the major words in a document have been identified, co-clustering makes it possible to identify the category that a new document belongs to.
- Referring now to
FIG. 1 , a block diagram is shown that illustrates the steps performed by the present embodiments. The instances and features are provided to separate paths. The duality between instances and features indicates that instances can be grouped based on features and that features can be grouped based on instances. - In each path, the raw input is provided to a
deep autoencoder 102 that reduces the dimensionality of the input. The deep autoencoder 102 performs an encoding from the original high-dimensional space to a low-dimensional space. The deep autoencoder 102 then decodes the low-dimensional encoding to reproduce the high-dimensional input to verify that the low-dimensional encoding maintains the information of the original input data. The encoded instances and features are then output by their respective autoencoders 102. - An
inference network 104 and a GMM 106 provide cluster assignments for the instances and the features, yielding a mutual information loss. Cross-correlation block 108 uses the mutual information loss to correlate the instances with the features, providing the co-clustered output. - To use one example, text document data can represent the documents as instances and the words within the documents as features. Similar documents usually share similar word distributions, so the instances of text document data can be grouped into clusters based on the features; likewise, similar words often appear in similar documents, so the features can then be clustered based on the instances.
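As a rough illustration of the dimension-reduction step in block 102, the following sketch substitutes a linear encoder fitted by truncated SVD for a trained deep autoencoder (the encoding functions may be linear or nonlinear); the data shape, the rank-5 toy matrix, and the SVD-based fit are illustrative assumptions of this example, not the embodiment's actual network:

```python
import numpy as np

# Toy "instances" matrix: 100 rows living in a 5-dimensional subspace of R^50.
rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5)) @ rng.normal(size=(5, 50))

def fit_linear_autoencoder(X, dim):
    """Return encode/decode maps for a linear autoencoder via truncated SVD."""
    mean = X.mean(axis=0)
    _, _, Vt = np.linalg.svd(X - mean, full_matrices=False)
    W = Vt[:dim].T                          # columns span the low-dim subspace
    encode = lambda A: (A - mean) @ W       # stand-in for f_r: R^50 -> R^5
    decode = lambda Z: Z @ W.T + mean       # stand-in for g_r: R^5  -> R^50
    return encode, decode

encode, decode = fit_linear_autoencoder(X, dim=5)
Z = encode(X)                               # low-dimensional representations z_i
recon_loss = np.mean((X - decode(Z)) ** 2)  # reconstruction loss l(x_i, g_r(z_i))
assert Z.shape == (100, 5) and recon_loss < 1e-10
```

Because the toy data is exactly rank 5, the 5-dimensional code reproduces the input almost perfectly, which is the property the decoding pass is meant to verify.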
- In some embodiments, the instances and features can be represented as a data matrix. After clustering, the instances and features can be reorganized into homogeneous blocks referred to herein as co-clusters. Co-clusters are subsets of an original data matrix and are characterized as a set of instances and a set of features, with values in a given subset being similar. Co-clusters reflect the structural information in the original data and can indicate relationships between instances and features. Besides identifying similar documents, the present embodiments can be of particular use in fields relating to bioinformatics, recommendation systems, and image segmentation. Co-clustering is superior to traditional clustering in these fields because of its ability to use the relationships between instances and features.
- In the present embodiments, the instances are represented as $\{x_i\}_{i=1}^n=\{x_1,\ldots,x_n\}$ and the features are represented as $\{y_j\}_{j=1}^d=\{y_1,\ldots,y_d\}$, with n being the number of instances and d being the number of features. These instances and features are clustered into g instance clusters and m feature clusters. Co-clustering in the present embodiments therefore finds maps $C_r$ and $C_c$:
$$C_r:\{x_1,\ldots,x_n\}\to\{\hat{x}_1,\ldots,\hat{x}_g\}$$
$$C_c:\{y_1,\ldots,y_d\}\to\{\hat{y}_1,\ldots,\hat{y}_m\}$$
- where r and c designate rows (instances) and columns (features). The instances can be reordered such that instances that are grouped into the same cluster are arranged to be adjacent. Similar arrangements can be applied to features.
- The new data structure includes blocks of similar instances and features, referred to herein as co-clusters. If X and Y are two discrete random variables taking values from the sets $\{x_i\}_{i=1}^n$ and $\{y_j\}_{j=1}^d$ respectively, then the joint probability distribution between X and Y is denoted herein as $p(X,Y)$. Similarly, if $\hat{X}$ and $\hat{Y}$ are two discrete random variables from the sets $\{\hat{x}_s\}_{s=1}^g=\{\hat{x}_1,\ldots,\hat{x}_g\}$ and $\{\hat{y}_t\}_{t=1}^m=\{\hat{y}_1,\ldots,\hat{y}_m\}$, the joint probability distribution between $\hat{X}$ and $\hat{Y}$ is denoted as $p(\hat{X},\hat{Y})$. $\hat{X}$ and $\hat{Y}$ indicate the partitions induced by X and Y: $\hat{X}=C_r(X)$ and $\hat{Y}=C_c(Y)$.
- As described above, the first step in performing co-clustering is to reduce the dimension of input data in
block 102. Some embodiments of the present invention use deep stacked autoencoders that perform unsupervised representation learning. The autoencoders 102 reduce both instances and features separately. Given the ith instance and the jth feature as $x_i$ and $y_j$, the lower-dimensional representations are denoted herein as:
$$z_i=f_r(x_i;\theta_r)$$
$$w_j=f_c(y_j;\theta_c)$$
- where $f_r$ and $f_c$ denote encoding functions for instances and features, respectively, and $\theta_r$ and $\theta_c$ denote parameters of the autoencoders 102. The encoding functions can be linear or nonlinear, depending on the domain data. The reconstruction losses of $x_i$ and $y_j$ are denoted as $l(x_i,g_r(z_i;\theta_r))$ and $l(y_j,g_c(w_j;\theta_c))$, respectively, where $g_r$ and $g_c$ are decoding functions for instances and features. - Using the low-dimensional representations produced by the
autoencoders 102, the present embodiments use variational inference to produce clustering assignment probabilities. Deep neural networks are used as the inference neural networks 104, using the low-dimensional representations as inputs. The outputs of the inference networks 104 are new representations of $x_i$ and $y_j$, denoted as:
$$h_i=(h_{i1},\ldots,h_{ig})^T$$
$$v_j=(v_{j1},\ldots,v_{jm})^T$$
- where g and m are the cluster numbers of instances and features, respectively. These representations can also be considered as clustering assignment probabilities when a softmax function is deployed as the last layer of the inference network.
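A minimal sketch of an inference network whose softmax last layer turns low-dimensional codes into assignment probabilities; the layer sizes and the random weights standing in for trained parameters $\eta_r$ are assumptions of this example:

```python
import numpy as np

rng = np.random.default_rng(1)
g = 3                                  # number of instance clusters
Z = rng.normal(size=(10, 5))           # low-dim codes z_i from the autoencoder

# Random stand-ins for the trained inference-network parameters eta_r.
W1, b1 = rng.normal(size=(5, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, g)), np.zeros(g)

def softmax(a):
    e = np.exp(a - a.max(axis=1, keepdims=True))   # numerically stable softmax
    return e / e.sum(axis=1, keepdims=True)

def inference_net(Z):
    """h_i = softmax(Inf(z_i; eta_r)): tanh hidden layer, softmax output."""
    return softmax(np.tanh(Z @ W1 + b1) @ W2 + b2)

H = inference_net(Z)
assert H.shape == (10, g)
assert np.allclose(H.sum(axis=1), 1.0)   # each h_i is a valid probability vector
```

Each row of `H` plays the role of $h_i$: nonnegative entries summing to one, readable as clustering assignment probabilities.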
- These outputs are also generated by GMM blocks 106. The posterior clustering assignment probability distributions of $h_i$ and $v_j$, based on the GMM, are denoted as $P_{\phi_r}(k|h_i)$ and $P_{\phi_c}(k|v_j)$, where $\phi_r$ and $\phi_c$ are the parameters of the GMM when dealing with instances and features, respectively. The clustering assignment distributions of instances and features, based on the inference neural network 104, are denoted as $Q_{\eta_r}(k|h_i)$ and $Q_{\eta_c}(k|v_j)$, where $\eta_r$ and $\eta_c$ denote the parameters of the inference networks 104. - Instead of applying a two-step strategy for GMM, the present embodiments jointly train the inference neural network 104 and GMM 106 in an end-to-end fashion. Similar training can be performed for both instances and features. Given the output of the autoencoders 102, new representations based on the inference neural network 104 can be expressed as:
$$h_i=\mathrm{softmax}(\mathrm{Inf}(z_i;\eta_r))$$
- where Inf indicates the inference neural network 104. The mixture probability, mean, and covariance of the kth component in the GMM ($\phi_r=\{\pi_r^k,\mu_r^k,\Sigma_r^k\}$) for instances can be estimated as:
$$\pi_r^k=\frac{N_r^k}{N_r},\qquad \mu_r^k=\frac{1}{N_r^k}\sum_{i=1}^{N_r}h_{ik}z_i,\qquad \Sigma_r^k=\frac{1}{N_r^k}\sum_{i=1}^{N_r}h_{ik}(z_i-\mu_r^k)(z_i-\mu_r^k)^T$$
- where $N_r=n$ is the number of instances, $N_r^k=\sum_{i=1}^{N_r}h_{ik}$, and $h_{ik}$ is the value on the kth dimensionality of $h_i$. If $\pi_r^k$, $\mu_r^k$, and $\Sigma_r^k$ are given, the clustering probability of the ith instance belonging to the kth cluster is:
$$P_{\phi_r}(k|h_i)=\frac{\pi_r^k\,\mathcal{N}(z_i;\mu_r^k,\Sigma_r^k)}{\sum_{k'=1}^{g}\pi_r^{k'}\,\mathcal{N}(z_i;\mu_r^{k'},\Sigma_r^{k'})}$$
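The GMM estimation described above (mixture probability, weighted mean, and weighted covariance computed from the soft assignments $h_{ik}$, followed by posterior clustering probabilities) can be sketched as follows; the random codes and Dirichlet-sampled soft assignments are toy stand-ins, and the small ridge added to each covariance is a numerical-stability assumption of this example:

```python
import numpy as np

rng = np.random.default_rng(2)
n, d, g = 12, 4, 3
Z = rng.normal(size=(n, d))                    # codes z_i
H = rng.dirichlet(np.ones(g), size=n)          # soft assignments h_i (rows sum to 1)

Nk = H.sum(axis=0)                             # N_r^k = sum_i h_ik
pi = Nk / n                                    # mixture probabilities pi_r^k
mu = (H.T @ Z) / Nk[:, None]                   # weighted means mu_r^k
cov = np.empty((g, d, d))
for k in range(g):
    D = Z - mu[k]                              # weighted covariances Sigma_r^k
    cov[k] = (H[:, k, None] * D).T @ D / Nk[k] + 1e-6 * np.eye(d)

def log_gauss(Z, m, C):
    """Row-wise log N(z; m, C)."""
    diff = Z - m
    sol = np.linalg.solve(C, diff.T).T
    return -0.5 * ((diff * sol).sum(1) + np.linalg.slogdet(C)[1]
                   + len(m) * np.log(2 * np.pi))

# Posterior clustering probabilities P(k | .) from the fitted components.
logp = np.log(pi) + np.stack([log_gauss(Z, mu[k], cov[k]) for k in range(g)], 1)
post = np.exp(logp - logp.max(1, keepdims=True))
post /= post.sum(1, keepdims=True)

assert np.isclose(pi.sum(), 1.0)
assert post.shape == (n, g) and np.allclose(post.sum(1), 1.0)
```

The mixture weights sum to one because the rows of `H` do, and each posterior row is a valid distribution over the g clusters.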
- Instead of maximizing the log-likelihood function directly, the present embodiments maximize the variational lower bound on the log-likelihood. The benefits are two-fold: making the distribution $Q_{\eta_r}$ a better approximation to the distribution $P_{\phi_r}$ by minimizing the KL divergence between them, and tightening the bound of the log-likelihood function to make the training process more effective. The variational lower bound on the log-likelihood, $\mathcal{L}_r$, is defined as:
$$\mathcal{L}_r=\sum_{i=1}^{n}\Big(\sum_{k=1}^{g}Q(k|h_i)\log\big(\pi_r^k\,\mathcal{N}(z_i;\mu_r^k,\Sigma_r^k)\big)+H(k|h_i)\Big)$$
- where $H(k|h_i)=-E_Q(\log(Q(k|h_i)))$ is the Shannon entropy and $P_{\phi_r}$ and $Q_{\eta_r}$ are represented as P and Q for brevity. - The clustering assignment probability for the jth feature belonging to the kth cluster is expressed as:
$$P_{\phi_c}(k|v_j)=\frac{\pi_c^k\,\mathcal{N}(w_j;\mu_c^k,\Sigma_c^k)}{\sum_{k'=1}^{m}\pi_c^{k'}\,\mathcal{N}(w_j;\mu_c^{k'},\Sigma_c^{k'})}$$
- where $\pi_c^k$, $\mu_c^k$, and $\Sigma_c^k$ are the mixture probability, mean, and covariance of the kth component in the GMM for the features, and m is the number of feature clusters. The variational lower bound on the log-likelihood for features, $\mathcal{L}_c$, is:
$$\mathcal{L}_c=\sum_{j=1}^{d}\Big(\sum_{k=1}^{m}Q(k|v_j)\log\big(\pi_c^k\,\mathcal{N}(w_j;\mu_c^k,\Sigma_c^k)\big)+H(k|v_j)\Big)$$
- The cross-loss block 108 uses mutual information to correlate the training of instances and features. Based on the clustering assignments, the present embodiments construct a joint probability distribution between instances and features, $p(X,Y)$, and a joint probability distribution between instance clusters and feature clusters, $p(\hat{X},\hat{Y})$. Block 108 penalizes the mutual information loss between the two joint probability distributions. - Given the clustering assignment probability of the ith instance as $\gamma_{r(i)}=(\gamma_{r(i)}^1,\ldots,\gamma_{r(i)}^g)^T$ and of the jth feature as $\gamma_{c(j)}=(\gamma_{c(j)}^1,\ldots,\gamma_{c(j)}^m)^T$, the joint probability between the ith instance and the jth feature is denoted as $p(x_i,y_j)=\langle\gamma_{r(i)},\gamma_{c(j)}\rangle$, where $\langle\cdot,\cdot\rangle$ is a function to calculate the joint probability, such as the dot product. The joint probability between the sth instance cluster, $\hat{x}_s$, and the tth feature cluster, $\hat{y}_t$, is calculated as:
$$p(\hat{x}_s,\hat{y}_t)=\sum\{p(x_i,y_j)\mid x_i\in\hat{x}_s,\;y_j\in\hat{y}_t\}$$
- The dot product can be used for $\langle\cdot,\cdot\rangle$ because many use cases have equal numbers of instance clusters and feature clusters and because there is a corresponding relationship between instance clusters and feature clusters, where similar instances share similar features. Although the dot product is specifically contemplated, the function can be any appropriate function according to the needs of the application.
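The cluster-level joint probability can be sketched as below; normalizing the dot products into a proper joint distribution and using hard (argmax) memberships for the summation are assumptions of this example, not steps stated by the embodiment:

```python
import numpy as np

rng = np.random.default_rng(3)
n, d, g = 6, 5, 3                       # equal cluster counts, as the dot product assumes
Gr = rng.dirichlet(np.ones(g), size=n)  # gamma_r(i): instance cluster assignments
Gc = rng.dirichlet(np.ones(g), size=d)  # gamma_c(j): feature cluster assignments

P = Gr @ Gc.T                           # p(x_i, y_j) from dot(gamma_r(i), gamma_c(j))
P /= P.sum()                            # normalize into a joint distribution (assumption)

row_cl, col_cl = Gr.argmax(1), Gc.argmax(1)   # hard cluster memberships
P_hat = np.zeros((g, g))                      # p(x_hat_s, y_hat_t)
for s in range(g):
    for t in range(g):
        # Sum p(x_i, y_j) over x_i in cluster s and y_j in cluster t.
        P_hat[s, t] = P[np.ix_(row_cl == s, col_cl == t)].sum()

assert np.isclose(P.sum(), 1.0) and np.isclose(P_hat.sum(), 1.0)
```

Because `P_hat` merely re-buckets the entries of `P`, both matrices describe the same total probability mass.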
- Given the joint probability distributions $p(X,Y)$ and $p(\hat{X},\hat{Y})$, the mutual information between X and Y and between $\hat{X}$ and $\hat{Y}$ are calculated as:
$$I(X;Y)=\sum_{i=1}^{n}\sum_{j=1}^{d}p(x_i,y_j)\log\frac{p(x_i,y_j)}{p(x_i)p(y_j)}$$
$$I(\hat{X};\hat{Y})=\sum_{s=1}^{g}\sum_{t=1}^{m}p(\hat{x}_s,\hat{y}_t)\log\frac{p(\hat{x}_s,\hat{y}_t)}{p(\hat{x}_s)p(\hat{y}_t)}$$
- where $p(x_i)=\sum_{y_j}p(x_i,y_j)$, $p(y_j)=\sum_{x_i}p(x_i,y_j)$, $p(\hat{x}_s)=\sum_{\hat{y}_t}p(\hat{x}_s,\hat{y}_t)$, and $p(\hat{y}_t)=\sum_{\hat{x}_s}p(\hat{x}_s,\hat{y}_t)$. The difference $I(X;Y)-I(\hat{X};\hat{Y})$ is:
$$KL\big(p(X,Y)\,\|\,q(X,Y)\big)$$
- where $KL(\cdot\,\|\,\cdot)$ is the Kullback-Leibler divergence and
$$q(x_i,y_j)=p(\hat{x}_s,\hat{y}_t)\,p(x_i\mid\hat{x}_s)\,p(y_j\mid\hat{y}_t)\quad\text{for }x_i\in\hat{x}_s,\;y_j\in\hat{y}_t$$
- The difference is greater than or equal to zero, and each joint probability distribution is also greater than or equal to zero, leaving the instance-feature cross loss as:
$$J_3=KL\big(p(X,Y)\,\|\,q(X,Y)\big)=I(X;Y)-I(\hat{X};\hat{Y})$$
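The mutual information comparison described above can be sketched on a toy joint distribution; the 4x4 block-structured matrix and the 2-cluster grouping of its rows and columns are illustrative assumptions of this example:

```python
import numpy as np

def mutual_info(P):
    """Mutual information from a joint probability matrix P."""
    px = P.sum(1, keepdims=True)        # marginal over rows
    py = P.sum(0, keepdims=True)        # marginal over columns
    nz = P > 0                          # skip zero entries (0 log 0 = 0)
    return (P[nz] * np.log(P[nz] / (px @ py)[nz])).sum()

# Toy joint p(X, Y) with block structure; rows/cols grouped into 2 clusters each.
P = np.array([[0.20, 0.05, 0.00, 0.00],
              [0.05, 0.20, 0.00, 0.00],
              [0.00, 0.00, 0.20, 0.05],
              [0.00, 0.00, 0.05, 0.20]])
# Cluster-level joint p(X^, Y^): sum each 2x2 block.
P_hat = np.array([[P[:2, :2].sum(), P[:2, 2:].sum()],
                  [P[2:, :2].sum(), P[2:, 2:].sum()]])

cross_loss = mutual_info(P) - mutual_info(P_hat)   # the KL term above
assert cross_loss >= -1e-12    # clustering both variables can never increase I
assert np.isclose(P_hat.sum(), 1.0)
```

The nonnegativity assertion reflects the property used by the embodiment: coarse-graining both variables can only lose mutual information, so minimizing the difference drives the co-clusters to preserve the instance-feature dependence.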
- Co-clustering is then performed in block 110 using the cross loss. Co-clustering optimizes an objective function,
-
- to tend the parameters θr, θc, ηr, ηc, where J1 and J2 are the losses for the trainings of instances and feature, respectively, J3 is the instance-feature cross loss, θr and θc are the parameters of the
autoencoders 102, and ηr and ηc are the parameters of the inferenceneural networks 104. The parts of the objective function are broken down as follows: -
- where l(xi, gr (zi)) and l(yj, gc(wj)) are reconstruction losses for the
autoencoders 102, Pae(θr) and Pae (θr) are the penalties for the parameters of theautoencoders 102, the λ factors are parameters used to balance different parts of the loss function, and r and c are the variational lower bounds. The A parameters are optimized by cross-validation. The terms Pinf(Σr) and Pint(Σc) are the sum of the inverse of the diagonal entries of covariance matrices: -
- where dr and dc are the data dimensionality of the outputs of the
autoencoders 102. The Pinf terms are used to avoid trivial solutions where diagonal entries in covariance matrices degenerate to zero. The output of the optimization is the clustering assignments of both samples and features. - Referring now to
FIG. 2 , an artificial neural network (ANN) architecture 200 is shown. It should be understood that the present architecture is purely exemplary and that other architectures or types of neural network may be used instead. In the context of the present embodiments, it should be understood that additional layers will be used for the autoencoders 102, inference networks 104, and GMM networks 106. The ANN embodiment described herein is included with the intent of illustrating general principles of neural network computation at a high level of generality and should not be construed as limiting in any way. - Furthermore, the layers of neurons described below and the weights connecting them are described in a general manner and can be replaced by any type of neural network layers with any appropriate degree or type of interconnectivity. For example, layers can include convolutional layers, pooling layers, fully connected layers, softmax layers, or any other appropriate type of neural network layer. Furthermore, layers can be added or removed as needed and the weights can be omitted for more complicated forms of interconnection.
- During feed-forward operation, a set of input neurons 202 each provide an input signal in parallel to a respective row of weights 204. In the hardware embodiment described herein, the weights 204 each have a respective settable value, such that a weight output passes from the weight 204 to a respective hidden neuron 206 to represent the weighted input to the hidden neuron 206. In software embodiments, the weights 204 may simply be represented as coefficient values that are multiplied against the relevant signals. The signals from each weight add column-wise and flow to a hidden neuron 206. - The
hidden neurons 206 use the signals from the array of weights 204 to perform some calculation. The hidden neurons 206 then output a signal of their own to another array of weights 204. This array performs in the same way, with a column of weights 204 receiving a signal from their respective hidden neuron 206 to produce a weighted signal output that adds row-wise and is provided to the output neuron 208. - It should be understood that any number of these stages may be implemented by interposing additional layers of arrays and hidden neurons 206. It should also be noted that some neurons may be constant neurons 209, which provide a constant output to the array. The constant neurons 209 can be present among the input neurons 202 and/or hidden neurons 206 and are only used during feed-forward operation. - During back propagation, the output neurons 208 provide a signal back across the array of weights 204. The output layer compares the generated network response to training data and computes an error. The error signal can be made proportional to the error value. In this example, a row of weights 204 receives a signal from a respective output neuron 208 in parallel and produces an output which adds column-wise to provide an input to the hidden neurons 206. The hidden neurons 206 combine the weighted feedback signal with a derivative of their feed-forward calculations and store an error value before outputting a feedback signal to their respective columns of weights 204. This back propagation travels through the entire network 200 until all hidden neurons 206 and the input neurons 202 have stored an error value. - During weight updates, the stored error values are used to update the settable values of the weights 204. In this manner the weights 204 can be trained to adapt the neural network 200 to errors in its processing. It should be noted that the three modes of operation, feed forward, back propagation, and weight update, do not overlap with one another. - Referring now to
FIG. 3 , a method for co-clustering data is shown. Block 302 trains a co-clustering network in an end-to-end fashion. The network is described above, with separate branches being trained for the respective instances and features using an autoencoder 102, an inference network 104, and a GMM network 106. The two branches are then cross-correlated in block 108, and the cross-correlation loss information is used in co-clustering to generate an output. The training process uses training data that includes a set of known inputs and their corresponding known co-clustered outputs, which can be supplied by any appropriate means. The training 302 uses discrepancies between the network's generated output and the expected output to provide adjustments to the weights 204 of the network. - It is specifically contemplated that the entire co-clustering process is trained end-to-end, rather than training each segment in a piecewise fashion. This advantageously prevents the training process from stopping in local optima in the
autoencoders 102, helping improve overall co-clustering performance. - Block 304 then uses the trained network to perform clustering on input data that has dependencies between its rows and columns. As noted above, block 304 reduces the dimensionality of the data and then performs inferences on the rows and the columns before identifying a mutual information loss between the rows and the columns that can be used to co-cluster them. The output can be, for example, a matrix having one or more co-clusters within it, with the co-clusters representing groupings of data that have relationships between their column and row information.
-
Block 306 then uses the trained co-clustering network to identify clustered features of a new document. In some embodiments, the new document can represent textual data, but it should be understood that other embodiments can include documents that represent any kind of data, such as graphical data, audio data, binary data, executable data, etc. Block 308 uses the network to identify document clusters based on how the identified features of the new document align with known feature clusters. Thus, in one example, the words in a text document can be mapped to word clusters for known documents. The word clusters thereby identify corresponding co-clustered document clusters, such that block 308 finds a classification for the new document. - Embodiments described herein may be entirely hardware, entirely software, or include both hardware and software elements. In a preferred embodiment, the present invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc.
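The classification idea of blocks 306 and 308 can be sketched as follows, assuming toy word clusters; the word lists, cluster labels, and majority-vote rule are illustrative assumptions rather than the trained network's actual output:

```python
import numpy as np

# Toy stand-ins for learned word clusters and their co-clustered document clusters.
word_cluster = {
    "goal": 0, "match": 0, "team": 0,       # word cluster 0 <-> "sports" documents
    "stock": 1, "market": 1, "profit": 1,   # word cluster 1 <-> "finance" documents
}
doc_cluster_of_word_cluster = {0: "sports", 1: "finance"}

def classify(words, n_clusters=2):
    """Assign a document to the document cluster its majority word cluster backs."""
    votes = np.zeros(n_clusters)
    for w in words:
        if w in word_cluster:               # words outside known clusters cast no vote
            votes[word_cluster[w]] += 1
    return doc_cluster_of_word_cluster[int(votes.argmax())]

label = classify(["the", "team", "scored", "a", "goal", "last", "match"])
assert label == "sports"
```

Because each word cluster corresponds to a co-clustered document cluster, mapping a new document's major words to word clusters immediately yields a document category, with no labels needed at training time.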
- Embodiments may include a computer program product accessible from a computer-usable or computer-readable medium providing program code for use by or in connection with a computer or any instruction execution system. A computer-usable or computer readable medium may include any apparatus that stores, communicates, propagates, or transports the program for use by or in connection with the instruction execution system, apparatus, or device. The medium can be magnetic, optical, electronic, electromagnetic, infrared, or semiconductor system (or apparatus or device) or a propagation medium. The medium may include a computer-readable storage medium such as a semiconductor or solid state memory, magnetic tape, a removable computer diskette, a random access memory (RAM), a read-only memory (ROM), a rigid magnetic disk and an optical disk, etc.
- Each computer program may be tangibly stored in a machine-readable storage media or device (e.g., program memory or magnetic disk) readable by a general or special purpose programmable computer, for configuring and controlling operation of a computer when the storage media or device is read by the computer to perform the procedures described herein. The inventive system may also be considered to be embodied in a computer-readable storage medium, configured with a computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner to perform the functions described herein.
- A data processing system suitable for storing and/or executing program code may include at least one processor coupled directly or indirectly to memory elements through a system bus. The memory elements can include local memory employed during actual execution of the program code, bulk storage, and cache memories which provide temporary storage of at least some program code to reduce the number of times code is retrieved from bulk storage during execution. Input/output or I/O devices (including but not limited to keyboards, displays, pointing devices, etc.) may be coupled to the system either directly or through intervening I/O controllers.
- Network adapters may also be coupled to the system to enable the data processing system to become coupled to other data processing systems or remote printers or storage devices through intervening private or public networks. Modems, cable modems, and Ethernet cards are just a few of the currently available types of network adapters.
- Referring now to
FIG. 4 , a co-clustering system 400 is shown. The system 400 includes a hardware processor 402 and memory 404. A co-clustering neural network 406 is implemented as described above, with autoencoders 102, inference networks 104, and GMM networks 106. The co-clustering neural network 406 also includes static functions, such as the cross-loss block 108 and the joint optimization performed by co-clustering 110. - A
training module 408 can be implemented as software that is stored in the memory 404 and that is executed by the hardware processor 402. In other embodiments, the training module 408 can be implemented in one or more discrete hardware components such as, e.g., an application-specific integrated circuit or a field-programmable gate array. The training module 408 trains the neural network 406 in an end-to-end fashion using a provided set of training data. - Referring now to
FIG. 5 , an exemplary processing system 500 is shown which may represent the co-clustering system 400. The processing system 500 includes at least one processor (CPU) 504 operatively coupled to other components via a system bus 502. A cache 506, a Read Only Memory (ROM) 508, a Random Access Memory (RAM) 510, an input/output (I/O) adapter 520, a sound adapter 530, a network adapter 540, a user interface adapter 550, and a display adapter 560 are operatively coupled to the system bus 502. - A first storage device 522 is operatively coupled to the system bus 502 by the I/O adapter 520. The storage device 522 can be any of a disk storage device (e.g., a magnetic or optical disk storage device), a solid state magnetic device, and so forth. The storage devices can be the same type of storage device or different types of storage devices. - A speaker 532 is operatively coupled to the system bus 502 by the sound adapter 530. A transceiver 542 is operatively coupled to the system bus 502 by the network adapter 540. A display device 562 is operatively coupled to the system bus 502 by the display adapter 560. - A first user input device 552 is operatively coupled to the system bus 502 by the user interface adapter 550. The user input device 552 can be any of a keyboard, a mouse, a keypad, an image capture device, a motion sensing device, a microphone, a device incorporating the functionality of at least two of the preceding devices, and so forth. Of course, other types of input devices can also be used, while maintaining the spirit of the present principles. The user input devices can be the same type of user input device or different types of user input devices. The user input device 552 is used to input and output information to and from the system 500. - Of course, the
processing system 500 may also include other elements (not shown), as readily contemplated by one of skill in the art, as well as omit certain elements. For example, various other input devices and/or output devices can be included in the processing system 500, depending upon the particular implementation of the same, as readily understood by one of ordinary skill in the art. For example, various types of wireless and/or wired input and/or output devices can be used. Moreover, additional processors, controllers, memories, and so forth, in various configurations can also be utilized as readily appreciated by one of ordinary skill in the art. These and other variations of the processing system 500 are readily contemplated by one of ordinary skill in the art given the teachings of the present principles provided herein. - The foregoing is to be understood as being in every respect illustrative and exemplary, but not restrictive, and the scope of the invention disclosed herein is not to be determined from the Detailed Description, but rather from the claims as interpreted according to the full breadth permitted by the patent laws. It is to be understood that the embodiments shown and described herein are only illustrative of the present invention and that those skilled in the art may implement various modifications without departing from the scope and spirit of the invention. Those skilled in the art could implement various other feature combinations without departing from the scope and spirit of the invention. Having thus described aspects of the invention, with the details and particularity required by the patent laws, what is claimed and desired protected by Letters Patent is set forth in the appended claims.
Claims (19)
1. A method for co-clustering data, comprising:
reducing dimensionality for instances and features of an input dataset independently of one another;
determining a mutual information loss for the instances and the features independently of one another;
cross-correlating the instances and the features, using a processor, based on the mutual information loss, to determine a cross-correlation loss; and
determining co-clusters in the input data based on the cross-correlation loss.
2. The method of claim 1 , further comprising classifying a new instance based on associated new features.
3. The method of claim 1 , wherein the instances include documents and the features include words associated with respective documents.
4. The method of claim 1 , wherein determining the mutual information loss includes an inference neural network step and a Gaussian mixture model step.
5. The method of claim 4 , further comprising training the inference neural network and the Gaussian mixture model in an end-to-end fashion.
6. The method of claim 1 , wherein determining co-clusters includes optimizing an objective function that includes a respective dimension reconstruction loss term for the instances and for the features and a cross-correlation loss term that includes the determined cross-correlation loss.
7. The method of claim 6 , wherein the objective function is:
$$\min_{\theta_r,\theta_c,\eta_r,\eta_c} J=J_1+J_2+J_3$$
where J1 is the reconstruction loss term for the instances, J2 is the reconstruction loss term for the features, J3 is the cross-correlation loss term, θr and θc are dimension reduction parameters for the instances and the features, respectively, and ηr and ηc are mutual information loss parameters for the instances and the features, respectively.
8. The method of claim 6 , wherein reducing the dimensionality of the instances and the features comprises applying respective autoencoders to the input data.
9. The method of claim 8 , wherein each autoencoder determines a dimension reconstruction loss by reducing the dimensionality of data and then restoring the reduced dimensionality data to an original dimensionality.
10. The method of claim 1 , further comprising performing text classification using the determined co-clusters.
11. A data co-clustering system, comprising:
an instance autoencoder configured to reduce a dimensionality for instances of an input dataset;
a feature autoencoder configured to reduce a dimensionality for features of an input dataset;
an instance mutual information loss branch configured to determine a mutual information loss for the instances;
a feature mutual information loss branch configured to determine a mutual information loss for the features;
a processor configured to cross-correlate the instances and the features, based on the mutual information loss, to determine a cross-correlation loss, and to determine co-clusters in the input data based on the cross-correlation loss.
12. The system of claim 11 , wherein the processor is further configured to classify a new instance based on associated new features.
13. The system of claim 11 , wherein the instances include documents and the features include words associated with respective documents.
14. The system of claim 11 , wherein the input dataset comprises a matrix having columns that represent one of the features and the instances and rows that represent the other of the features and the instances.
15. The system of claim 11 , wherein each mutual information loss branch determines a respective mutual information loss using an inference neural network and a Gaussian mixture model.
16. The system of claim 15 , further comprising a training module configured to train the inference neural network and a Gaussian mixture model in an end-to-end fashion.
17. The system of claim 11 , wherein the processor is further configured to determine co-clusters by optimizing an objective function that includes a respective dimension reconstruction loss term for the instances and for the features and a cross-correlation loss term that includes the determined cross-correlation loss.
18. The system of claim 17 , wherein the objective function is:
$$\min_{\theta_r,\theta_c,\eta_r,\eta_c} J=J_1+J_2+J_3$$
where J1 is the reconstruction loss term for the instances, J2 is the reconstruction loss term for the features, J3 is the cross-correlation loss term, θr and θc are dimension reduction parameters for the instances and the features, respectively, and ηr and ηc are mutual information loss parameters for the instances and the features, respectively.
20. The system of claim 17 , wherein each autoencoder determines a dimension reconstruction loss by reducing the dimensionality of data and then restoring the reduced dimensionality data to an original dimensionality.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US16/429,425 US20190370651A1 (en) | 2018-06-01 | 2019-06-03 | Deep Co-Clustering |
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201862679749P | 2018-06-01 | 2018-06-01 | |
US16/429,425 US20190370651A1 (en) | 2018-06-01 | 2019-06-03 | Deep Co-Clustering |
Publications (1)
Publication Number | Publication Date |
---|---|
US20190370651A1 true US20190370651A1 (en) | 2019-12-05 |
Family
ID=68693557
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
US16/429,425 Abandoned US20190370651A1 (en) | 2018-06-01 | 2019-06-03 | Deep Co-Clustering |
Country Status (1)
Country | Link |
---|---|
US (1) | US20190370651A1 (en) |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111898635A (en) * | 2020-06-24 | 2020-11-06 | 华为技术有限公司 | Neural network training method, data acquisition method and device |
US20220019888A1 (en) * | 2020-07-20 | 2022-01-20 | Adobe Inc. | Unified framework for dynamic clustering and discrete time event prediction |
Citations (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20090300547A1 (en) * | 2008-05-30 | 2009-12-03 | Kibboko, Inc. | Recommender system for on-line articles and documents |
US20180024968A1 (en) * | 2016-07-22 | 2018-01-25 | Xerox Corporation | System and method for domain adaptation using marginalized stacked denoising autoencoders with domain prediction regularization |
US20180129906A1 (en) * | 2016-11-07 | 2018-05-10 | Qualcomm Incorporated | Deep cross-correlation learning for object tracking |
US20180165554A1 (en) * | 2016-12-09 | 2018-06-14 | The Research Foundation For The State University Of New York | Semisupervised autoencoder for sentiment analysis |
US20190228312A1 (en) * | 2018-01-25 | 2019-07-25 | SparkCognition, Inc. | Unsupervised model building for clustering and anomaly detection |
US20200065656A1 (en) * | 2016-11-15 | 2020-02-27 | Google Llc | Training neural networks using a clustering loss |
US20200327404A1 (en) * | 2016-03-28 | 2020-10-15 | Icahn School Of Medicine At Mount Sinai | Systems and methods for applying deep learning to data |
Non-Patent Citations (5)
Title |
---|
BADINO, L. et al., "An Auto-encoder based Approach to Unsupervised Learning of Subword Units" (Year: 2014) * |
PENG, H. et al., "Feature Selection Based on Mutual Information: Criteria of Max-Dependency, Max-Relevance, and Min-Redundancy" (Year: 2005) * |
QIU, G., "Image and Feature Co-Clustering" (Year: 2004) * |
USAMA, M. et al., "Unsupervised Machine Learning for Networking: Techniques, Applications and Research Challenges" (Year: 2017) * |
VINCENT, P. et al., "Extracting and Composing Robust Features with Denoising Autoencoders" (Year: 2008) * |
Cited By (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111898635A (en) * | 2020-06-24 | 2020-11-06 | 华为技术有限公司 | Neural network training method, data acquisition method and device |
US20220019888A1 (en) * | 2020-07-20 | 2022-01-20 | Adobe Inc. | Unified framework for dynamic clustering and discrete time event prediction |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
US11770571B2 (en) | | Matrix completion and recommendation provision with deep learning |
US11941523B2 (en) | | Stochastic gradient boosting for deep neural networks |
US20200184339A1 (en) | | Representation learning for input classification via topic sparse autoencoder and entity embedding |
Gribonval et al. | | Sample complexity of dictionary learning and other matrix factorizations |
US8489529B2 (en) | | Deep convex network with joint use of nonlinear random projection, Restricted Boltzmann Machine and batch-based parallelizable optimization |
CN112966114B (en) | | Literature classification method and device based on symmetrical graph convolutional neural network |
US11996116B2 (en) | | Methods and systems for implementing on-device non-semantic representation fine-tuning for speech classification |
CN109063719B (en) | | Image classification method combining structure similarity and class information |
US20200065656A1 (en) | | Training neural networks using a clustering loss |
CN108921342B (en) | | Logistics customer loss prediction method, medium and system |
US20220207352A1 (en) | | Methods and systems for generating recommendations for counterfactual explanations of computer alerts that are automatically detected by a machine learning algorithm |
CN112861936A (en) | | Graph node classification method and device based on graph neural network knowledge distillation |
CN111125520B (en) | | Event line extraction method based on deep clustering model for news text |
US11636667B2 (en) | | Pattern recognition apparatus, pattern recognition method, and computer program product |
US20150161232A1 (en) | | Noise-enhanced clustering and competitive learning |
US11475236B2 (en) | | Minimum-example/maximum-batch entropy-based clustering with neural networks |
US20190370651A1 (en) | | Deep Co-Clustering |
US11886955B2 (en) | | Self-supervised data obfuscation in foundation models |
Vialatte et al. | | A study of deep learning robustness against computation failures |
CN114329233A (en) | | Cross-region cross-scoring collaborative filtering recommendation method and system |
CN112148931A (en) | | Meta path learning method for high-order abnormal picture classification |
WO2023000165A1 (en) | | Method and apparatus for classifying nodes of a graph |
US20220309292A1 (en) | | Growing labels from semi-supervised learning |
US20220207353A1 (en) | | Methods and systems for generating recommendations for counterfactual explanations of computer alerts that are automatically detected by a machine learning algorithm |
US20220367051A1 (en) | | Methods and systems for estimating causal effects from knowledge graphs |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
| AS | Assignment | Owner name: NEC LABORATORIES AMERICA, INC., NEW JERSEY. Free format text: ASSIGNMENT OF ASSIGNORS INTEREST; ASSIGNORS: CHENG, WEI; CHEN, HAIFENG; NI, JINGCHAO; SIGNING DATES FROM 20190529 TO 20190530; REEL/FRAME: 049346/0199 |
| STPP | Information on status: patent application and granting procedure in general | Free format text: DOCKETED NEW CASE - READY FOR EXAMINATION |
| STPP | Information on status: patent application and granting procedure in general | Free format text: NON FINAL ACTION MAILED |
| STCB | Information on status: application discontinuation | Free format text: ABANDONED -- FAILURE TO RESPOND TO AN OFFICE ACTION |