CN107636639B - Fast orthogonal projection - Google Patents

Fast orthogonal projection

Info

Publication number
CN107636639B
Authority
CN
China
Prior art keywords
matrix
series
matrices
cell
search object
Prior art date
Legal status
Active
Application number
CN201680028711.7A
Other languages
Chinese (zh)
Other versions
CN107636639A (en)
Inventor
于信男 (Xinnan Yu)
桑吉夫·库马尔 (Sanjiv Kumar)
郭锐淇 (Ruiqi Guo)
Current Assignee
Google LLC
Original Assignee
Google LLC
Priority claimed from US 14/951,909 (US10394777B2)
Application filed by Google LLC
Priority claimed from PCT/US2016/047965 (WO2017052874A1)
Publication of CN107636639A
Application granted
Publication of CN107636639B

Classifications

    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/50 Information retrieval of still image data
    • G06F16/53 Querying
    • G06F16/532 Query formulation, e.g. graphical querying
    • G06F16/58 Retrieval characterised by using metadata
    • G06F16/583 Retrieval using metadata automatically derived from the content
    • G06F16/5854 Retrieval using metadata automatically derived from the content, using shape and object relationship
    • G06F17/00 Digital computing or data processing equipment or methods, specially adapted for specific functions
    • G06F17/10 Complex mathematical operations
    • G06F17/16 Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization


Abstract

The present invention relates to methods, systems, and apparatus, including computer programs encoded on computer storage media, for efficiently performing linear projection. In one aspect, a method includes the act of obtaining a plurality of content items from one or more content sources. Additional acts include extracting a plurality of features from each of the plurality of content items; generating a feature vector for each of the extracted features to create a search space; generating a series of element matrices based on the generated feature vectors; enhancing the search space at least in part by transforming the series of element matrices into a structured matrix such that the transformation preserves one or more relationships associated with each element matrix in the series of element matrices; receiving a search object; searching the enhanced search space based on the received search object; and providing one or more links to content items responsive to the search object.

Description

Fast orthogonal projection
Background
Large-scale search and retrieval of information in various complex computer processes, such as, for example, computer vision applications, may utilize linear projections. Such applications may utilize orthogonal projections in order to preserve Euclidean distances between data points. The generation of such orthogonal projections typically requires the use of unstructured matrices. However, the computational complexity of building unstructured orthogonal matrices is O(d³), and the space and time complexity is O(d²). This means that as the input dimension d increases, the generation of unstructured orthogonal matrices can become an extremely expensive operation.
Disclosure of Invention
According to one embodiment of the subject matter described by this specification, linear projection is performed efficiently using a larger structured matrix in order to achieve cost savings with respect to computation time and memory space. The larger structured matrix may be generated based on a series of smaller orthogonal element matrices. For example, the larger structured matrix may be formed by taking the Kronecker product of a series of smaller orthogonal element matrices. In mathematics, the Kronecker product, denoted ⊗ and also called the tensor product, is an operation on two matrices of arbitrary size that produces a larger block matrix. The Kronecker product is a generalization of the outer product from vectors to matrices, and gives the matrix of the tensor product with respect to a standard choice of basis.
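By way of illustration only (not part of the claimed embodiments), the following Python (NumPy) fragment forms the Kronecker product of two small orthogonal matrices and checks that orthogonality is preserved; all variable names are hypothetical:

    import numpy as np

    # Two small orthogonal element matrices (2x2 rotations).
    A1 = np.array([[0.0, -1.0],
                   [1.0,  0.0]])                  # 90-degree rotation
    theta = np.pi / 6
    A2 = np.array([[np.cos(theta), -np.sin(theta)],
                   [np.sin(theta),  np.cos(theta)]])

    R = np.kron(A1, A2)                           # 4x4 structured matrix

    # The Kronecker product of orthogonal matrices is itself orthogonal.
    assert np.allclose(R.T @ R, np.eye(4))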
In some aspects, the subject matter described in this specification can be embodied in methods that include the action of obtaining a plurality of content items. Additional actions may include: extracting a plurality of features from each of the plurality of content items; generating a feature vector for each extracted feature to create a search space; generating a series of element matrices based on the generated feature vectors, wherein each element matrix in the series of element matrices is associated with one or more relationships; and enhancing the search space at least in part by transforming the series of element matrices into a structured matrix such that the transformation preserves the one or more relationships associated with each element matrix in the series of element matrices. Additional actions may include receiving a search object; searching the enhanced search space based on the received search object; and providing one or more links to one or more content items responsive to the search object. The plurality of content items may comprise high-dimensional data. The high-dimensional data may be selected from the group consisting of text, images, videos, content advertisements, and map data.
Other versions include corresponding systems, apparatus, and computer programs encoded on computer storage devices configured to perform the actions of the methods.
These and other versions may each optionally include one or more of the following features. For example, in some embodiments, the relationship associated with the element matrices may include orthogonality. Alternatively, or in addition, the relationship associated with the element matrices may comprise a Euclidean distance.
In some aspects, converting the series of element matrices into a structured matrix may include generating a Kronecker projection based at least in part on applying a Kronecker product to the series of element matrices. The series of element matrices may be randomly generated based at least in part on the Euclidean distances of a particular snapshot of the feature vector search space. Converting the series of element matrices into a structured matrix may be implemented with a storage complexity of O(log d) for d-dimensional data, such that the conversion preserves the one or more relationships associated with each element matrix in the series of element matrices.
In some embodiments, the method may include the acts of extracting one or more features associated with the search object; generating a search object vector representing the features of the search object; comparing the search object vector to an enhanced search space comprising a structured matrix; and identifying, based on the comparison, one or more content items that satisfy a predetermined relationship.
It is to be understood that these aspects can be implemented in any convenient form. For example, the aspects and embodiments may be implemented by a suitable computer program carried on a suitable carrier medium which may be a tangible carrier medium (e.g. a diskette) or an intangible carrier medium (e.g. a communications signal). These aspects may also be implemented using suitable apparatus which may take the form of a programmable computer running a computer program arranged to implement the invention.
The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features and advantages will be apparent from the description and drawings, and from the claims.
Drawings
Fig. 1 is a block diagram of an example system that can be used to efficiently perform linear projection in accordance with at least one aspect of the present disclosure.
Fig. 2 is a flow diagram of an example process that may be used to efficiently perform linear projection in accordance with at least one aspect of the present disclosure.
Fig. 3 is a block diagram of another example system that may be used to efficiently perform linear projection in accordance with at least one aspect of the present disclosure.
Fig. 4 is a flow diagram of an example process of performing a search query on an enhanced search space in accordance with at least one aspect of the present disclosure.
In the drawings, like numbering represents like elements throughout.
Detailed Description
Fig. 1 is a block diagram of an example system 100 that can be used to efficiently perform linear projection in accordance with at least one aspect of the present disclosure. The system 100 may include, for example, a client 110, a server 120, a remote computer 130, and a network 140.
In general, using the example system 100, a family of structured matrices can be used to efficiently perform orthogonal projection of high-dimensional data that may exist with respect to a variety of complex computer applications (such as, for example, computer vision applications). System 100 may enable the creation of a series of smaller orthogonal element matrices. Once the series of smaller orthogonal element matrices is acquired, aspects of the present disclosure may transform the series into a structured matrix. In accordance with at least one aspect of the present disclosure, the structured matrix may be formed by taking the Kronecker product of the series of smaller orthogonal element matrices. As a result of this transformation, the present disclosure may achieve advantages over existing systems in both computational and space complexity. For example, the present disclosure achieves a computational complexity of O(d log d) and a space complexity of O(log d) for d-dimensional data.
The reduction in memory and processing power required for the storage, search, and retrieval of high-dimensional data stored in large databases, enabled by the methods, systems, and non-transitory computer-readable media of the present disclosure, is significant. These advantages provided by the present disclosure may enable searching and retrieval of high-dimensional data to be performed on mobile platforms that may otherwise lack the memory necessary to facilitate efficient storage, search, and retrieval of high-dimensional data. Accordingly, complex computer applications including, but not limited to, image search, video search, display of related content advertisements, and/or map data may be implemented with a smaller memory footprint and less processing power, enabling the storage, search, and retrieval of high-dimensional data associated with such applications via mobile platforms (such as, for example, smartphones, tablet computers, and/or other thin client devices). The present invention thus solves the problems associated with how to efficiently perform searches on high-dimensional data.
The client 110 of the system 100 may include at least a processor 111, a memory 112, and a database 115. The memory 112 may enable storage of computer program code for execution of one or more applications on the client 110. For example, the applications may include a browser 113. Using the browser 113, the client 110 may access one or more web-based applications via the network 140. These network-based applications may include, for example, map applications, video streaming applications, mobile payment systems, advertising services, and the like. The browser 113 may be configured to receive input from a user of the client 110 through one or more user interfaces associated with the client 110. The received input may include, for example, a search query entered via a keypad (e.g., a physical keyboard, a graphically rendered keyboard generated via a capacitive touch user interface, etc.), a search query entered via a voice command, a gesture representing one or more executable commands, and so forth.
Alternatively, or in addition, the client 110 may utilize the processor 111 and memory 112 to store and execute one or more mobile applications 114 stored locally on the client 110. For example, the client 110 may include a content database 115 configured to store local content including, for example, text files, audio files, image files, video files, or a combination thereof. To retrieve such stored local content from the content database 115, one or more mobile applications 114 may provide functionality to facilitate, for example, local document searches, local audio file searches, local image file searches, local video searches, and the like. Alternatively, or in addition, the mobile applications 114 may enable any such local search to also be performed remotely against one or more content databases 129, 133 hosted by one or more computers 120, 130 accessible via the network 140, in order to provide a consolidated list of search results that may include results from both the local content database and the remote content databases. Likewise, the mobile applications 114 may include other types of applications including, for example, handwriting recognition programs. Other types of mobile applications 114 may also fall within the scope of the disclosure provided herein.
In a manner similar to that described above with respect to the browser 113, a mobile application 114 may be configured to receive input from a user of the client 110. Alternatively, or in addition, one or more mobile applications 114 can be configured to receive different inputs than the browser 113, based on the particular functionality provided by the one or more mobile applications 114. For example, a handwriting recognition program may be configured to receive input in the form of handwritten text entered via a user's motion performed using a stylus or a user's finger in conjunction with a capacitive touch user interface integrated into the client 110 or externally coupled to the client 110. Once such input is captured, features associated with the handwritten text input may be searched, in accordance with aspects of the present invention, to retrieve one or more text characters, text strings, etc., that may correspond to the handwritten input.
Client 110 may represent one or more client devices. These client devices may include, for example, mobile computing platforms and non-mobile computing platforms. Mobile computing platforms may include, for example, smart phones, tablet computers, laptop computers, or other thin client devices. Non-mobile computing platforms may include, for example, desktop computers, set-top box entertainment systems, and the like. The client 110 may be configured to communicate with the server 120 via the network 140 using one or more communication protocols.
Server 120 may represent one or more server computers. The server 120 may include at least a processor 121, a memory 122, and a content database 129. The memory 122 may comprise a suite of software tools that may be used to implement features of the subject matter described by this specification. These software tools may include, for example, a content identification unit 123, a feature extraction unit 124, a feature vector generation unit 125, an element matrix generation unit 126, and a structured matrix generation unit 127. The software tools described above may each include program instructions that, when executed by the processor 121, may perform the exemplary functions described in this specification to create an enhanced search space that significantly reduces the memory footprint required to facilitate storage, search, and retrieval operations involving high-dimensional data. High-dimensional data may include data in many dimensions, such as, for example, hundreds of dimensions, thousands of dimensions, millions of dimensions, or even more.
The content identification unit 123 may be configured to obtain content from one or more of a plurality of different sources. For example, the content identification unit 123 may utilize web crawlers, web spiders, and the like, that may traverse the network 140 to scan and identify content items stored in the databases 133 of one or more remote computers 130. Once a content item is identified, the content identification unit 123 may retrieve a copy of the content item, or a portion of the content item, from the database 133 and store the copy in the content database 129 of the server 120. The content items may include a variety of different types of content that may be created using the client 110, the server 120, or the remote computer 130, including, for example, text data, audio data, image data, video data, or any combination thereof.
Alternatively, or in addition, the content identification unit 123 may be configured to capture portions of content input by a user via one or more user interfaces of the client device 110. For example, the content identification unit 123 may be configured to capture handwritten text input via a user's motion performed using a stylus or a user's finger in conjunction with a capacitive touch user interface integrated into the client 110 or externally coupled to the client 110. Alternatively, or in addition, the content identification unit 123 may be configured to receive one or more content items that may be uploaded via one or more remote computers. For example, the content identification unit 123 may receive one or more content items that one or more users of the remote computers 130 may wish to add to a library of content items maintained by the database 129. Alternatively, or in addition, the content identification unit 123 may be configured to retrieve content items previously stored in the database 129 of the server 120.
Content items retrieved from one or more of the above sources may be used to generate a library of content items stored in a database 129 accessible by one or more users of the client 110, the remote computer 130, or the like. For example, the server 120 may aggregate a large amount of location information, geographic information, image information, etc. over a period of time, which may be used to support a mapping application accessible to a user of the client 110 via the browser 113 or mobile application 114, or to a user of a similar application via the remote computer 130. Alternatively, or in addition, for example, the server 120 may aggregate a large number of video files over a period of time to support a video streaming service accessible by a user of the client 110 via the browser 113 or mobile application 114, or by a user of a similar application via the remote computer 130. The content items retrieved by the server 120 may likewise be used to support other types of applications accessible by the client 110 or a user of the remote computer 130.
The content identification unit 123 may periodically determine that a sufficient number of content items have been collected in order to initiate generation of the enhanced search space. The periodic determination may be based on, for example, expiration of a predetermined time period. Alternatively, or in addition, the periodic determination may be based on a predetermined amount of data having been collected, e.g., after 100 GB of data, 100 TB of data, etc. are collected. Alternatively, or in addition, the periodic determination may be based on a determination that content has been collected from a predetermined number of content sources, e.g., content captured from a predetermined number of users subscribed to a service, content captured from a predetermined number of users actively using a service, content captured from a predetermined percentage of all known content sources, etc. Alternatively, or in addition, the content identification unit 123 may trigger generation of the enhanced search space in response to receiving an instruction from one or more human users to generate the enhanced search space.
The feature extraction unit 124 may be configured to analyze the content acquired by the content identification unit 123 in order to identify particular content-dependent features, or characteristics, that may be uniquely associated with each particular content item. The feature data may include, for example, colors, contours, curves, textures, pixels, etc., that may be associated with, for example, image content. Alternatively, or in addition, the feature data may include, for example, document keywords, word usage frequency, and the like, associated with, for example, textual content. As the complexity of the content source increases, the number of features associated with the content may also increase. For example, a particular high-definition image may be associated with at least one feature corresponding to each particular pixel in the image. The likelihood that a particular content item may be identified during the search and retrieval process based on features extracted from the content item may increase with the number of features extracted from the content item. The content features extracted by the feature extraction unit 124 may be stored in the memory 122 or the database 129 for subsequent use by the feature vector generation unit 125.
The feature vector generation unit 125 may be configured to acquire or receive the high-dimensional feature data extracted by the feature extraction unit 124. Upon receiving the extracted feature data, the feature vector generation unit 125 may generate a plurality of feature vectors that may be used to numerically represent each feature extracted from the acquired content. The values of a particular feature vector may be expressed in the form of a single-row matrix. The set of feature vectors generated from the extracted features stored in the database 129 may thus create a searchable model of the high-dimensional data acquired by the content identification unit 123. Similarity determinations may be made between any two or more feature vectors based on the calculation of the Euclidean distances between them. The smaller the Euclidean distance, the greater the degree of similarity that may exist between the feature vectors.
In at least one aspect of the subject matter disclosed by this specification, at any particular point in time, for a particular snapshot of a particular feature vector search space, there may be a Euclidean distance between the various feature vectors present in the feature vector search space. For example, a particular snapshot of the feature vector search space may be captured at any particular point in time after a predetermined number of feature vectors have been generated by the feature vector generation unit 125. In some cases, prior to learning or optimizing the feature vector search space, there may be an original feature vector search space comprising a plurality of feature vectors respectively separated by original Euclidean distances. Alternatively, or in addition, the original Euclidean distances may be, for example, the Euclidean distances that existed between the respective feature vectors at the time the snapshot of the feature vector search space was captured.
The element matrix generation unit 126 may be configured to acquire or receive the plurality of high-dimensional feature vectors generated by the feature vector generation unit 125. The element matrix generation unit 126 may then organize the acquired feature vectors into a series of M element matrices. Each element matrix in the series of M element matrices may be smaller than the structured matrix described below. For example, in at least one aspect of the subject matter disclosed by this specification, the size of each element matrix may be 2 × 2. Alternatively, or in addition, each element matrix in the series of M element matrices may be orthogonal. The element matrix generation unit 126 may generate the series of M element matrices by, for example, generating smaller random Gaussian matrices and then performing QR factorization. The random generation of the series of M element matrices may be based at least in part on the original Euclidean distances of a particular snapshot of the feature vector search space. For example, each element matrix in the series of M element matrices may be randomly generated so as to preserve the original Euclidean distances of the original feature vector search space. Alternatively, the series of M element matrices may be trained with a machine learning system to return, for example, image-specific results when a particular image is presented.
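A minimal sketch of this construction, assuming Python with NumPy and hypothetical names, might generate the series of element matrices as follows:

    import numpy as np

    rng = np.random.default_rng(0)

    def random_orthogonal(order):
        # QR factorization of a random Gaussian matrix yields an
        # orthogonal factor Q.
        q, _ = np.linalg.qr(rng.standard_normal((order, order)))
        return q

    # A series of M = 3 element matrices of order 2 covers d = 2**3 = 8.
    element_matrices = [random_orthogonal(2) for _ in range(3)]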
The structured matrix generation unit 127 may be configured to acquire, or receive, the series of M element matrices generated by the element matrix generation unit 126. The structured matrix generation unit 127 may be configured to convert the series of M element matrices into a structured matrix. The structured matrix may be larger in size than each matrix in the series of M smaller element matrices. The transformation of the series of M element matrices may occur in a manner that preserves the relationships associated with each element matrix of the series. The preserved relationships may include, for example, orthogonality or Euclidean distance. In at least one aspect of the present disclosure, the conversion may include generating the linear projection by using the Kronecker product of the series of M element matrices. With respect to at least one aspect of the subject matter disclosed by the present specification, the projection by the Kronecker product of the series of M element matrices can be computed by using a process that includes, for example, a fast Fourier transform or a fast Fourier transform-like computation. The generation of a linear projection using the structured matrix generation unit 127 to convert the series of M smaller element matrices into a larger structured matrix may result in a significant reduction in computational and space costs compared to projection with an unstructured matrix. For example, the linear projection generated using the structured matrix generation unit 127 can be implemented for d-dimensional data with computation time O(d log d) and space complexity O(log d), compared with the computation time O(d²) and space complexity O(d²) of an unstructured matrix.
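By way of illustration, the following sketch shows one possible realization of such a fast, FFT-like application of the structured matrix (hypothetical names; not necessarily the patented implementation): each element matrix is applied along one axis of the input viewed as a small hypercube, so the large matrix is never materialized.

    import numpy as np

    def kron_project(element_matrices, x):
        # Compute (A_1 kron A_2 kron ... kron A_M) @ x by applying each
        # small matrix along one axis of x reshaped into a hypercube,
        # instead of forming the large d x d matrix explicitly.
        de = element_matrices[0].shape[0]
        T = x.reshape((de,) * len(element_matrices))
        for axis, A in enumerate(element_matrices):
            T = np.moveaxis(np.tensordot(A, T, axes=([1], [axis])), 0, axis)
        return T.reshape(-1)

    # Sanity check against the explicit Kronecker product.
    rng = np.random.default_rng(1)
    As = [np.linalg.qr(rng.standard_normal((2, 2)))[0] for _ in range(3)]
    R = As[0]
    for A in As[1:]:
        R = np.kron(R, A)
    x = rng.standard_normal(8)
    assert np.allclose(R @ x, kron_project(As, x))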
The output of the structured matrix generation unit 127 may produce an enhanced search space. The enhanced search space may be stored in the enhanced search space storage area 128. Although the search space has been enhanced to reduce the space complexity from the O(d²) of an unstructured matrix to O(log d), the larger structured matrix may still provide a representation of the feature vector space for a particular set of content items, including substantially all of the generated feature vectors. Thus, the accuracy and precision of the search are not compromised by the various aspects of the present disclosure described herein.
The remote computer 130 may represent one or more remote computers. Each remote computer 130 may include at least a processor 131, a memory 132, and a content database 133. The remote computer 130 may be configured to make one or more content items discoverable by a software tool capable of identifying and obtaining web content, such as, for example, the content identification unit 123. One or more users of some remote computers 130 may also search for and access content items stored in the content database 129. Remote computer 130 may be configured to communicate with server 120 via network 140.
Network 140 may be configured to facilitate connectivity between clients 110, server 120, and/or remote computers 130. Client 110, server 120, and/or remote computer 130 may be connected to network 140 via one or more wired or wireless communication links 142a, 142b, and/or 142c, respectively. Network 140 may include any combination of one or more types of public and/or private networks including, but not limited to, a Local Area Network (LAN), a Wide Area Network (WAN), the internet, a cellular data network, or any combination thereof.
Fig. 2 is a flow diagram of an example process 200 that may be used to efficiently perform linear projection in accordance with at least one aspect of the present disclosure.
The process 200 begins at 210, where the content identification unit 123 initiates a scan of content across one or more content items from one or more content sources local to the server 120 or remote from the server 120. Content scanning may be performed by, for example, web crawlers, web spiders, and the like. Alternatively, or in addition, the content identification unit 123 may receive one or more content items from one or more remote computers 130 or one or more client computers 110. Once the content is identified, the content identification unit 123 may sample the identified content and store at least a portion of it in the content database 129, store at least a portion of it in another portion of the main memory 122, or transmit at least a portion of it to the feature extraction unit 124.
The process 200 may continue at 220, where the feature extraction unit 124 may access one or more portions of the content identified by the content identification unit 123. The feature extraction unit 124 may extract one or more features and/or characteristics associated with the acquired content. The extracted features may be stored in the content database 129, stored in another portion of the main memory 122, or transmitted to the feature vector generation unit 125.
The process 200 may continue at 230, where the feature vector generation unit 125 may generate one or more feature vectors based on the content features extracted by the feature extraction unit 124. The feature vectors may be used to generate searchable data models of the high-dimensional data. The searchable model may facilitate similarity determinations based on a comparison of two or more feature vectors. Such a comparison may be based on an evaluation of the Euclidean distances that exist between two or more feature vectors. The smaller the distance that exists between any given pair of feature vectors, the greater the similarity between them. The generated feature vectors may be stored in the content database 129, stored in another portion of the main memory 122, or transmitted to the element matrix generation unit 126.
The process 200 may continue at 240, where the element matrix generation unit 126 may generate a series of M element matrices based on the set of high-dimensional feature vectors generated by the feature vector generation unit 125. Each matrix in the series of M element matrices may be orthogonal. The series of M element matrices may be randomly or pseudo-randomly generated based at least in part on the original Euclidean distances of a particular snapshot of the feature vector search space. Alternatively, or in addition, the series of M element matrices may be trained by using one or more machine learning systems (such as those set forth herein below). The generated series of M element matrices may be stored in the content database 129, stored in another portion of the main memory 122, or transmitted to the structured matrix generation unit 127.
The process 200 may continue at 250, where the structured matrix generation unit 127 may create a larger structured matrix based on the series of M smaller element matrices. The larger structured matrix may be created by transforming/rotating the series of M element matrices into a larger structured matrix. The transformation may be performed such that it preserves the relationships associated with each element matrix in the series of M element matrices. In at least one aspect of the subject matter disclosed by the present specification, the transformation can include generating the linear projection by using the Kronecker product of the series of M element matrices. The larger structured matrix may result in an enhanced search space. The space complexity of the larger structured matrix may be O(log d) for d-dimensional data. At 260, the enhanced search space may be stored in main memory in the enhanced search space storage area 128, in the content database 129, and so on.
Although the present disclosure is described herein with respect to Kronecker products, it is noted that the present disclosure need not be so limited. Rather, other matrix transformation or rotation methods may be utilized to facilitate the transformation of a series of smaller orthogonal element matrices into a larger structured matrix. For example, any efficient transformation of the element matrices that preserves one or more relationships associated with the series of element matrices may be utilized in order to generate a larger structured matrix from a series of smaller orthogonal element matrices in accordance with the present disclosure. Examples of such relationships that may be preserved in the generated structured matrix include, for example, orthogonality and Euclidean distance, among others.
Fig. 3 is a block diagram of an example system 300 that can be used to efficiently perform linear projection in accordance with at least another aspect of the present disclosure. System 300 may include, for example, a client 310, a server 320, a remote computer 330, and a network 340.
Client 310 may include one or more client devices that may each be substantially similar to the client 110. Client 310 may include at least a processor 311, a main memory 312, and a content database 319. However, the client 310 may further include a content identification unit 313, a feature extraction unit 314, a feature vector generation unit 315, an element matrix generation unit 316, a structured matrix generation unit 317, and an enhanced search space storage area 318. The content identification unit 313, feature extraction unit 314, feature vector generation unit 315, element matrix generation unit 316, structured matrix generation unit 317, and enhanced search space storage area 318 are each substantially the same as the content identification unit 123, feature extraction unit 124, feature vector generation unit 125, element matrix generation unit 126, structured matrix generation unit 127, and enhanced search space storage area 128 of the system 100 of Fig. 1. However, in the system 300 of Fig. 3, the content identification unit 313, the feature extraction unit 314, the feature vector generation unit 315, the element matrix generation unit 316, the structured matrix generation unit 317, and the enhanced search space storage area 318 may all be implemented on the client 310 instead of, or in addition to, the server 320. Thus, the efficiencies provided by the subject matter of this specification may facilitate searching and retrieving high-dimensional data on client devices (such as, for example, the client 310).
Thus, the features of the subject matter described by this specification can be applied to various aspects of one or more mobile applications 114 that may be running on a client 310, such as, for example, generating an enhanced search space to support local storage, searching, and retrieval of text files, audio files, image files, video files, or a combination thereof. Features of the present disclosure may also be applied to generating an enhanced search space in order to improve storage, search, and retrieval operations associated with other types of mobile applications (such as, for example, handwriting recognition applications), search and display of content advertisements, and the like.
With the above features, the present disclosure may provide significant advantages when utilized with methods such as, for example, binary embedding or Cartesian k-means for search and retrieval techniques, including, for example, Approximate Nearest Neighbor (ANN) search methods. The present disclosure thus solves complex search problems with better accuracy while requiring significantly less time and memory.
Fig. 4 is a flow diagram of an example process 400 for performing a search query on an enhanced search space in accordance with at least one aspect of the present disclosure.
The process 400 may begin at 410, where a computer, such as the server 120 or the client 310, receives a search object. The search object may include a query that includes one or more keywords, images, video clips, handwritten strokes entered via a stylus or a user's finger, addresses, and/or other data that may be associated with a content item maintained by the content database 129 or 319. Upon receiving the search object, the server 120 or client 310 may analyze the search object at 420 to extract one or more features, or characteristics, associated with the search object.
The process 400 may continue at 430 with generating, based on the search object features extracted at 420, one or more search object feature vectors associated with the search object. At 440, the server 120 or client 310 may process the search object feature vectors against the previously generated enhanced search space saved in the enhanced search space storage area 128 or 318. This may include, for example, analyzing the search object feature vectors in view of the linear projection of the structured matrix to identify a subset of the high-dimensional feature vectors that provide the nearest neighbor matches for the search object feature vectors. Alternatively, or in addition, stage 440 may include identifying a plurality of matches representing a subset of feature vectors that fall within a predetermined threshold distance of the search object feature vectors. In at least one aspect of the present disclosure, the distance between the search object vector and a feature vector linearly projected via the structured matrix in the enhanced search space may be a Euclidean distance. Finally, at 460, the process may retrieve one or more content items associated with feature vectors identified in the enhanced search space that sufficiently match the search object feature vectors. Alternatively, or in addition, one or more links referencing the retrieved content items may be provided to the computer that submitted the search object.
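By way of illustration only, a brute-force counterpart of the matching performed at stage 440 (ignoring the structured-matrix machinery; all names are hypothetical) would rank stored feature vectors by Euclidean distance to the search object vector:

    import numpy as np

    def nearest_neighbors(query_vec, feature_matrix, top=5):
        # feature_matrix: one stored feature vector per row.
        dists = np.linalg.norm(feature_matrix - query_vec, axis=1)
        return np.argsort(dists)[:top]      # indices of the closest items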
As described above, at least one stage in a large-scale search and retrieval approach for data associated with complex computer applications may utilize, at least in part, linear projections. For a given vector x ∈ ℝ^d and a projection matrix R ∈ ℝ^{k×d}, the linear projection h(x) ∈ ℝ^k may be as shown in equation (1):

h(x) = Rx (1)
such linear projection may be followed by, among other things, quantization to convert the high-dimensional features into compact codes that utilize less memory, such as, for example, binary embedding or product codes. The compact code may be binary code or non-binary code. Such compact code may be used to efficiently perform search execution times and reduce storage requirements associated with various complex computer applications, such as, for example, image retrieval, feature matching, attribute recognition, and object classification, among others. For example, a Local Sensitive Hashing (LSH) technique for large-scale approximate nearest neighbor searching may be used to linearly project the input data before converting it into compact codes. For example, a k-bit binary code may satisfy the following equation (2):
h(x)=sign(Rx) (2)
however, as the input dimension d increases, such linear projection operations become computationally expensive. To achieve a higher recall in the search task, long codes with large k may be used so that k is o (d). In this case, the spatial and computational complexity of the projection is O (d)2) And such high projection costs often become a bottleneck in learning and predicting time. For example, when k-d-50,000, the projection matrix itself may require 10GB (single precision) and projecting one vector on a single code may require 800 ms.
In at least one aspect of the present disclosure, the projection matrices may be orthogonal. Orthogonal transformations may be beneficial because, among other things, they preserve the Euclidean distances between points, and such orthogonal transformations are also known to distribute the variance more evenly across dimensions. These properties are important for making many well-known techniques perform well on real-world data.
The use of orthogonal projections may provide additional benefits for certain applications. For example, orthogonality may be one means of meeting the goal of learning maximally uncorrelated bits when learning data-dependent binary codes. One way to achieve this is by imposing an orthogonality, or near-orthogonality, constraint on the projections. Also, in binary embedding, imposing an orthogonality constraint on the projections may achieve improved results when performing an approximate nearest neighbor search.
To efficiently perform linear projection of high-dimensional data, the present disclosure provides methods, systems, and non-transitory computer-readable media for transforming a series of smaller element matrices into a structured matrix in a manner that preserves the relationships associated with the original element matrices as those relationships existed prior to the matrix transformation. The structured matrix resulting from the transformation may be, for example, a single large matrix. Alternatively, however, the structured matrix may conceptually represent a family of flexible orthogonal structured matrices. In at least one aspect of the present disclosure, the preserved relationship may be the orthogonality associated with each matrix in the series of element matrices. Alternatively, or in addition, the preserved relationship may be a distance between corresponding feature vectors associated with the matrices. The distance may be, for example, the Euclidean distance between corresponding feature vectors associated with the matrices. The transformation of the element matrices can be achieved by using the Kronecker product of the smaller element matrices, such that space and computational complexity are substantially reduced. The flexibility associated with the transformation may facilitate varying the number of free parameters in the matrix to suit the needs of a given application.
Thus, at least one aspect of the present disclosure may construct a family of orthogonal structured matrices by transforming a series of small orthogonal matrices to compose a large structured matrix. At least one aspect of the present disclosure facilitates the above transformation by using the Kronecker product of a series of small orthogonal element matrices. The Kronecker projection matrix may satisfy equation (3):

R = A_1 ⊗ A_2 ⊗ ... ⊗ A_M (3)
In equation (3), A_j, j = 1, ..., M, is a small orthogonal matrix. The small orthogonal matrices A_j, j = 1, ..., M, may be referred to as element matrices. The large matrix generated from the above transformation can be associated with at least four major advantages. First, the large matrix satisfies the orthogonality constraint, and thus it can preserve Euclidean distances in the original space. Second, a fast Fourier transform-like computation can be used to compute the projection with a time complexity of O(d log d). Third, by varying the size of the element matrices, the resulting large matrix can be associated with a varying number of parameters (degrees of freedom), thus making it easier to control the performance versus speed tradeoff. Fourth, the space complexity of the large matrix is O(log d), while the space complexity of most other structured matrices is O(d). In addition, the proposed Kronecker projection offers advantages in approximate nearest neighbor search in a variety of different settings, including, for example, binary embedding and vector quantization.
The binary embedding method may map the original vector into a k-bit binary vector such that h(x) ∈ {+1, −1}^k. Such a mapping may utilize the representation of data points as binary codes, thereby significantly reducing storage costs, even when k = O(d). Approximate nearest neighbors may be retrieved by using the Hamming distance in the binary code space, which can be computed efficiently in various ways including, for example, table lookups or the POPCNT instruction in modern computer architectures.
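As a minimal sketch of this Hamming-distance computation in Python (the built-in int.bit_count, available in Python 3.10+, plays the role of the POPCNT instruction; both helper names are hypothetical):

    def pack_code(code):
        # code: NumPy vector of +/-1 entries; pack it into an integer
        # bit string, one bit per entry.
        out = 0
        for bit in (code > 0):
            out = (out << 1) | int(bit)
        return out

    def hamming(a, b):
        # XOR leaves a 1 in every differing bit; popcount counts them.
        return (a ^ b).bit_count()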
LSH can be used to generate binary codes in a manner that preserves cosine distance, and typically generates binary codes using randomized projections. However, using such randomized projections may forgo the advantages of learning data-dependent binary codes by optimizing the projection matrix R. For example, methods using iterative quantization (ITQ) show that by using a PCA projection followed by a learned orthogonal projection, the resulting binary embedding may outperform both non-orthogonal and randomized orthogonal projections. The projection can be learned by alternating between binarizing the projected data points and solving for the projection via SVD. However, for high-dimensional features, this approach may not be feasible unless the dimensionality is drastically reduced, which incurs a performance penalty. Aspects of the present disclosure that facilitate learning projections with the Kronecker product may yield performance similar to ITQ while being substantially more efficient.
The quantization method may represent data points via a set of quantizers, which may typically be obtained by a vector quantization algorithm, such as, for example, the k-means algorithm. To search for the nearest neighbors of a given query vector q, the Euclidean distances between q and all data points in the database may be computed. The Euclidean distance can be estimated by the distance from the vector to its quantizer. Alternatively, or in addition, when the data is high-dimensional, the quantization may be performed independently in subspaces. Common subspaces may be formed by chunking the vectors, which results in Product Quantization (PQ).
The distance between the query vector q and a data point x may be estimated with respect to equation (4):

d(q, x) ≈ (Σ_{i=1}^m ||q^(i) − μ_i(x^(i))||²)^(1/2) (4)

In equation (4), m is the total number of subspaces, x^(i) and q^(i) are sub-vectors, and μ_i(x^(i)) is the quantization function on subspace i. Because of its asymmetric nature, only the data points are quantized; the query vector is not quantized. To improve performance, the different subspaces should have similar variances for the given data. One way to achieve this effect is by applying an orthogonal transformation R to the data, as set forth in equation (5):

d(q, x) ≈ (Σ_{i=1}^m ||(Rq)^(i) − μ_i((Rx)^(i))||²)^(1/2) (5)
Since the projection matrix R is orthogonal, it preserves Euclidean distances. Instead of using a random projection matrix, the projection matrix can be learned from the given data, which leads to improved retrieval results. However, methods existing prior to the present disclosure that facilitate such projection operations may be associated with high resource costs (e.g., processor usage, memory usage, execution time, etc.) in high-dimensional spaces.
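By way of illustration only, the following Python (NumPy) sketch computes the asymmetric distance of equation (4) for a single data point; the argument layout (codes as per-subspace center indices, subcenters as per-subspace center arrays) and all names are assumptions, not the patented implementation:

    import numpy as np

    def asymmetric_distance(q, codes, subcenters):
        # q: raw (unquantized) query of dimension d, with d divisible
        # by the number of subspaces m.
        # codes[i]: index of the sub-center chosen for the data point
        # in subspace i; subcenters[i]: h x (d/m) array of centers.
        m = len(subcenters)
        q_parts = np.split(q, m)
        sq = 0.0
        for i in range(m):
            diff = q_parts[i] - subcenters[i][codes[i]]
            sq += float(diff @ diff)
        return np.sqrt(sq)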
Thus, for at least the foregoing reasons, there is a need for fast projections that are both orthogonal and efficient to learn. As described below, these objectives can be achieved by transforming a series of element matrices into a large structured matrix using a transformation algorithm that preserves the relationships associated with each respective element matrix. The transformation algorithm may include, among other things, generating projections using a Kronecker product, for example.
The Kronecker product may be associated with several properties that facilitate the above transformation. For example, suppose A_1 ∈ ℝ^{k_1×d_1} and A_2 ∈ ℝ^{k_2×d_2}. The Kronecker product of A_1 and A_2, A_1 ⊗ A_2 ∈ ℝ^{k_1 k_2 × d_1 d_2}, satisfies equation (6):

A_1 ⊗ A_2 = [ a_1(i, j) A_2 ], i = 1, ..., k_1, j = 1, ..., d_1 (6)

In equation (6), a_1(i, j) is the element in row i and column j of A_1, so that the (i, j)-th block of A_1 ⊗ A_2 is a_1(i, j) A_2. The Kronecker product may also be referred to as the tensor product or direct product. The operation mat(x, a, b) transforms a d-dimensional vector x into an a × b matrix (ab = d), and vec(·) forms a vector by "stretching" a matrix into a vector, such that vec(mat(x, a, b)) = x.
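These vec/mat conventions can be checked numerically. The Python (NumPy) sketch below, which assumes column-major reshapes so that vec(mat(x, a, b)) = x, verifies the identity used in the next paragraph; all names are hypothetical:

    import numpy as np

    rng = np.random.default_rng(3)
    A1 = rng.standard_normal((3, 3))
    A2 = rng.standard_normal((4, 4))
    x = rng.standard_normal(12)

    X = x.reshape(4, 3, order="F")                  # mat(x, 4, 3)
    lhs = np.kron(A1, A2) @ x                       # (A1 kron A2) x
    rhs = (A2 @ X @ A1.T).reshape(-1, order="F")    # vec(A2 mat(x,4,3) A1^T)
    assert np.allclose(lhs, rhs)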
The Kronecker product may be associated with a number of characteristics that facilitate the advantages recited herein. For example, at least a subset of these characteristics helps generate fast orthogonal projections while preserving the relationships associated with the original element matrices. Two specific characteristics of the Kronecker product that contribute to the above advantages are the identity

(A_1 ⊗ A_2) x = vec(A_2 mat(x, d_2, d_1) A_1^T),

and the fact that the Kronecker product preserves the orthogonality of the element matrices. That is, if A_1 and A_2 are both orthogonal, then A_1 ⊗ A_2 is also orthogonal.
According to at least one aspect of the present disclosure, a Kronecker projection matrix R ∈ ℝ^{k×d} may be formed as the Kronecker product of a plurality of element matrices, as set forth below in equation (7):

R = A_1 ⊗ A_2 ⊗ ... ⊗ A_M (7)

where, in equation (7), A_j ∈ ℝ^{k_j×d_j}, with Π_{j=1}^M k_j = k and Π_{j=1}^M d_j = d.
One benefit of forming a larger matrix in this manner is that the computation of the Kronecker projection can take advantage of reduced computational complexity. To simplify the discussion, assume that R is square, i.e., k = d, and that all element matrices are also square, each having the same order d_e. Floating point operations (FLOPs) accurately estimate the computational cost of different methods. Let f(d, d_e) denote the FLOPs needed to compute the Kronecker projection of a d-dimensional vector when the element matrices have order d_e. One property of the Kronecker product is shown below in equation (8):

Rx = (A_1 ⊗ A_2 ⊗ ... ⊗ A_M) x = vec( (A_2 ⊗ ... ⊗ A_M) mat(x, d/d_e, d_e) A_1^T ) (8)

Computing mat(x, d/d_e, d_e) A_1^T requires d(2d_e − 1) FLOPs (d·d_e multiplications and d·d_e − d additions). The remaining computation, applying (A_2 ⊗ ... ⊗ A_M), then becomes d_e subproblems, each of feature dimension d/d_e, so the cost of the Kronecker projection can be calculated as reflected in equation (9):

f(d, d_e) = d(2d_e − 1) + d_e · f(d/d_e, d_e). (9)

Based on equation (9), the FLOPs needed to perform a Kronecker projection of a d-dimensional vector are (2d_e − 1) d log_{d_e} d = O(d log d).
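Evaluating recursion (9) numerically illustrates the gap relative to a dense projection. The sketch below assumes, as above, that k = d and that all element matrices share the same order d_e; names are hypothetical:

    def kron_flops(d, de):
        # Recursion (9); base case: one de x de matrix-vector product.
        if d == de:
            return de * (2 * de - 1)
        return d * (2 * de - 1) + de * kron_flops(d // de, de)

    d = 2 ** 16
    print(kron_flops(d, 2))     # 3 * d * log2(d): about 3.1 million FLOPs
    print(d * (2 * d - 1))      # dense projection: about 8.6 billion FLOPs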
Another attractive property of the Kronecker projection that helps facilitate the advantages described herein is its structural flexibility. For example, by controlling the sizes of the A_j, j = 1, ..., M, one can easily balance the number of parameters (and therefore the capacity) against the computational cost of the model. There are log_{d_e} d element matrices, each having d_e² parameters. The number of parameters in the Kronecker projection is therefore d_e² log_{d_e} d, ranging from d² (when d_e = d) to 4 log_2 d (when d_e = 2).
In the above, the present disclosure has been described with reference to examples in which the Kronecker projection R and all of the element matrices are square. However, the present disclosure need not be so limited. Rather, the present disclosure may also extend, for example, to non-square Kronecker projections and/or non-square element matrices. The sizes of the element matrices may be selected, for example, by factorizing d and k. Alternatively, or in addition, it may happen that d or k cannot be factorized into a product of smaller numbers. In that case, the dimensionality may be altered, for example, by subsampling or zero-padding the input features. Separately, for example, a code longer than the desired output may be used and then subsampled. The generation of the Kronecker projection in the contexts of a square projection matrix R and a non-square projection matrix is discussed further below.
Kronecker projections may also be randomly generated, in a manner similar to unstructured projections, circulant projections, and/or bilinear projections, etc. However, the randomly generated Kronecker projection improves upon the projections listed above because it is well suited to high-dimensional data.
Randomly generated Kronecker projections can be applied to binary embedding and quantization. These applications of the Kronecker projection can be achieved by replacing the unstructured projection matrix (R in equations (1) and (5)) with a randomized Kronecker projection matrix.
For example, with respect to the Kronecker projection, the methods, systems, and computer programs described herein may all generate M (small) orthogonal element matrices. In at least one aspect of the present disclosure, an element matrix may be generated by creating a small random Gaussian matrix and then performing QR factorization. For element matrices of size 2 × 2, for example, the time complexity of generating a randomized Kronecker projection of order d is only O(log d). This is a significant benefit because, for example, the time complexity of generating an unstructured orthogonal matrix of order d is O(d³). Thus, the randomized Kronecker projection provides a practical solution for generating randomized projections of high-dimensional data.
In accordance with another aspect of the present disclosure, a system and method for optimizing the Kronecker projection parameters is disclosed. As explained in more detail below, the optimization algorithm is discussed with respect to binary embedding and quantization, both of which are then shown to reduce to solving an orthogonal Procrustes problem for each element matrix. For purposes of this discussion, assume the training data X = [x_1, x_2, ..., x_N] ∈ ℝ^{d×N}. The following discussion first addresses the case k = d, and then discusses the extension to k ≠ d.
First, the problem of minimizing the binarization loss of binary embedding is solved. The optimization problem can be expressed as shown in equation (10).
Figure BDA0001472107400000211
In equation (10), where the binary matrix B ═ B1,b2,...,bN]∈{-1,1}d×NAnd b isiIs xiBinary code of, i.e. bi=sign(Rxi). In addition, a kronecker structure is used for R. A local solution of equation (3) can be found by alternating minimization. In the case where R is fixed, B is calculated by direct binarization based on the definition. Alternatively, B is fixed and k ═ d (we will discuss k below)<Case of d), R is found by the orthogonal purrochans-tese problem set forth in equation (11).
max_R tr(B^T R X)   s.t.  R = A_1 ⊗ A_2 ⊗ ... ⊗ A_M,  A_j^T A_j = I, j = 1, ..., M    (11)
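By way of illustration only, a minimal sketch of this alternating scheme (Python with NumPy; names illustrative), using a full unstructured orthogonal R so that the R-step is the classical orthogonal Procrustes solution via SVD; the Kronecker-constrained cell-matrix update is sketched after equation (20) below.

    import numpy as np

    def alternating_binary_embedding(X, iters=20, seed=0):
        # Sketch of the alternating minimization of equation (10) with a
        # full orthogonal R (no Kronecker structure), for clarity.
        d = X.shape[0]
        R = np.linalg.qr(np.random.default_rng(seed).standard_normal((d, d)))[0]
        for _ in range(iters):
            B = np.sign(R @ X)                 # B-step: fix R, binarize
            B[B == 0] = 1
            # R-step: fix B; maximize tr(B^T R X) over orthogonal R via SVD.
            U, _, Vt = np.linalg.svd(X @ B.T)
            R = (U @ Vt).T
        return R, np.sign(R @ X)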
Next, for quantization, consider the Cartesian k-means (ck-means) method. For ck-means, an input sample x is divided into m subspaces, x = [x^{(1)}; x^{(2)}; ...; x^{(m)}], and each subspace is quantized with h sub-centers. In the example discussed below, all subspaces have the same fixed number of sub-centers. However, the present disclosure need not be so limited; it may be applied in a similar manner to sets of sub-centers with variable cardinality.
Let p = [p^{(1)}; p^{(2)}; ...; p^{(m)}], where p^{(j)} ∈ {0, 1}^h and ||p^{(j)}||_1 = 1. In other words, p^{(j)} is the indicator of the sub-center nearest to x^{(j)}. Suppose that C^{(j)} ∈ R^{(d/m)×h} is the j-th sub-center matrix and C ∈ R^{d×mh} is the center matrix formed by the (diagonal) concatenation of all of the sub-center matrices, as set forth in equation (12):

C = diag(C^{(1)}, C^{(2)}, ..., C^{(m)})    (12)
In ck-means, the center matrix C is parameterized by an orthogonal matrix R ∈ R^{d×d} and a block diagonal matrix D = diag(D^{(1)}, D^{(2)}, ..., D^{(m)}), with D^{(j)} ∈ R^{(d/m)×h}, such that C = RD. The optimization problem for ck-means can then be written as equation (13):

min_{R,D,P} ||X − RDP||_F^2   s.t.  R^T R = I    (13)

where P = [p_1, p_2, ..., p_N] ∈ {0, 1}^{mh×N} collects the sub-center indicators.
A similar alternating procedure is used to impose the Kronecker structure on the orthogonal matrix R. With R fixed, updating D and P corresponds to vector quantization with k-means in each subspace. This step is efficient because the number of clusters per subspace is typically set to a small number (e.g., h = 256). Updating R with D and P fixed can be treated as an orthogonal Procrustes problem, as reflected in equation (14):
min_R ||X − RDP||_F^2   s.t.  R = A_1 ⊗ A_2 ⊗ ... ⊗ A_M,  A_j^T A_j = I, j = 1, ..., M    (14)
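By way of illustration only, a sketch of the D- and P-updates with R fixed (Python with NumPy; names illustrative), i.e., the per-subspace vector-quantization step described above:

    import numpy as np

    def ckmeans_dp_step(X, R, D_blocks):
        # One D,P update of equation (13) with R fixed. X: d x N data;
        # R: d x d orthogonal; D_blocks: list of m arrays, each (d/m) x h.
        # Since R is orthogonal, ||X - R D P||_F = ||R^T X - D P||_F.
        d, N = X.shape
        m = len(D_blocks)
        s = d // m
        Z = R.T @ X
        assigns = []
        for j, Cj in enumerate(D_blocks):
            Zj = Z[j * s:(j + 1) * s]                            # s x N subspace data
            dist2 = ((Zj[:, :, None] - Cj[:, None, :]) ** 2).sum(axis=0)
            idx = dist2.argmin(axis=1)                           # P-step: nearest sub-center
            assigns.append(idx)
            for c in range(Cj.shape[1]):                         # D-step: recompute centers
                mask = idx == c
                if mask.any():
                    Cj[:, c] = Zj[:, mask].mean(axis=1)
        return D_blocks, assigns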
Thus, an orthogonal Procrustes problem arises in both of the approaches discussed above. For aspects of the present disclosure that utilize the Kronecker product and/or projection, the problem may be referred to as the Kronecker Procrustes problem, set forth in equation (15):
min_R ||B − RX||_F^2   s.t.  R = A_1 ⊗ A_2 ⊗ ... ⊗ A_M,  A_j^T A_j = I, j = 1, ..., M    (15)

where, for the quantization case of equation (14), B is replaced by X and X is replaced by DP.
To address the above optimization, each cell matrix may be updated in turn using an iterative approach, and a local solution found. To start the method, the objective can be rewritten as in equation (16):

||B − RX||_F^2 = ||B||_F^2 + ||RX||_F^2 − 2 tr(B^T R X) = ||B||_F^2 + ||X||_F^2 − 2 tr(B^T R X)    (16)
The second equality holds because the Kronecker product preserves orthogonality, so that ||RX||_F = ||X||_F. Thus, the next step may be to maximize tr(B^T R X). By using the properties of the trace, this can be expressed as equation (17):

tr(B^T R X) = Σ_{i=1}^{N} b_i^T (A_1 ⊗ A_2 ⊗ ... ⊗ A_M) x_i    (17)
In equation (17), b_i and x_i are the i-th columns of the matrices B and X, respectively. This problem can be solved by updating one cell matrix at a time while keeping all of the other cell matrices fixed. Without loss of generality, consider updating A_j, as shown in equation (18):

max_{A_j} Σ_{i=1}^{N} b_i^T (A_pre ⊗ A_j ⊗ A_next) x_i   s.t.  A_j^T A_j = I    (18)

In equation (18), A_pre = A_1 ⊗ ... ⊗ A_{j−1} and A_next = A_{j+1} ⊗ ... ⊗ A_M.
Suppose A_pre, A_j, and A_next are of size k_pre × d_pre, k_j × d_j, and k_next × d_next, respectively, so that d_pre · d_j · d_next = d and k_pre · k_j · k_next = k.
According to the properties of the Kronecker product, the objective function in equation (18), viewed as a function of A_j, satisfies equation (19):

Σ_{i=1}^{N} b_i^T (A_pre ⊗ A_j ⊗ A_next) x_i = Σ_{i=1}^{N} tr(mat(b_i)^T (A_pre ⊗ A_j) mat(x_i) A_next^T)    (19)

where mat(x_i) ∈ R^{(d_pre·d_j)×d_next} and mat(b_i) ∈ R^{(k_pre·k_j)×k_next} denote the (row-major) matricizations of x_i and b_i. Letting G_i = mat(b_i) A_next mat(x_i)^T and letting [G_i]_{(s,t)} denote its (s,t)-th block of size k_j × d_j, equation (19) can be expressed as equation (20):

tr(A_j^T T),  where  T = Σ_{i=1}^{N} Σ_{s=1}^{k_pre} Σ_{t=1}^{d_pre} (A_pre)_{s,t} [G_i]_{(s,t)}    (20)

Maximizing tr(A_j^T T) subject to A_j^T A_j = I is a small orthogonal Procrustes problem; its solution is A_j = U V^T, where T = U S V^T is the singular value decomposition of T.
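By way of illustration only, a sketch of this single cell-matrix update (Python with NumPy; names illustrative, and the reconstruction of equations (19)-(20) above is assumed): the block-weighted matrix T is accumulated over the data and the small Procrustes problem is solved by SVD.

    import numpy as np

    def update_cell_matrix(B, X, A_pre, A_next, kj, dj):
        # Update A_j per equations (18)-(20). B: k x N codes, X: d x N data.
        # A_pre, A_next: the fixed Kronecker factors, materialized for clarity
        # (pass np.eye(1) at either end of the chain).
        k_pre, d_pre = A_pre.shape
        k_next, d_next = A_next.shape
        T = np.zeros((kj, dj))
        for b, x in zip(B.T, X.T):
            Bm = b.reshape(k_pre * kj, k_next)     # row-major matricization of b
            Xm = x.reshape(d_pre * dj, d_next)     # row-major matricization of x
            G = Bm @ A_next @ Xm.T                 # (k_pre*kj) x (d_pre*dj)
            for s in range(k_pre):                 # weight blocks by (A_pre)_{s,t}
                for t in range(d_pre):
                    T += A_pre[s, t] * G[s*kj:(s+1)*kj, t*dj:(t+1)*dj]
        U, _, Vt = np.linalg.svd(T, full_matrices=False)
        return U @ Vt                              # maximizes tr(A_j^T T)

Materializing A_pre and A_next explicitly is for clarity only; in practice they can be applied implicitly as products of the remaining small cells.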
When updating a cell matrix, the computational cost may come from three different sources. The first source, referred to herein as S1, is computing the Kronecker products of the data with the fixed cell matrices. The second source, referred to herein as S2, is computing the product of the projected data and the codes. The third source, referred to herein as S3, is performing the SVD to obtain the optimal cell matrix. For the case where the cell matrices are large, the optimization bottleneck may be the SVD. Alternatively, for the case where the cell matrices are small (such as, for example, 2×2), the SVD is performed in approximately constant time. Therefore, the main computational cost comes from S1 (O(Nd log d)) and S2 (O(Nd)). Since there are O(log d) cell matrices in total, the overall computational complexity of the optimization is O(Nd log^2 d).
In the above optimization, a randomized Kronecker projection is used as the initialization. For both binary embedding and quantization, the objective decreases rapidly under the proposed procedure, and a satisfactory solution can typically be found within a few tens of iterations.
The above method is discussed for the case k = d. However, various aspects of the present disclosure may also be used when k ≠ d. For the case where k ≠ d, the projection matrix R may be formed as the Kronecker product of non-square row/column-orthogonal cell matrices; the Kronecker product preserves row/column orthogonality. For example, when k > d, the orthogonal Procrustes optimization problem can be solved in the same manner as when k = d. Alternatively, when k < d, R^T R ≠ I, so the second equality in equation (16) does not hold: ||RX||_F^2 no longer simplifies to ||X||_F^2. The problem can be mitigated by treating tr(X^T R^T R X) as independent of R and proceeding as in the case k ≥ d.
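A quick numerical check of the orthogonality-preservation statement (Python with NumPy; a sketch, not part of the claimed method): the Kronecker product of column-orthogonal cell matrices is itself column-orthogonal, which corresponds to the k > d case above.

    import numpy as np

    rng = np.random.default_rng(0)
    A = np.linalg.qr(rng.standard_normal((4, 2)))[0]   # 4x2, A^T A = I
    B = np.linalg.qr(rng.standard_normal((3, 2)))[0]   # 3x2, B^T B = I
    R = np.kron(A, B)                                  # 12x4, so k > d
    assert np.allclose(R.T @ R, np.eye(4))             # column orthogonality preserved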
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which may be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software applications, or code) include machine instructions for a programmable processor, and may be implemented using high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. As used herein, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (e.g., magnetic discs, optical disks, memory, Programmable Logic Devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having: a display device (e.g., a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user; and a keyboard and a pointing device (e.g., a mouse or a trackball) by which the user may provide input to the computer. Other kinds of devices may also be used to provide for interaction with a user; for example, feedback provided to the user can be any form of sensory feedback (e.g., visual feedback, auditory feedback, or tactile feedback); and input from the user may be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (e.g., as a data server), or that includes a middleware component (e.g., an application server), or that includes a front-end component (e.g., a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (e.g., a communication network). Examples of communication networks include: a local area network ("LAN"), a wide area network ("WAN"), and the Internet.
The computer system may include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
Although the present disclosure includes some details, these should not be construed as limitations on the scope of the disclosure or of what may be claimed, but rather as descriptions of features of example embodiments of the disclosure. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be provided in multiple embodiments separately or in any suitable subcombination. Furthermore, although features may be described as acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Also, while operations are shown in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system components in the implementations described above should not be understood as requiring such separation in all implementations, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Specific embodiments of the present subject matter have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. Other steps may be provided, or steps may be eliminated, from the described flows, and other components may be added to, or removed from, the described systems. In some cases, multitasking and parallel processing may be advantageous.

Claims (15)

1. A computer-implemented method, the method comprising:
obtaining a plurality of content items;
extracting a plurality of features from each of the plurality of content items;
generating a feature vector for each extracted feature to create a search space;
generating a series of cell matrices based on the generated feature vectors, wherein each cell matrix in the series of cell matrices is associated with one or more relationships;
enhancing the search space at least in part by transforming the series of cell matrices into a structured matrix such that the transformation preserves the one or more relationships associated with each cell matrix in the series of cell matrices;
receiving a search object;
searching the enhanced search space based on the received search object; and
providing one or more links to one or more content items responsive to the search object.
2. The method of claim 1, wherein the one or more relationships associated with the cell matrices comprise orthogonality.
3. The method of claim 1, wherein the one or more relationships associated with the cell matrices comprise Euclidean distances.
4. The method of claim 1, wherein transforming the series of cell matrices into the structured matrix further comprises:
generating a Kronecker projection based at least in part on applying a Kronecker product to the series of cell matrices.
5. The method of claim 1, wherein searching the enhanced search space based on the received search object further comprises:
extracting one or more features associated with the search object;
generating a search object vector representing the features of the search object;
comparing the search object vector to an enhanced search space comprising the structured matrix; and
identifying, based on the comparison, one or more content items satisfying a predetermined relationship.
6. The method of claim 1, wherein the series of cell matrices is randomly generated based at least in part on original Euclidean distances of a particular snapshot of the feature vector search space.
7. The method of any of the above claims, wherein transforming the series of cell matrices into a structured matrix such that the transforming preserves the one or more relationships associated with each cell matrix in the series of cell matrices is achieved with a storage space complexity of O(log d) for d-dimensional data.
8. A system for performing linear projection, the system comprising:
one or more computers and one or more storage devices storing instructions that, when executed by the one or more computers, are operable to cause the one or more computers to perform operations comprising:
obtaining a plurality of content items;
extracting a plurality of features from each of the plurality of content items;
generating a feature vector for each extracted feature to create a search space;
generating a series of cell matrices based on the generated feature vectors, wherein each cell matrix in the series of cell matrices is associated with one or more relationships;
enhancing the search space at least in part by transforming the series of cell matrices into a structured matrix such that the transformation preserves the one or more relationships associated with each cell matrix in the series of cell matrices;
receiving a search object;
searching the enhanced search space based on the received search object; and
providing one or more links to one or more content items responsive to the search object.
9. The system of claim 8, wherein the one or more relationships associated with the cell matrices comprise orthogonality.
10. The system of claim 8, wherein the one or more relationships associated with the cell matrices comprise Euclidean distances.
11. The system of claim 8, wherein transforming the series of cell matrices into the structured matrix further comprises:
generating a Kronecker projection based at least in part on applying a Kronecker product to the series of cell matrices.
12. The system of claim 8, wherein searching the enhanced search space based on the received search object further comprises:
extracting one or more features associated with the search object;
generating a search object vector representing the features of the search object;
comparing the search object vector to an enhanced search space comprising the structured matrix; and
identifying, based on the comparison, one or more content items satisfying a predetermined relationship.
13. The system of claim 8, wherein the series of cell matrices is randomly generated based at least in part on original Euclidean distances of a particular snapshot of the feature vector search space.
14. The system of any of claims 8 to 13, wherein transforming the series of cell matrices into a structured matrix such that the transforming preserves the one or more relationships associated with each cell matrix in the series of cell matrices is achieved with a storage space complexity of O(log d) for d-dimensional data.
15. A non-transitory computer-readable storage medium storing instructions that, when executed by one or more computers, cause the one or more computers to perform the method of any one of claims 1-7.
CN201680028711.7A 2015-09-24 2016-08-22 Fast orthogonal projection Active CN107636639B (en)

Applications Claiming Priority (7)

Application Number Priority Date Filing Date Title
US201562232238P 2015-09-24 2015-09-24
US201562232258P 2015-09-24 2015-09-24
US62/232,258 2015-09-24
US62/232,238 2015-09-24
US14/951,909 2015-11-25
US14/951,909 US10394777B2 (en) 2015-09-24 2015-11-25 Fast orthogonal projection
PCT/US2016/047965 WO2017052874A1 (en) 2015-09-24 2016-08-22 Fast orthogonal projection

Publications (2)

Publication Number Publication Date
CN107636639A CN107636639A (en) 2018-01-26
CN107636639B true CN107636639B (en) 2021-01-08

Family

ID=60808567

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201680028711.7A Active CN107636639B (en) 2015-09-24 2016-08-22 Fast orthogonal projection

Country Status (4)

Country Link
EP (1) EP3278238A1 (en)
JP (2) JP6469890B2 (en)
KR (1) KR102002573B1 (en)
CN (1) CN107636639B (en)

Families Citing this family (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111611418A (en) * 2019-02-25 2020-09-01 Alibaba Group Holding Ltd. Data storage method and data query method
KR102217219B1 (en) * 2020-07-30 2021-02-18 Paranheum Co., Ltd. Emergency notification system apparatus capable of notifying emergency about indoor space and operating method thereof
KR102417839B1 (en) * 2020-08-13 2022-07-06 Hancom With Co., Ltd. Cloud-based offline commerce platform server that enables offline commerce based on gold and digital gold token, and operating method thereof
KR102302948B1 (en) * 2020-08-13 2021-09-16 Hancom With Co., Ltd. Gold bar genuine product certification server to perform genuine product certification for gold bar and operating method thereof
KR102302949B1 (en) * 2020-08-13 2021-09-16 Hancom With Co., Ltd. Digital content provision service server supporting the provision of digital limited content through linkage with gold bar and operating method thereof
EP4239555A1 (en) 2020-10-30 2023-09-06 Phoenix Solution Co., Ltd. Rubber product rfid tag and method for manufacturing rubber product rfid tag
CN112380494B (en) * 2020-11-17 2023-09-01 China UnionPay Co., Ltd. Method and device for determining object characteristics

Family Cites Families (15)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US6859802B1 (en) * 1999-09-13 2005-02-22 Microsoft Corporation Image retrieval based on relevance feedback
US6826300B2 (en) * 2001-05-31 2004-11-30 George Mason University Feature based classification
US8015350B2 (en) 2006-10-10 2011-09-06 Seagate Technology Llc Block level quality of service data in a data storage device
US7941442B2 (en) * 2007-04-18 2011-05-10 Microsoft Corporation Object similarity search in high-dimensional vector spaces
US8457409B2 (en) * 2008-05-22 2013-06-04 James Ting-Ho Lo Cortex-like learning machine for temporal and hierarchical pattern recognition
JP5375676B2 (en) * 2010-03-04 2013-12-25 Fujitsu Ltd. Image processing apparatus, image processing method, and image processing program
JP5563494B2 (en) * 2011-02-01 2014-07-30 Denso IT Laboratory, Inc. Corresponding reference image search device and method, content superimposing device, system and method, and computer program
JP5258915B2 (en) * 2011-02-28 2013-08-07 Denso IT Laboratory, Inc. Feature conversion device, similar information search device including the same, coding parameter generation method, and computer program
US8891878B2 (en) * 2012-06-15 2014-11-18 Mitsubishi Electric Research Laboratories, Inc. Method for representing images using quantized embeddings of scale-invariant image features
JP5563016B2 (en) * 2012-05-30 2014-07-30 Denso IT Laboratory, Inc. Information search device, information search method and program
JP5959446B2 (en) * 2013-01-30 2016-08-02 KDDI Corporation Retrieval device, program, and method for high-speed retrieval by expressing contents as a set of binary feature vectors
US20140280426A1 (en) * 2013-03-13 2014-09-18 International Business Machines Corporation Information retrieval using sparse matrix sketching
JP6195365B2 (en) * 2013-10-18 2017-09-13 KDDI Corporation Vector encoding program, apparatus and method
CN104794733B (en) * 2014-01-20 2018-05-08 Ricoh Co., Ltd. Object tracking method and device
CN103984675A (en) * 2014-05-06 2014-08-13 Dalian University of Technology Orthogonal successive approximation method for solving global optimization problem

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5479523A (en) * 1994-03-16 1995-12-26 Eastman Kodak Company Constructing classification weights matrices for pattern recognition systems using reduced element feature subsets
CN1823334A (en) * 2003-05-14 2006-08-23 Celebros Ltd. Search engine method and apparatus
CN101281545A (en) * 2008-05-30 2008-10-08 Tsinghua University Three-dimensional model search method based on multiple characteristic related feedback
CN103389966A (en) * 2012-05-09 2013-11-13 Alibaba Group Holding Ltd. Massive data processing, searching and recommendation methods and devices
CN103279578A (en) * 2013-06-24 2013-09-04 Wei Xiaoyong Video retrieving method based on context space
CN103440280A (en) * 2013-08-13 2013-12-11 Jiangsu Huada Tianyi Electric Power Technology Co., Ltd. Retrieval method and device applied to massive spatial data retrieval

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
Learning Binary Codes for High-Dimensional Data Using Bilinear Projections; Yunchao Gong et al.; IEEE Computer Society Conference on Computer Vision and Pattern Recognition; 2013-06-23; 484-491 *
Statistical Sparse Learning: Feature Extraction, Clustering, Classification, and Multi-Feature Fusion; Kong Shu; China Master's Theses Full-text Database, Information Science and Technology; 2014-02-15; I140-45 *

Also Published As

Publication number Publication date
KR102002573B1 (en) 2019-07-22
CN107636639A (en) 2018-01-26
KR20170132291A (en) 2017-12-01
JP2019057329A (en) 2019-04-11
JP2018524660A (en) 2018-08-30
JP6469890B2 (en) 2019-02-13
EP3278238A1 (en) 2018-02-07

Similar Documents

Publication Publication Date Title
CN107636639B (en) Fast orthogonal projection
Luo et al. Robust discrete code modeling for supervised hashing
US10878269B2 (en) Data extraction using neural networks
US10394777B2 (en) Fast orthogonal projection
Uijlings et al. Video classification with densely extracted hog/hof/mbh features: an evaluation of the accuracy/computational efficiency trade-off
CN107229757B (en) Video retrieval method based on deep learning and Hash coding
US11074434B2 (en) Detection of near-duplicate images in profiles for detection of fake-profile accounts
CN111324774B (en) Video duplicate removal method and device
US11126660B1 (en) High dimensional time series forecasting
CN108205581B (en) Generating compact video feature representations in a digital media environment
CN113094550A (en) Video retrieval method, device, equipment and medium
KR20200030082A (en) Systems and methods for neural networks
WO2016142285A1 (en) Method and apparatus for image search using sparsifying analysis operators
CN111382620A (en) Video tag adding method, computer storage medium and electronic device
CN115017355A (en) Image extractor training method, image searching method, electronic device and storage medium
US11250039B1 (en) Extreme multi-label classification
Singh et al. A sparse coded composite descriptor for human activity recognition
AU2023266376A1 (en) Method and apparatus for cosmetic product recommendation
KR102408256B1 (en) Method for Searching and Device Thereof
CN110750671A (en) Pedestrian retrieval method and device based on massive unstructured features
Guo et al. Parametric and nonparametric residual vector quantization optimizations for ANN search
Cheng et al. Sparse representations based distributed attribute learning for person re-identification
US11238070B1 (en) Dense cluster filtering
WO2017052874A1 (en) Fast orthogonal projection
Wen et al. DGNet: A handwritten mathematical formula recognition network based on deformable convolution and global context attention

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant