CN107636639A - Fast orthogonal projection - Google Patents
- Publication number
- CN107636639A CN201680028711.7A
- Authority
- CN
- China
- Prior art keywords
- matrix
- cell
- series
- search
- matrixs
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/53—Querying
- G06F16/532—Query formulation, e.g. graphical querying
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/50—Information retrieval; Database structures therefor; File system structures therefor of still image data
- G06F16/58—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
- G06F16/583—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content
- G06F16/5854—Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually using metadata automatically derived from the content using shape and object relationship
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F17/00—Digital computing or data processing equipment or methods, specially adapted for specific functions
- G06F17/10—Complex mathematical operations
- G06F17/16—Matrix or vector computation, e.g. matrix-matrix or matrix-vector multiplication, matrix factorization
Abstract
The present invention relates to methods, systems, and apparatus, including computer programs encoded on a computer storage medium, for efficiently performing linear projections. In one aspect, a method includes the action of obtaining a plurality of content items from one or more content sources. Other actions include: extracting a plurality of features from each of the plurality of content items; generating a feature vector for each extracted feature to create a search space; generating a series of cell matrices based on the generated feature vectors; transforming the series of cell matrices into a structured matrix such that the transformation preserves one or more relationships associated with each cell matrix in the series; receiving a search object; searching the enhanced search space based on the received search object; and providing one or more links to content items that are responsive to the search object.
Description
Background
Large-scale search and retrieval of information in various complex computer processes (such as, for example, computer vision applications) can make use of linear projections. Such applications can use orthogonal projections to preserve the Euclidean distances between data points. Generating such an orthogonal projection typically requires an unstructured matrix. However, the computational complexity of building an unstructured orthogonal matrix is O(d^3), and its space and time complexity is O(d^2). This means that, as the input dimensionality d increases, generating an unstructured orthogonal matrix can become an extremely expensive operation.
Summary of the invention
According to one implementation of the subject matter described in this specification, linear projections are performed efficiently using a large structured matrix, which yields cost savings in both computation time and storage space. The large structured matrix can be generated from a series of smaller orthogonal cell matrices. For example, the large structured matrix can be formed using the Kronecker product of a series of smaller orthogonal cell matrices. In mathematics, the Kronecker product, denoted by ⊗, is an operation on two matrices of arbitrary size that produces a larger matrix. The Kronecker product is a generalization of the outer product from vectors to matrices, and it gives the matrix of the tensor product with respect to a standard choice of basis.
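As a small illustration of the construction described above, the following sketch builds an 8 x 8 matrix from three 2 x 2 orthogonal matrices with `numpy.kron`. The rotation angles and the helper name `rotation` are arbitrary illustration choices, not values from the patent.

```python
import numpy as np
from functools import reduce

def rotation(theta):
    """2 x 2 rotation matrix, a simple example of an orthogonal matrix."""
    c, s = np.cos(theta), np.sin(theta)
    return np.array([[c, -s], [s, c]])

# Three small orthogonal matrices (angles are arbitrary illustration values).
small = [rotation(t) for t in (0.3, 1.1, 2.0)]

# A1 ⊗ A2 ⊗ A3 is an 8 x 8 matrix built from only 3 * 4 = 12 stored numbers.
structured = reduce(np.kron, small)

rng = np.random.default_rng(0)
x, y = rng.standard_normal(8), rng.standard_normal(8)
dist_before = np.linalg.norm(x - y)
dist_after = np.linalg.norm(structured @ x - structured @ y)
```

Because the Kronecker product of orthogonal matrices is itself orthogonal, the larger matrix preserves Euclidean distances just as each small factor does.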
In some aspects, the subject matter described in this specification can be embodied in methods that include the action of obtaining a plurality of content items. Other actions can include: extracting a plurality of features from each of the plurality of content items; generating a feature vector for each extracted feature to create a search space; generating a series of cell matrices based on the generated feature vectors, wherein each cell matrix in the series is associated with one or more relationships; and enhancing the search space at least in part by transforming the series of cell matrices into a structured matrix such that the transformation preserves the one or more relationships associated with each cell matrix in the series. Other actions can include: receiving a search object; searching the enhanced search space based on the received search object; and providing one or more links to one or more content items that are responsive to the search object. The plurality of content items can include high-dimensional data. The high-dimensional data can be selected from the group consisting of text, images, video, content advertisements, and map data.
Other versions include corresponding systems, apparatus, and computer programs encoded on computer storage devices, each configured to perform the actions of the method.
These and other versions can each optionally include one or more of the following features. For example, in some implementations, the relationships associated with a cell matrix can include orthogonality. Alternatively, or in addition, the relationships associated with a cell matrix can include Euclidean distance.
In some aspects, transforming the series of cell matrices into a structured matrix can include generating a Kronecker projection that is based, at least in part, on applying the Kronecker product to the series of cell matrices. The series of cell matrices can be generated randomly, based at least in part on the Euclidean distances of a particular snapshot of the feature vector search space. Transforming the series of cell matrices into a structured matrix, such that the transformation preserves the one or more relationships associated with each cell matrix in the series, can be achieved with a storage complexity of O(log d) for d-dimensional data.
In some implementations, the method can include the actions of: extracting one or more features associated with the search object; generating a search object vector that represents the features of the search object; comparing the search object vector against the enhanced search space that includes the structured matrix; and identifying, based on the comparison, one or more content items that satisfy a predetermined relationship.
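The query steps above can be sketched as follows. This is only an illustration under stated assumptions: the "predetermined relationship" is taken to be a Euclidean distance threshold, the database is a toy array of random vectors, and the names `search` and `max_distance` are hypothetical.

```python
import numpy as np
from functools import reduce

def search(query_vec, database, projection, max_distance):
    """Return indices of items whose projected distance to the query is small."""
    projected_db = database @ projection.T      # one projected row per content item
    projected_q = projection @ query_vec
    distances = np.linalg.norm(projected_db - projected_q, axis=1)
    hits = np.flatnonzero(distances <= max_distance)
    return hits[np.argsort(distances[hits])]    # closest first

rng = np.random.default_rng(1)
# Three random 2 x 2 orthogonal cell matrices (assumed construction).
cells = [np.linalg.qr(rng.standard_normal((2, 2)))[0] for _ in range(3)]
projection = reduce(np.kron, cells)             # 8 x 8 orthogonal matrix
database = rng.standard_normal((5, 8))          # toy feature vectors
query = database[2] + 0.01 * rng.standard_normal(8)  # near-duplicate of item 2
result = search(query, database, projection, max_distance=0.5)
```

Here the structured matrix is materialized for clarity; a production system would more plausibly apply the cell matrices without ever forming the full matrix.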
It will be appreciated that these aspects can be implemented in any convenient form. For example, the aspects and implementations can be implemented by appropriate computer programs, which can be carried on appropriate carrier media that may be tangible carrier media (for example, disks) or intangible carrier media (for example, communication signals). The aspects can also be implemented using suitable apparatus, which can take the form of programmable computers running computer programs arranged to implement the invention.
The details of one or more implementations are set forth in the accompanying drawings and the description below. Other features and advantages will become apparent from the description, the drawings, and the claims.
Brief description of the drawings
Fig. 1 is a block diagram of an example system that can be used to efficiently perform linear projections according to at least one aspect of the disclosure.
Fig. 2 is a flowchart of an example process that can be used to efficiently perform linear projections according to at least one aspect of the disclosure.
Fig. 3 is a block diagram of another example system that can be used to efficiently perform linear projections according to at least one aspect of the disclosure.
Fig. 4 is a flowchart of an example process for executing a search query against an enhanced search space according to at least one aspect of the disclosure.
Like reference numerals designate like elements throughout the drawings.
Detailed description
Fig. 1 is a block diagram of an example system 100 that can be used to efficiently perform linear projections according to at least one aspect of the disclosure. System 100 can include, for example, a client 110, a server 120, a remote computer 130, and a network 140.
In general, the example system 100 can use a family of structured matrices to efficiently perform orthogonal projections of high-dimensional data, which can arise in various complex computer applications (such as, for example, computer vision applications). System 100 can facilitate the creation of a series of smaller orthogonal cell matrices. Once the series of smaller orthogonal cell matrices has been obtained, aspects of the disclosure can transform the series into a structured matrix. According to at least one aspect of the disclosure, the structured matrix can be formed using the Kronecker product of the series of smaller orthogonal cell matrices. As a result of this transformation, the disclosure can achieve advantages over existing systems in both computational complexity and space complexity. For example, the disclosure achieves a computational complexity of O(d log d) and a space complexity of O(log d) for d-dimensional data.
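The complexity figures above can be checked with simple arithmetic. The sketch below assumes 2 x 2 cell matrices and a dimension d that is a power of two, so that M = log2(d) cell matrices are needed; the function name `cost_summary` is hypothetical.

```python
import math

def cost_summary(d):
    """Stored values and multiplications for projecting one d-dimensional vector."""
    m = int(math.log2(d))  # number of 2 x 2 cell matrices
    assert 2 ** m == d, "this sketch assumes d is a power of two"
    return {
        "unstructured_storage": d * d,
        "unstructured_multiplies": d * d,
        "kronecker_storage": 4 * m,         # four entries per 2 x 2 matrix
        "kronecker_multiplies": 2 * d * m,  # each of the m stages does 2d multiplications
    }

summary = cost_summary(1024)
```

For d = 1024, the structured form stores 40 numbers instead of 1,048,576 and needs 20,480 multiplications per projection instead of 1,048,576.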
Implementing the disclosed methods, systems, and non-transitory computer-readable media significantly reduces the memory and processing power needed to store, search, and retrieve high-dimensional data held in large databases. These advantages can allow the search and retrieval of high-dimensional data to be performed on mobile platforms, which may lack the memory space otherwise needed for efficient storage, search, and retrieval of high-dimensional data. Accordingly, sophisticated computer applications, including but not limited to image search, video search, display of relevant content advertisements, and/or map data, can be implemented with a smaller memory footprint and less processing power, so that the storage, search, and retrieval of the high-dimensional data associated with these applications can be carried out via mobile platforms (such as, for example, smartphones, tablet computers, and/or other thin client devices). The invention therefore addresses the problem of how to search high-dimensional data efficiently.
The client 110 of system 100 can include at least a processor 111, a memory 112, and a database 115. The memory 112 can store computer program code for executing one or more applications on the client 110. For example, the applications can include a browser 113. Using the browser 113, the client 110 can access one or more web applications via the network 140. These web applications can include, for example, map applications, video streaming applications, mobile payment systems, advertising services, and the like. The browser 113 can be configured to receive input from a user of the client 110 through one or more user interfaces associated with the client 110. The received input can include, among other things, search queries entered via a keypad (for example, a physical keypad or a graphically rendered keypad produced by a capacitive touch user interface), search queries entered via voice command, gestures that represent one or more executable commands, and the like.
Alternatively, or in addition, the client 110 can use the processor 111 and the memory 112 to store and execute one or more mobile applications 114 stored locally on the client 110. For example, the client 110 can include a content database 115 configured to store local content, including, for example, text, audio files, image files, video files, or a combination thereof. To retrieve this locally stored content from the content database 115, the one or more mobile applications 114 can provide functionality that facilitates, for example, local document search, local audio file search, local image file search, and local video search. Alternatively, or in addition, a mobile application 114 can also ensure that any such local search is additionally performed remotely against one or more content databases 129, 133 hosted by one or more computers 120, 130 that are accessible via the network 140, in order to provide a merged list of search results that can include results from both the local content database and the remote content databases. Mobile applications 114 can likewise include other types of applications, including, for example, a handwriting recognition program. Other types of mobile applications 114 can also fall within the scope of the disclosure provided by this specification.
In a manner similar to that described above with respect to the browser 113, the mobile applications 114 can be configured to receive input from a user of the client 110. Alternatively, or in addition, one or more mobile applications 114 can be configured to receive input that differs from that of the browser 113, depending on the particular functionality that the mobile applications 114 provide. For example, a handwriting recognition program can be configured to receive input in the form of handwritten text that a user enters through motions of a stylus or the user's finger on a capacitive touch user interface that is integrated into, or externally coupled to, the client 110. Once such input has been captured, according to aspects of the invention, the features associated with the handwritten text input can be searched in order to retrieve one or more text characters, text strings, or the like that may correspond to the handwriting input.
The client 110 can represent one or more client devices. These client devices can include, for example, mobile computing platforms and non-mobile computing platforms. Mobile computing platforms can include, for example, smartphones, tablet computers, laptop computers, or other thin client devices. Non-mobile computing platforms can include, for example, desktop computers, set-top box entertainment systems, and the like. The client 110 can be configured to communicate with the server 120 via the network 140 using one or more communication protocols.
The server 120 can represent one or more server computers. The server 120 can include at least a processor 121, a memory 122, and a content database 129. The memory 122 can include a suite of software tools that can be used to implement the features of the subject matter described in this specification. These software tools can include, for example, a content recognition unit 123, a feature extraction unit 124, a feature vector generation unit 125, a cell matrix generation unit 126, and a structured matrix generation unit 127. Each of these software tools can include program instructions that, when executed by the processor 121, perform the example functions described in this specification to create an enhanced search space, which substantially reduces the memory footprint needed for storage, search, and retrieval operations involving high-dimensional data. High-dimensional data can include data with many dimensions, such as, for example, hundreds of dimensions, thousands of dimensions, millions of dimensions, or even more.
The content recognition unit 123 can be configured to obtain content from one or more of a plurality of different sources. For example, the content recognition unit 123 can use a web crawler, web spider, or the like that traverses the network 140 to scan for and identify content items stored in the databases 133 of one or more remote computers 130. Once a content item has been identified, the content recognition unit can obtain a copy of the content item, or of a portion of the content item, from the database 133 and store that copy in the content database 129 of the server 120. Content items can include various types of content that can be created using the client 110, the server 120, or the remote computer 130, including, for example, text data, audio data, image data, video data, or any combination thereof.
Alternatively, or in addition, the content recognition unit 123 can be configured to capture portions of content that a user enters via one or more user interfaces of the client 110. For example, the content recognition unit 123 can be configured to capture handwritten text that a user enters through motions of a stylus or the user's finger on a capacitive touch user interface that is integrated into, or externally coupled to, the client 110. Alternatively, or in addition, the content recognition unit 123 can be configured to receive one or more content items uploaded via one or more remote computers. For example, the content recognition unit 123 can receive one or more content items that one or more users of a remote computer 130 wish to add to the library of content items maintained in the database 129. Alternatively, or in addition, the content recognition unit can be configured to obtain content items that were previously stored in the database 129 of the server 120.
Content items obtained from one or more of the sources described above can be used to build a library of content items, stored in the database 129, that is accessible to one or more users of the client 110, the remote computer 130, and the like. For example, the server 120 can aggregate large amounts of location information, geographic information, image information, and the like over a period of time, and this information can be used to support a map application that a user of the client 110 can access via the browser 113 or a mobile application 114, or that a user of a similar application can access via the remote computer 130. Alternatively, or in addition, the server 120 can, for example, aggregate a large number of video files over a period of time to support a video streaming service that a user of the client 110 can access via the browser 113 or a mobile application 114, or that a user of a similar application can access via the remote computer 130. The content items obtained by the server 120 can likewise be used to support other types of applications accessible to users of the client 110 or the remote computer 130.
The content recognition unit 123 can periodically determine whether a sufficient number of content items has been collected to begin generating an enhanced search space. The periodic determination can be based on, for example, the expiry of a predetermined period of time. Alternatively, or in addition, the periodic determination can be based on the collection of a predetermined amount of data, for example, after 100 GB of data, 100 TB of data, or the like has been collected. Alternatively, or in addition, the periodic determination can be based on determining that content has been collected from a predetermined number of content sources, for example, content captured from a predetermined number of users who subscribe to a service, content captured from a predetermined number of users who are actively using a service, content captured from a predetermined proportion of all known content sources, and the like. Alternatively, or in addition, the content recognition unit 123 can trigger the generation of an enhanced search space in response to receiving an instruction to do so from one or more human users.
The feature extraction unit 124 can be configured to analyze the content obtained by the content recognition unit 123 in order to identify particular content-dependent characteristics, or characteristic data, that can be uniquely associated with each particular content item. Characteristic data can include, for example, colors, contours, curves, textures, pixels, and the like that can be associated with image content. Alternatively, or in addition, characteristic data can include, for example, document keywords, word usage frequencies, and the like that can be associated with text content. As the complexity of the content source increases, the number of features associated with the content can likewise increase. For example, a particular high-definition image may be associated with at least one feature corresponding to each particular pixel in the image. The likelihood that a particular content item can be identified during search and retrieval from the features extracted from it increases as the number of extracted features increases. The content features extracted by the feature extraction unit 124 can be stored in the memory 122 or the database 129 for subsequent use by the feature vector generation unit 125.
The feature vector generation unit 125 can be configured to obtain or receive the high-dimensional feature data extracted by the feature extraction unit 124. Once the extracted feature data has been received, the feature vector generation unit 125 can generate a plurality of feature vectors that numerically represent each feature extracted from the obtained content. The values of a particular feature vector can be expressed in the form of a single-row matrix. The set of feature vectors generated from the extracted features that are stored in the database 129 can therefore form a searchable model of the high-dimensional data obtained by the content recognition unit 123. A similarity determination can be made between any two or more feature vectors by computing the Euclidean distance between them. The smaller the Euclidean distance, the greater the similarity that may exist between the feature vectors.
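The similarity rule above can be illustrated with a minimal sketch. The feature values and item names are made up for the example, and `most_similar` is a hypothetical helper.

```python
import numpy as np

def euclidean_distance(a, b):
    """Euclidean distance between two feature vectors."""
    return float(np.linalg.norm(np.asarray(a, dtype=float) - np.asarray(b, dtype=float)))

# Toy feature vectors (illustrative values only).
features = {
    "item_a": [1.0, 0.0, 2.0],
    "item_b": [1.0, 0.5, 2.0],
    "item_c": [9.0, 7.0, 0.0],
}

def most_similar(name):
    """Name of the other item with the smallest distance to `name`."""
    others = (k for k in features if k != name)
    return min(others, key=lambda k: euclidean_distance(features[name], features[k]))
```

In this toy data, `item_b` is the closest item to `item_a`, so it would be treated as the most similar.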
In at least one aspect of the subject matter disclosed in this specification, at any particular point in time, for a particular snapshot of the feature vector search space, a Euclidean distance may exist between the feature vectors present in that search space. For example, a particular snapshot of the feature vector search space can be captured at any particular point in time after a predetermined number of feature vectors has been generated by the feature vector generation unit 125. In some cases, before the feature vector search space is learned or optimized, there may be an original feature vector search space that includes a plurality of feature vectors separated from one another by original Euclidean distances. Alternatively, or in addition, the original Euclidean distance can be, for example, the Euclidean distance that exists between the feature vectors at the time the snapshot of the feature vector search space is captured.
The cell matrix generation unit 126 can be configured to obtain or receive the plurality of high-dimensional feature vectors generated by the feature vector generation unit 125. The cell matrix generation unit 126 can then organize the obtained feature vectors into a series of M cell matrices. Each cell matrix in the series of M cell matrices can be smaller than the structured matrix described below. For example, in at least one aspect of the subject matter disclosed in this specification, the size of each cell matrix can be 2x2. Alternatively, or in addition, each cell matrix in the series of M cell matrices can be orthogonal. The cell matrix generation unit 126 can generate the series of M cell matrices by, for example, generating small random Gaussian matrices and then performing QR factorization. The random generation of the series of M cell matrices can be based, at least in part, on the original Euclidean distances of a particular snapshot of the feature vector search space. For example, each cell matrix in the series can be randomly generated so as to preserve the original Euclidean distances of the original feature vector search space. Alternatively, the series of M cell matrices can be trained using a machine learning system, for example so that a particular image result is returned when a particular image is presented.
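The random generation step described above (small Gaussian matrices followed by QR factorization) can be sketched as follows, assuming NumPy. The function name `random_cell_matrices` and the choice M = 10 are illustrative; the sign correction is an added convention, not something the patent specifies.

```python
import numpy as np

def random_cell_matrices(m, rng, size=2):
    """Generate m random orthogonal size x size cell matrices via QR."""
    cells = []
    for _ in range(m):
        q, r = np.linalg.qr(rng.standard_normal((size, size)))
        # Normalize column signs so the result is independent of the QR sign convention.
        cells.append(q * np.sign(np.diag(r)))
    return cells

# Ten 2 x 2 cell matrices are enough to describe a 1024-dimensional projection.
cells = random_cell_matrices(m=10, rng=np.random.default_rng(0))
```

Each QR factorization here operates on a 2 x 2 matrix, so generating the whole series costs far less than factorizing a single d x d matrix.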
The structured matrix generation unit 127 can be configured to obtain or receive the series of M cell matrices generated by the cell matrix generation unit 126. The structured matrix generation unit 127 can be configured to transform the series of M cell matrices into a structured matrix. The structured matrix can be larger in size than each matrix in the series of M small cell matrices. The series of M cell matrices can be transformed in a manner that preserves the relationships associated with each cell matrix in the series. The preserved relationships can include, for example, orthogonality or Euclidean distance. In at least one aspect of the disclosure, the transformation can include generating a linear projection using the Kronecker product of the series of M cell matrices. With respect to at least one aspect of the subject matter disclosed in this specification, the Kronecker product of the series of M cell matrices can be computed using a process that includes, for example, fast Fourier transforms or fast-Fourier-transform-like calculations. Generating the linear projection by using the structured matrix generation unit 127 to transform the series of M small cell matrices into a larger structured matrix can substantially reduce computation and space costs compared with projection by an unstructured matrix. For example, for d-dimensional data, a linear projection generated using the structured matrix generation unit 127 can achieve a computational complexity of O(d log d) and a space complexity of O(log d), in contrast to the computational complexity of O(d^2) and space complexity of O(d^2) of an unstructured matrix.
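One way to obtain the O(d log d) cost is to apply the cell matrices stage by stage, in the style of a fast Fourier transform, without ever forming the large matrix. The sketch below is an illustration under that assumption, not the patent's stated algorithm; `kron_project` is a hypothetical name.

```python
import numpy as np
from functools import reduce

def kron_project(cells, x):
    """Compute (A1 ⊗ A2 ⊗ ... ⊗ AM) @ x without building the large matrix."""
    shape = [c.shape[1] for c in cells]
    t = x.reshape(shape)  # view x as an M-way tensor (row-major order)
    for axis, c in enumerate(cells):
        # Multiply c into one tensor axis, then move the new axis back in place.
        t = np.moveaxis(np.tensordot(c, t, axes=(1, axis)), 0, axis)
    return t.reshape(-1)

rng = np.random.default_rng(0)
cells = [np.linalg.qr(rng.standard_normal((2, 2)))[0] for _ in range(4)]
x = rng.standard_normal(16)

fast = kron_project(cells, x)
dense = reduce(np.kron, cells) @ x  # reference result using the explicit 16 x 16 matrix
```

With 2 x 2 cell matrices, each of the M = log2(d) stages touches all d entries a constant number of times, which is where the O(d log d) operation count comes from; only the 4M cell entries need to be stored.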
The output of the structured matrix generation unit 127 can yield an enhanced search space. The enhanced search space can be stored in an enhanced search space storage area 128. Although the search space has been enhanced so that the space complexity falls from the O(d^2) of an unstructured matrix to O(log d), the larger structured matrix can still provide, for a particular set of content items, a representation of the feature vector space that generally includes all of the generated feature vectors. Consequently, using the aspects of the disclosure described herein does not compromise the accuracy or precision of searches.
The remote computer 130 can represent one or more remote computers. Each remote computer 130 can include at least a processor 131, a memory 132, and a content database 133. The remote computer 130 can be configured to make one or more content items discoverable by software tools that can identify and obtain web content (such as, for example, the content recognition unit 123). One or more users of certain remote computers 130 can also search and access the content items stored in the content database 129. The remote computer 130 can be configured to communicate with the server 120 via the network 140.
The network 140 can be configured to facilitate connectivity between the client 110, the server 120, and/or the remote computer 130. The client 110, the server 120, and/or the remote computer 130 can be connected to the network 140 via one or more wired or wireless communication links 142a, 142b, and/or 142c, respectively. The network 140 can include any combination of one or more types of public and/or private networks, including but not limited to a local area network (LAN), a wide area network (WAN), the Internet, a cellular data network, or any combination thereof.
Fig. 2 is a flowchart of an example process 200 that can be used to efficiently perform linear projections according to at least one aspect of the disclosure.
Process 200 begins at 210, where the content recognition unit 123 initiates a scan for one or more content items from one or more content sources that are local to the server 120 or remote from the server 120. The content scan can be performed by, for example, a web crawler, web spider, or the like. Alternatively, or in addition, the content recognition unit 123 can receive one or more content items from one or more remote computers 130 or one or more client computers 110. Once content has been identified, the content recognition unit can sample the identified content and store at least a portion of it in the content database 129, store at least a portion of it in another part of the main memory 122, or send at least a portion of it to the feature extraction unit 124.
Process 200 can continue at 220, where the feature extraction unit 124 can access one or more portions of the content identified by the content recognition unit 123. The feature extraction unit 124 can extract one or more features and/or characteristics associated with the obtained content. The extracted features can be stored in the content database 129, stored in another part of the main memory 122, or sent to the feature vector generation unit 125.
Process 200 can continue at 230, where the feature vector generation unit 125 can generate one or more feature vectors based on the content features extracted by the feature extraction unit 124. The feature vectors can be used to generate a searchable data model of the high-dimensional data. The searchable model can facilitate similarity determinations based on a comparison of two or more feature vectors. Such a comparison can be based on an evaluation of the Euclidean distance that exists between two or more feature vectors. The smaller the distance between any given pair of feature vectors, the greater the similarity between them. The generated feature vectors can be stored in the content database 129, stored in another part of the main memory 122, or sent to the cell matrix generation unit 126.
Process 200 can continue at 240, where the cell matrix generation unit 126 can generate a series of M cell matrices based on the set of high-dimensional feature vectors generated by the feature vector generation unit 125. Each matrix in the series of M cell matrices can be orthogonal. The series of M cell matrices can be generated randomly or pseudo-randomly, based at least in part on the original Euclidean distances of a particular snapshot of the feature vector search space. Alternatively, or in addition, the series of M cell matrices can be trained using one or more machine learning systems (such as those described below). The generated series of M cell matrices can be stored in the content database 129, stored in another part of the main memory 122, or sent to the structured matrix generation unit 127.
Process 200 can continue at 250, where structured matrix generation unit 127 can be configured to create a larger structured matrix based on the series of M smaller cell matrices. The larger structured matrix can be created by transforming/rotating the series of M cell matrices into the larger structured matrix. The transformation can be performed so that it preserves the relationships associated with each cell matrix in the series of M cell matrices. In at least one aspect of the disclosed subject matter, the transformation can include generating a linear projection by using the Kronecker product of the series of M cell matrices. The larger structured matrix can yield an enhanced search space. For d-dimensional data, the space complexity of the larger structured matrix can be $O(\log d)$. At 260, the enhanced search space can be stored in main memory in enhanced search space storage region 128, in content database 129, or the like.
Although the disclosure is described here with respect to the Kronecker product, it should be noted that the disclosure need not be so limited. Other matrix transformation or rotation methods can likewise be used to transform a series of smaller orthogonal cell matrices into a larger structured matrix. For example, any efficient transformation that preserves one or more relationships associated with the series of cell matrices can be used to generate the larger structured matrix from the series of smaller orthogonal cell matrices according to the disclosure. Examples of relationships that can be preserved in the generated structured matrix include, among others, orthogonality and Euclidean distance.
Fig. 3 is a block diagram of an example system 300 that can be used to efficiently perform linear projection according to at least another aspect of the disclosure. System 300 can include, for example, client 310, server 320, remote computer 330, and network 340.
Client 310 can include one or more client devices, each of which can be generally similar to client 110. Client 310 can include at least processor 311, main memory 312, and content database 319. However, client 310 can also include content recognition unit 313, feature extraction unit 314, feature vector generation unit 315, cell matrix generation unit 316, structured matrix generation unit 317, and enhanced search space storage region 318. Each of these is substantially the same as content recognition unit 123, feature extraction unit 124, feature vector generation unit 125, cell matrix generation unit 126, structured matrix generation unit 127, and enhanced search space storage region 128, respectively, of system 100 of Fig. 1. In system 300 of Fig. 3, however, content recognition unit 313, feature extraction unit 314, feature vector generation unit 315, cell matrix generation unit 316, structured matrix generation unit 317, and enhanced search space storage region 318 can be implemented on client 310, either instead of or in addition to being implemented on server 320. The efficiency provided by the subject matter of this specification can therefore facilitate searching and retrieving high-dimensional data on a client device (for example, client 310).
The features described by the subject matter of this specification can therefore be applied to various aspects of one or more mobile applications 114 running on client 310, such as generating an enhanced search space to support local storage, search, and retrieval of text, audio files, image files, video files, or combinations thereof. The features of the disclosure can also be applied to generating an enhanced search space to improve the storage, search, and retrieval operations associated with other types of mobile applications (for example, handwriting recognition applications), the search and display of content advertisements, and the like.
Using the features described above, the disclosure can provide significant advantages for search and retrieval techniques, including, for example, approximate nearest neighbor (ANN) search methods, when methods such as binary embedding or Cartesian k-means, among others, are used. The disclosure thus solves complex search problems with better accuracy while requiring significantly less time and memory.
Fig. 4 is a flowchart of an example process 400 for executing a search query against an enhanced search space according to at least one aspect of the disclosure.
Process 400 can begin at 410, when a computer, for example server 120 or client 310, receives a search object. The search object can include a query containing one or more keywords, an image, a video clip, handwritten strokes entered with a stylus or a user's finger, an address, and/or other data that can be associated with content items maintained in content database 129 or 319. After receiving the search object, server 120 or client 310 can, at 420, analyze the search object to extract one or more features or characteristics associated with the received search object.
Process 400 can continue at 430, where one or more search object feature vectors associated with the search object are generated based on the search object features extracted at 420. At 440, server 120 or client 310 can process the search object feature vectors against the previously generated enhanced search space stored in enhanced search space storage region 128 or 318. This can include, for example, analyzing the search object feature vector in view of the linear projection of the structured matrix to identify a subset of high-dimensional feature vectors that provide nearest neighbor matches for the search object feature vector. Alternatively, or in addition, stage 440 can include identifying multiple matches that represent a subset of feature vectors falling within a predetermined threshold distance of the search object feature vector. In at least one aspect of the disclosure, the distance between the search object vector and a feature vector linearly projected via the structured matrix of the enhanced search space can be the Euclidean distance. Finally, at 460, the process can retrieve one or more content items associated with the feature vectors identified in the enhanced search space as sufficiently matching the search object feature vector. Alternatively, or in addition, one or more links referencing the retrieved content items can be provided to the computer that submitted the search object.
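The matching step at 440 can be illustrated with a minimal sketch. It assumes a small dense orthogonal matrix as a stand-in for the structured projection, and the function name `search_projected`, the array layout, and the threshold value are all hypothetical choices made for illustration, not part of the disclosure.

```python
import numpy as np

def search_projected(R, database, query, threshold):
    """Return (nearest index, indices within threshold, all distances).

    R        : (k, d) projection matrix (stand-in for the structured matrix)
    database : (n, d) array, one stored feature vector per row
    query    : (d,) search object feature vector
    """
    projected_db = database @ R.T      # project every stored vector
    projected_q = R @ query            # project the query the same way
    dists = np.linalg.norm(projected_db - projected_q, axis=1)
    nearest = int(np.argmin(dists))                  # nearest neighbor match
    within = np.flatnonzero(dists <= threshold)      # threshold matches
    return nearest, within, dists

rng = np.random.default_rng(0)
d = 8
R, _ = np.linalg.qr(rng.standard_normal((d, d)))     # random orthogonal matrix
database = rng.standard_normal((100, d))
query = database[17] + 0.01 * rng.standard_normal(d) # near stored item 17

nearest, within, dists = search_projected(R, database, query, threshold=0.5)
```

Because the stand-in matrix is orthogonal, the distances computed after projection equal the distances in the original space, which is the property the text relies on.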
As described above, at least one stage of methods for large-scale search and retrieval of data associated with complex computer applications can use linear projection, at least in part. For a given vector, for example $x \in \mathbb{R}^d$, and a projection matrix $R \in \mathbb{R}^{k \times d}$, the linear projection can compute $h(x) \in \mathbb{R}^k$ as shown in equation (1):
$$h(x) = Rx \tag{1}$$
This linear projection can be followed by, among other things, quantization, which converts the high-dimensional features into compact codes that use less memory, for example binary embeddings or product codes. The compact code can be a binary code or a non-binary code. Such compact codes can be used to reduce search time and the storage requirements associated with various complex computer applications, such as image retrieval, feature matching, attribute recognition, and object classification. For example, the locality-sensitive hashing (LSH) technique for large-scale approximate nearest neighbor search linearly projects the input data before converting it into compact codes. For example, a k-bit binary code can satisfy equation (2):
$$h(x) = \operatorname{sign}(Rx) \tag{2}$$
However, as the input dimension d increases, this linear projection becomes computationally expensive. To achieve high recall in retrieval tasks, long codes with large k can be used, such that $k = O(d)$. In that case, the space and computational complexity of the projection is $O(d^2)$, and such a high projection cost often becomes the bottleneck at both learning and prediction time. For example, when $k = d = 50{,}000$, the projection matrix alone may require 10 GB (single precision), and projecting one vector may take 800 ms on a single core.
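The storage figure above follows from simple arithmetic, and the k-bit code of equation (2) is a one-line computation. The sketch below illustrates both; the helper names `projection_bytes` and `binary_code` are hypothetical, and mapping a zero projection value to +1 is an assumption made here to keep the code in {-1, +1}.

```python
import numpy as np

def projection_bytes(k: int, d: int, bytes_per_value: int = 4) -> int:
    """Memory needed to store a dense k x d projection matrix."""
    return k * d * bytes_per_value

def binary_code(R: np.ndarray, x: np.ndarray) -> np.ndarray:
    """k-bit code h(x) = sign(Rx) with entries in {-1, +1} (zero maps to +1)."""
    return np.where(R @ x >= 0, 1, -1).astype(np.int8)

# Storage for the k = d = 50,000 example, single precision (4 bytes per value).
gigabytes = projection_bytes(50_000, 50_000) / 1e9

rng = np.random.default_rng(0)
R = rng.standard_normal((16, 32))   # small unstructured Gaussian projection
x = rng.standard_normal(32)
code = binary_code(R, x)            # 16 entries, each -1 or +1
```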
In at least one aspect of the disclosure, the projection matrix can be orthogonal. Orthogonal transformations can be beneficial because, among other things, they preserve the Euclidean distances between points, and they are also known to distribute variance more evenly across dimensions. These properties are important for making several well-known techniques perform well on real-world data.
Orthogonal projection can provide additional benefits for particular applications. For example, when learning data-dependent binary codes, orthogonality can serve as a means of meeting the goal of learning maximally uncorrelated bits. One way to achieve this goal is to impose orthogonal or near-orthogonal constraints on the projection. Likewise, in binary embedding, imposing orthogonality constraints on the projection can yield improved results in approximate nearest neighbor search.
To efficiently perform linear projection of high-dimensional data, the disclosure provides methods, systems, and non-transitory computer-readable media for transforming a series of small cell matrices into a structured matrix in a way that preserves the relationships associated with the original cell matrices as they existed before the transformation. The structured matrix resulting from the transformation can be, for example, a single large matrix. Alternatively, however, the structured matrix can conceptually represent a family of flexible orthogonal structured matrices. In at least one aspect of the disclosure, the preserved relationship can be the orthogonality associated with each matrix in the series of cell matrices. Alternatively, or in addition, the preserved relationship can be the distance between corresponding feature vectors associated with the matrices, for example the Euclidean distance. The transformation of the cell matrices can be achieved by using the Kronecker product of the small cell matrices, which substantially reduces space and computational complexity. The flexibility of the transformation allows the number of free parameters in the matrix to be varied to suit the needs of a given application.
Thus, at least one aspect of the disclosure can construct a family of orthogonal structured matrices by transforming a series of small orthogonal matrices into a large structured matrix. At least one aspect of the disclosure facilitates this transformation by using the Kronecker product of a series of small orthogonal cell matrices. The Kronecker projection matrix can satisfy equation (3):
$$R = A_1 \otimes A_2 \otimes \cdots \otimes A_M \tag{3}$$
In equation (3), $A_j$, $j = 1, \ldots, M$, are small orthogonal matrices, which can be referred to as cell matrices. The large matrix produced by this transformation can be associated with at least four major advantages. First, the large matrix satisfies the orthogonality constraint, so it preserves Euclidean distances in the original space. Second, a computation similar to the fast Fourier transform can be used to compute the projection, with time complexity $O(d \log d)$. Third, by changing the size of the cell matrices, the resulting large matrix can have a varying number of parameters (degrees of freedom), which makes it easy to trade off performance against speed. Fourth, the space complexity of the large matrix is $O(\log d)$, whereas the space complexity of most other structured matrices is $O(d)$. In addition, the proposed Kronecker projection provides advantages in approximate nearest neighbor search in various settings, including, for example, binary embedding and vector quantization.
Binary embedding methods can map original vectors to k-bit binary vectors such that $h(x) \in \{+1, -1\}^k$. Because the data points are then represented as binary codes, storage cost is reduced substantially, even when $k = O(d)$. Approximate nearest neighbors can be retrieved using the Hamming distance in the binary code space, which can be computed efficiently in various ways, including, for example, table lookup or the POPCNT instruction in modern computer architectures.
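The table-lookup approach to Hamming distance can be sketched as follows. This is an illustrative sketch only: the helper names are hypothetical, the codes are packed one bit per entry with NumPy's `packbits`, and a 256-entry table stands in for a hardware population-count instruction.

```python
import numpy as np

# Lookup table: number of set bits for every byte value 0..255.
POPCOUNT_TABLE = np.array([bin(v).count("1") for v in range(256)], dtype=np.uint8)

def pack_code(bits_pm1: np.ndarray) -> np.ndarray:
    """Pack a {-1, +1} code into bytes, one bit per entry (+1 becomes 1)."""
    return np.packbits(bits_pm1 > 0)

def hamming_distance(a_packed: np.ndarray, b_packed: np.ndarray) -> int:
    """Count differing bits: XOR the bytes, then sum the table lookups."""
    return int(POPCOUNT_TABLE[np.bitwise_xor(a_packed, b_packed)].sum())

a = np.array([1, -1, 1, 1, -1, -1, 1, -1, 1, 1])
b = a.copy()
b[[0, 4, 9]] *= -1                 # flip three positions
distance = hamming_distance(pack_code(a), pack_code(b))
```

Both codes are padded with the same zero bits when packed, so the padding does not contribute to the XOR.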
LSH can be used to generate binary codes in a way that preserves cosine distance, and it typically uses randomized projections to do so. However, using such randomized projections forgoes the advantage of learning data-dependent binary codes by optimizing the projection matrix R. For example, the iterative quantization (ITQ) method has shown that, by projecting with PCA and then applying a learned orthogonal projection, the resulting binary embedding can outperform non-orthogonal or randomized orthogonal projections. The projection can be learned by alternating between projecting the data points and solving for the projection via SVD. For high-dimensional features, however, this approach may be infeasible unless the dimensionality is radically reduced, which hurts performance. Aspects of the disclosure that learn projections based on the Kronecker product can achieve performance similar to ITQ while being substantially more efficient.
Quantization methods can represent data points via a set of quantizers, which can typically be obtained by a vector quantization algorithm such as k-means. To search for the nearest neighbors of a given query vector q, the Euclidean distances between q and all data points in the database can be computed. These Euclidean distances can be estimated by the distances from the vector to the quantizers. Alternatively, or in addition, when the data is high-dimensional, quantization can be performed independently in subspaces. Commonly used subspaces can be identified by chunking the vectors, which leads to product quantization (PQ).
The distance between a query vector q and a data point x can be stated as in equation (4):
$$\|q - x\|_2^2 \approx \|q - \mu(x)\|_2^2 = \sum_{i=1}^{m} \|q^{(i)} - \mu_i(x^{(i)})\|_2^2 \tag{4}$$
In equation (4), m is the total number of subspaces, $x^{(i)}$ and $q^{(i)}$ are subvectors, and $\mu_i(x^{(i)})$ is the quantization function on subspace i. Because of the asymmetric nature of this distance, only the data points are quantized, not the query vector. For good performance, the different subspaces should have similar variance for the given data. One way to achieve this is to apply an orthogonal transformation R to the data, as stated in equation (5):
$$\|q - x\|_2^2 = \|Rq - Rx\|_2^2 \approx \sum_{i=1}^{m} \|(Rq)^{(i)} - \mu_i((Rx)^{(i)})\|_2^2 \tag{5}$$
Because the projection matrix R is orthogonal, it preserves Euclidean distances. Instead of using a random projection matrix, the projection matrix can be learned from the given data, which improves retrieval results. However, the methods of performing projection that existed before this disclosure can be associated with high resource costs (for example, processor usage, memory usage, execution time, and the like) in high-dimensional spaces.
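The asymmetric distance used in product quantization can be sketched as below: the data point is replaced by its nearest subcenter in each subspace, while the query is left unquantized. The codebooks here are untrained (rows sampled from the data) purely to keep the sketch short; a real system would normally fit them with k-means. All function names and sizes are hypothetical.

```python
import numpy as np

def pq_encode(x, codebooks):
    """Index of the nearest subcenter in each subspace."""
    subs = np.split(x, len(codebooks))
    return [int(np.argmin(((C - s) ** 2).sum(axis=1))) for C, s in zip(codebooks, subs)]

def pq_decode(code, codebooks):
    """Reconstruct mu(x) by concatenating the selected subcenters."""
    return np.concatenate([C[c] for C, c in zip(codebooks, code)])

def asymmetric_distance(q, code, codebooks):
    """Sum over subspaces of ||q^(i) - mu_i(x^(i))||^2; q is not quantized."""
    subs = np.split(q, len(codebooks))
    return float(sum(((s - C[c]) ** 2).sum() for s, C, c in zip(subs, codebooks, code)))

rng = np.random.default_rng(0)
d, m, h = 8, 4, 16                      # dimension, subspaces, subcenters each
data = rng.standard_normal((200, d))
# Untrained codebooks: h sampled data rows, sliced per subspace (an assumption).
sample = data[rng.choice(len(data), size=h, replace=False)]
width = d // m
codebooks = [sample[:, i * width:(i + 1) * width] for i in range(m)]

x, q = data[0], rng.standard_normal(d)
code = pq_encode(x, codebooks)
approx = asymmetric_distance(q, code, codebooks)
```

The per-subspace sum equals the squared distance from q to the reconstructed point, which is why the distance can be accumulated one subspace at a time.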
Thus, for at least the reasons above, binary embedding and quantization both need a fast projection that is orthogonal and can be learned efficiently. As described below, these goals can be achieved by transforming a series of cell matrices into a large structured matrix using a transformation algorithm that preserves the relationships associated with each corresponding cell matrix. The transformation algorithm can include, among other things, generating the projection using the Kronecker product.
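The Kronecker product has several properties that facilitate the transformation described above. For example, assume $A_1 \in \mathbb{R}^{k_1 \times d_1}$ and $A_2 \in \mathbb{R}^{k_2 \times d_2}$. The Kronecker product of $A_1$ and $A_2$ is $A_1 \otimes A_2 \in \mathbb{R}^{k_1 k_2 \times d_1 d_2}$, satisfying equation (6):
$$A_1 \otimes A_2 = \begin{bmatrix} a_1(1,1) A_2 & \cdots & a_1(1,d_1) A_2 \\ \vdots & \ddots & \vdots \\ a_1(k_1,1) A_2 & \cdots & a_1(k_1,d_1) A_2 \end{bmatrix} \tag{6}$$
In equation (6), $a_1(i, j)$ is the element in the i-th row and j-th column of $A_1$. The Kronecker product is also called the tensor product or direct product. The operation mat(x, a, b) reshapes a d-dimensional vector into an a × b matrix (ab = d), and vec(·) unrolls a matrix into a vector, so that vec(mat(x, a, b)) = x.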
The Kronecker product has several properties that facilitate the advantages enumerated here. For example, at least a subset of these properties helps to generate fast orthogonal projections while preserving the relationships associated with the original cell matrices. Two particular properties of the Kronecker product that facilitate these advantages are the identity
$$(A_1 \otimes A_2)\, x = \operatorname{vec}\!\left(A_2 \operatorname{mat}(x, d_2, d_1)\, A_1^T\right)$$
and the fact that the Kronecker product preserves the orthogonality of the cell matrices. That is, if $A_1$ and $A_2$ are both orthogonal, then $A_1 \otimes A_2$ is also orthogonal.
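These properties can be checked numerically. The sketch below assumes that mat and vec use column-major (column-by-column) ordering, which is the convention under which the standard identity (A1 ⊗ A2) x = vec(A2 mat(x, d2, d1) A1ᵀ) holds; the text does not state the ordering, so that choice is an assumption. It also checks the block structure of the Kronecker product and the preservation of orthogonality.

```python
import numpy as np

def mat(x, a, b):
    """Reshape a length a*b vector into an a x b matrix, column by column."""
    return np.reshape(x, (a, b), order="F")

def vec(X):
    """Stack the columns of X into one vector (inverse of mat)."""
    return np.reshape(X, -1, order="F")

rng = np.random.default_rng(0)
A1, _ = np.linalg.qr(rng.standard_normal((3, 3)))   # orthogonal, d1 = 3
A2, _ = np.linalg.qr(rng.standard_normal((2, 2)))   # orthogonal, d2 = 2
K = np.kron(A1, A2)                                 # 6 x 6 Kronecker product
x = rng.standard_normal(6)

direct = K @ x                                      # uses the full matrix
fast = vec(A2 @ mat(x, 2, 3) @ A1.T)                # never forms K
```

The second computation touches only the two small matrices, which is what makes the fast projection possible.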
According at least one aspect of the disclosure, Kronecker projection matrixMultiple units can be included
The Kronecker product of matrix, such as the following statement in equation (7):
Wherein, in equation (7),And
One benefit of forming a larger matrix in this way is that the Kronecker projection can be computed with reduced computational complexity. To simplify the discussion, assume that R is square, i.e., k = d, and that all cell matrices are also square with the same order $d_e$. Floating-point operations (FLOPs) give an accurate estimate of the computational cost of the different methods. Let $f(d, d_e)$ denote the number of FLOPs needed to compute the Kronecker projection of a d-dimensional vector with cell matrices of order $d_e$. Equation (8) shows a property of the Kronecker product:
$$Rx = \Big(\bigotimes_{j=1}^{M} A_j\Big) x = \operatorname{vec}\!\Big(\Big(\bigotimes_{j=2}^{M} A_j\Big) \operatorname{mat}(x, d/d_e, d_e)\, A_1^T\Big) \tag{8}$$
Computing $\operatorname{mat}(x, d/d_e, d_e)\, A_1^T$ requires $d(2d_e - 1)$ FLOPs ($d d_e$ multiplications and $d d_e - d$ additions). Computing $\big(\bigotimes_{j=2}^{M} A_j\big) \operatorname{mat}(x, d/d_e, d_e)\, A_1^T$ then reduces to $d_e$ smaller problems, each of which computes a Kronecker projection with feature dimension $d/d_e$ and cell matrices of order $d_e$, as reflected in equation (9):
$$f(d, d_e) = d(2d_e - 1) + d_e f(d/d_e, d_e) \tag{9}$$
Based on equation (9), the number of FLOPs needed to perform the Kronecker projection of a d-dimensional vector is $d(2d_e - 1)\log_{d_e} d$, i.e., $O(d \log d)$.
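The recursive computation can be sketched as follows. The sketch peels off the first cell matrix, multiplies the reshaped vector by its transpose, and recurses on each resulting column, mirroring the recurrence f(d, d_e) = d(2d_e - 1) + d_e f(d/d_e, d_e). Column-major reshaping is assumed, the function names are hypothetical, and the code favors clarity over speed.

```python
import numpy as np
from functools import reduce

def kron_project(cells, x):
    """Compute (A_1 kron ... kron A_M) x without forming the large matrix."""
    if len(cells) == 1:
        return cells[0] @ x
    A1 = cells[0]
    k1, d1 = A1.shape
    X = np.reshape(x, (x.size // d1, d1), order="F")   # mat(x, d/d1, d1)
    Y = X @ A1.T                                       # apply the first cell
    # Each column of Y is a smaller projection problem for the remaining cells.
    Z = np.stack([kron_project(cells[1:], Y[:, c]) for c in range(k1)], axis=1)
    return np.reshape(Z, -1, order="F")                # vec(...)

def flops(d, de):
    """FLOP count from the recurrence; assumes d is a power of de."""
    if d == 1:
        return 0
    return d * (2 * de - 1) + de * flops(d // de, de)

rng = np.random.default_rng(0)
cells = [rng.standard_normal((2, 2)) for _ in range(4)]   # d = 2**4 = 16
x = rng.standard_normal(16)
fast = kron_project(cells, x)
dense = reduce(np.kron, cells) @ x                        # reference result
```

For d = 1024 and 2 × 2 cells, `flops(1024, 2)` evaluates the recurrence to 1024 · 3 · 10 operations, matching the closed form d(2d_e - 1) log_{d_e} d.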
Another attractive property of the Kronecker projection that helps facilitate the advantages described here is the flexibility of its structure. For example, by controlling the size of $A_j$, $j = 1, \ldots, M$, the number of model parameters (and hence the capacity) can easily be balanced against the computational cost. There are $M = \log_{d_e} d$ cell matrices, each with $d_e^2$ parameters. The number of parameters in the Kronecker projection is therefore $d_e^2 \log_{d_e} d$, ranging from $d^2$ (when $d_e = d$) down to $4\log_2 d$ (when $d_e = 2$).
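As a worked illustration of this trade-off, take the hypothetical dimension d = 1024 (chosen here only because it factors conveniently). With square cell matrices of order d_e there are M = log_{d_e} d cells of d_e² entries each, and solving the recurrence in equation (9) gives d(2d_e - 1) log_{d_e} d FLOPs per projection:

```latex
\begin{array}{r|r|r|r}
d_e & M = \log_{d_e} 1024 & \text{parameters } d_e^2 M & \text{FLOPs } 1024\,(2d_e - 1)\,M \\
\hline
2    & 10 & 40        & 30{,}720 \\
4    & 5  & 80        & 35{,}840 \\
32   & 2  & 2{,}048   & 129{,}024 \\
1024 & 1  & 1{,}048{,}576 & 2{,}096{,}128
\end{array}
```

The last row is the unstructured case: a single dense 1024 × 1024 matrix.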
The disclosure has been described above with reference to examples in which the Kronecker projection R and all cell matrices are square. However, the disclosure need not be so limited. For example, the disclosure also extends to non-square Kronecker projections and/or non-square cell matrices. For example, the sizes of the cell matrices can be chosen by factorizing d and k. Alternatively, or in addition, d or k may not factorize into a product of small numbers. In that case, for the input features, the dimension can be changed by subsampling or zero padding. Separately, for the output, a longer code can be used and then subsampled. The generation of Kronecker projections in the context of square and non-square projection matrices R is discussed further below.
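Zero padding and subsampling can be sketched in a few lines. The helper names are hypothetical, padding up to the next power of the cell order is one possible choice of target dimension, and keeping the first k entries is only one of many possible subsampling rules.

```python
import numpy as np

def pad_to_power(x, de):
    """Zero-pad x so its length is the next power of de (a factorizable size)."""
    target = 1
    while target < x.size:
        target *= de
    return np.concatenate([x, np.zeros(target - x.size)])

def subsample(code, k):
    """Keep the first k entries of a longer code (one simple subsampling rule)."""
    return code[:k]

x = np.arange(1.0, 6.0)           # d = 5 does not factor into 2 x 2 cells
padded = pad_to_power(x, 2)       # length 8 = 2 * 2 * 2
shortened = subsample(padded, 6)  # e.g. trim an 8-entry output to k = 6
```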
Kronecker projections can also be generated randomly, in a manner similar to unstructured projections, circulant projections, and/or bilinear projections. However, randomly generated Kronecker projections improve on the projections listed above because they are suited to high-dimensional data.
Randomly generated Kronecker projections can be applied to binary embedding and quantization. These applications can be realized by replacing the unstructured projection matrix (R in equations (1) and (5)) with a randomized Kronecker projection matrix.
For example, for Kronecker projections, the methods, systems, and computer programs described here can generate M (small) orthogonal cell matrices. In at least one aspect of the disclosure, a cell matrix can be generated by creating a small random Gaussian matrix and then performing QR factorization. For cell matrices of size 2 × 2, for example, the time complexity of generating a randomized Kronecker projection of order d is only $O(\log d)$. This is a significant benefit because, for example, the time complexity of generating an unstructured orthogonal projection matrix of order d is $O(d^3)$. Randomized Kronecker projections therefore provide a practical solution for generating randomized projections of high-dimensional data.
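A minimal sketch of this generation step follows: each cell matrix is the Q factor of a small Gaussian matrix. The function name `random_cells` is hypothetical. The dense Kronecker product is formed at the end only to verify orthogonality and distance preservation on a tiny example; in practice only the cells would be stored.

```python
import numpy as np
from functools import reduce

def random_cells(d, de, rng):
    """Generate log_de(d) random orthogonal de x de cell matrices via QR."""
    count, size = 0, 1
    while size < d:
        size *= de
        count += 1
    if size != d:
        raise ValueError("d must be a power of de")
    cells = []
    for _ in range(count):
        Q, _ = np.linalg.qr(rng.standard_normal((de, de)))  # Q is orthogonal
        cells.append(Q)
    return cells

rng = np.random.default_rng(0)
cells = random_cells(16, 2, rng)      # four 2 x 2 cells for d = 16
R = reduce(np.kron, cells)            # dense matrix, for verification only
x, y = rng.standard_normal(16), rng.standard_normal(16)
```

Only 4 · 4 = 16 numbers define this 16 × 16 projection, and the work to generate them grows with the number of cells rather than with d³.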
According to another aspect of the disclosure, systems and methods for optimizing the parameters of a Kronecker projection are disclosed. As explained in more detail below, optimization algorithms are discussed for binary embedding and for quantization, and it is then shown that both can be formulated as, and solved as, an orthogonal Procrustes problem for each cell matrix. For this discussion, assume training data $X = [x_1, x_2, \ldots, x_N] \in \mathbb{R}^{d \times N}$. The discussion below first addresses the case k = d and then extends to the case k ≠ d.
First, consider minimizing the binarization loss for binary embedding. The optimization problem can be expressed as shown in equation (10):
$$\operatorname*{argmin}_{B, R} \; \|B - RX\|_F^2 \quad \text{s.t.} \quad R^T R = I \tag{10}$$
In equation (10), the binary matrix $B = [b_1, b_2, \ldots, b_N] \in \{-1, 1\}^{d \times N}$, and $b_i$ is the binary code of $x_i$, i.e., $b_i = \operatorname{sign}(R x_i)$. In addition, the Kronecker structure is imposed on R. A local solution of equation (3) can be found by alternating minimization. With R fixed, B is computed by direct binarization, by definition. With B fixed and k = d (the case k < d is discussed below), R is found by solving the orthogonal Procrustes problem stated in equation (11):
$$\operatorname*{argmin}_{R} \; \|B - RX\|_F^2 \quad \text{s.t.} \quad R^T R = I \tag{11}$$
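The alternating minimization can be sketched for the unstructured case, where R is a single dense orthogonal matrix; the Kronecker structure is deliberately not imposed here to keep the example short. It relies on the standard closed-form solution of the orthogonal Procrustes problem: if B Xᵀ = U S Vᵀ is a singular value decomposition, then R = U Vᵀ minimizes ||B - RX||_F over orthogonal R. The function names are hypothetical.

```python
import numpy as np

def procrustes(B, X):
    """Orthogonal R minimizing ||B - R X||_F, via the SVD of B X^T."""
    U, _, Vt = np.linalg.svd(B @ X.T)
    return U @ Vt

def alternate(X, iters, rng):
    """Alternate between binarizing (R fixed) and solving for R (B fixed)."""
    d = X.shape[0]
    R, _ = np.linalg.qr(rng.standard_normal((d, d)))   # random orthogonal start
    losses = []
    for _ in range(iters):
        B = np.where(R @ X >= 0, 1.0, -1.0)            # fix R, update B
        R = procrustes(B, X)                           # fix B, update R
        losses.append(float(np.linalg.norm(B - R @ X) ** 2))
    return R, losses

rng = np.random.default_rng(0)
X = rng.standard_normal((8, 200))                      # d = 8, N = 200
R, losses = alternate(X, iters=10, rng=rng)
```

Each half-step can only lower the objective or leave it unchanged, so the recorded losses never increase from one iteration to the next.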
Next, for quantization, consider the Cartesian k-means (ck-means) method. In ck-means, an input sample x is split into m subspaces, $x = [x^{(1)}; x^{(2)}; \ldots; x^{(m)}]$, and each subspace can be quantized to h subcenters. In the example discussed below, all subcenter sets have the same fixed cardinality. However, the disclosure need not be so limited. For example, the disclosure can be applied in a similar way to subcenter sets with varying cardinalities.
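Let $p = [p^{(1)}; p^{(2)}; \ldots; p^{(m)}]$, where $p^{(j)} \in \{0, 1\}^h$ and $\|p^{(j)}\|_1 = 1$. In other words, $p^{(j)}$ is an indicator of the subcenter nearest to $x^{(j)}$. Let $C^{(j)} \in \mathbb{R}^{(d/m) \times h}$ be the j-th subcenter matrix and let $C \in \mathbb{R}^{d \times mh}$ be the center matrix formed by the (diagonal) concatenation of all subcenters, as stated in equation (12):
$$C = \begin{bmatrix} C^{(1)} & & \\ & \ddots & \\ & & C^{(m)} \end{bmatrix} \tag{12}$$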
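In ck-means, the center matrix C is parameterized by an orthogonal matrix $R \in \mathbb{R}^{d \times d}$ and a block-diagonal matrix $D \in \mathbb{R}^{d \times mh}$. The optimization problem of ck-means can be written as equation (13):
$$\operatorname*{argmin}_{R, D, P} \; \|X - RDP\|_F^2 \quad \text{s.t.} \quad R^T R = I \tag{13}$$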
A similar alternating procedure can be used with the Kronecker structure imposed on the orthogonal matrix R. With R fixed, updating D and P is equivalent to performing vector quantization with k-means in each subspace. This is efficient because the number of clusters per subspace is always set to a small number (for example, h = 256), so the number of centers is typically small. With D and P fixed, updating R can be treated as an orthogonal Procrustes problem, as reflected in equation (1).
The orthogonal Procrustes problem thus arises in both of the methods discussed above. For the aspects of the disclosure that use the Kronecker product and/or projection, the problem can be referred to as the Kronecker Procrustes problem, shown in equation (15):
$$\operatorname*{argmin}_{R} \; \|B - RX\|_F^2 \quad \text{s.t.} \quad R = \bigotimes_{j=1}^{M} A_j, \quad A_j^T A_j = I, \; j = 1, \ldots, M \tag{15}$$
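To solve the optimization above, an iterative method can be used that updates each cell matrix in turn to find a local solution. The method can begin by rewriting $\|B - RX\|_F^2$ as in equation (16):
$$\|B - RX\|_F^2 = \|B\|_F^2 + \|RX\|_F^2 - 2\operatorname{tr}(B^T R X) = \|B\|_F^2 + \|X\|_F^2 - 2\operatorname{tr}(B^T R X) \tag{16}$$
The second equality holds because the Kronecker product preserves orthogonality. The next step is therefore to maximize $\operatorname{tr}(B^T R X)$. Using the properties of the trace, this can be expressed as equation (17):
$$\operatorname{tr}(B^T R X) = \operatorname{tr}(R X B^T) = \sum_{i=1}^{N} \operatorname{tr}\!\Big(\Big(\bigotimes_{j=1}^{M} A_j\Big) x_i b_i^T\Big) \tag{17}$$
In equation (17), $b_i$ and $x_i$ are the i-th columns of matrix B and matrix X, respectively. This problem can be solved by updating one cell matrix at a time while keeping all other cell matrices fixed. Without loss of generality, consider updating $A_j$, as shown in equation (18):
$$\operatorname*{argmax}_{A_j} \; \sum_{i=1}^{N} \operatorname{tr}\!\big((A_{pre} \otimes A_j \otimes A_{next})\, x_i b_i^T\big) \quad \text{s.t.} \quad A_j^T A_j = I \tag{18}$$
In equation (18), $A_{pre} = \bigotimes_{j'=1}^{j-1} A_{j'}$ and $A_{next} = \bigotimes_{j'=j+1}^{M} A_{j'}$. Let the dimensions of $A_{pre}$, $A_{next}$, and $A_j$ be $k_{pre} \times d_{pre}$, $k_{next} \times d_{next}$, and $k_j \times d_j$, respectively. Moreover, $d_{pre} d_j d_{next} = d$ and $k_{pre} k_j k_{next} = k$.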
According to the properties of the Kronecker product, the objective function for $A_j$ in equation (18) can satisfy:
Assuming thatAnd
Then, equation (19) can be expressed as:
When one cell matrix is updated, the computational cost can come from three different sources. The first source, referred to here as S1, is computing the Kronecker product of the fixed cell matrices with the data. The second source, S2, is computing the product of the projected data and the codes. The third source, S3, is performing the SVD to obtain the optimal cell matrix. When the cell matrices are large, the bottleneck of the optimization can be the SVD. When the cell matrices are small (for example, 2 × 2), the SVD runs in approximately constant time, and the main computational cost comes from S1 ($O(Nd \log d)$) and S2 ($O(Nd)$). Because there are $O(\log d)$ cell matrices in total, the computational complexity of the whole optimization is $O(Nd \log^2 d)$.
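The single-cell update can be sketched as follows for the square case. This is one possible formulation, not necessarily the disclosure's own: the trace objective is linear in the cell being updated, so it can be written as the elementwise inner product of that cell with a small matrix G, and the maximizing orthogonal cell is U Vᵀ from the SVD of G. The three steps correspond loosely to the cost sources S1 (apply the fixed cells to the data), S2 (combine with the codes), and S3 (SVD). For clarity the fixed cells are multiplied out densely, which forgoes the fast computation; all names are hypothetical.

```python
import numpy as np
from functools import reduce

def kron_all(mats):
    """Kronecker product of a list of matrices (1 x 1 identity if empty)."""
    return reduce(np.kron, mats, np.eye(1))

def update_cell(cells, j, B, X):
    """Best orthogonal A_j with B, X, and all other cells held fixed."""
    A_pre, A_next = kron_all(cells[:j]), kron_all(cells[j + 1:])
    kj, dj = cells[j].shape
    N = X.shape[1]
    # View each x_i and b_i as a 3-way tensor indexed by (pre, j, next).
    Xt = X.T.reshape(N, A_pre.shape[1], dj, A_next.shape[1])
    Bt = B.T.reshape(N, A_pre.shape[0], kj, A_next.shape[0])
    # S1: apply the fixed factors, leaving the j-th mode untouched.
    T = np.einsum("ap,npqr,cr->naqc", A_pre, Xt, A_next)
    # S2: tr(B^T R X) equals sum(A_j * G), so G gathers everything but A_j.
    G = np.einsum("nasc,naqc->sq", Bt, T)
    # S3: the orthogonal matrix maximizing sum(A_j * G) is U V^T.
    U, _, Vt = np.linalg.svd(G)
    return U @ Vt

def loss(cells, B, X):
    return float(np.linalg.norm(B - kron_all(cells) @ X) ** 2)

rng = np.random.default_rng(0)
cells = [np.linalg.qr(rng.standard_normal((2, 2)))[0] for _ in range(3)]
X = rng.standard_normal((8, 50))
B = np.where(kron_all(cells) @ X >= 0, 1.0, -1.0)   # codes for the start point
before = loss(cells, B, X)
for j in range(len(cells)):                          # one sweep over the cells
    cells[j] = update_cell(cells, j, B, X)
after = loss(cells, B, X)
```

Because each update is an exact maximization over one cell and the previous cell is always a feasible candidate, a sweep cannot increase the loss.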
In the optimization above, a randomized Kronecker projection can be used as the initialization. For both binary embedding and quantization, the objective can decrease quickly under the proposed procedure, and a satisfactory solution can be found within a few tens of iterations.
The method above was discussed for the case k = d. However, aspects of the disclosure can also be applied when k ≠ d. When k ≠ d, the projection matrix R can be formed as the Kronecker product of non-square row- or column-orthogonal cell matrices. In this case the Kronecker product preserves row/column orthogonality. For example, when k > d, the orthogonal Procrustes optimization problem can be solved in the same way as when k = d. Alternatively, for example, when k < d, $R^T R \neq I$, so the second equality in equation (16) does not hold. Therefore, $\|B - RX\|_F^2$ becomes:
$$\|B - RX\|_F^2 = \|B\|_F^2 + \operatorname{tr}(X^T R^T R X) - 2\operatorname{tr}(B^T R X)$$
The problem can be relaxed by assuming that $\operatorname{tr}(X^T R^T R X)$ is independent of R, which makes it identical to the case k ≥ d.
Various implementations of the systems and techniques described here can be realized in digital electronic circuitry, integrated circuitry, specially designed ASICs (application-specific integrated circuits), computer hardware, firmware, software, and/or combinations thereof. These various implementations can include implementation in one or more computer programs that are executable and/or interpretable on a programmable system including at least one programmable processor, which can be special or general purpose, coupled to receive data and instructions from, and to transmit data and instructions to, a storage system, at least one input device, and at least one output device.
These computer programs (also known as programs, software, software applications, or code) include machine instructions for a programmable processor, and can be implemented in a high-level procedural and/or object-oriented programming language, and/or in assembly/machine language. As used here, the terms "machine-readable medium" and "computer-readable medium" refer to any computer program product, apparatus, and/or device (for example, magnetic disks, optical disks, memory, programmable logic devices (PLDs)) used to provide machine instructions and/or data to a programmable processor, including a machine-readable medium that receives machine instructions as a machine-readable signal. The term "machine-readable signal" refers to any signal used to provide machine instructions and/or data to a programmable processor.
To provide for interaction with a user, the systems and techniques described here can be implemented on a computer having a display device (for example, a CRT (cathode ray tube) or LCD (liquid crystal display) monitor) for displaying information to the user, and a keyboard and a pointing device (for example, a mouse or a trackball) by which the user can provide input to the computer. Other kinds of devices can be used to provide for interaction with a user as well; for example, feedback provided to the user can be any form of sensory feedback (for example, visual feedback, auditory feedback, or tactile feedback), and input from the user can be received in any form, including acoustic, speech, or tactile input.
The systems and techniques described here can be implemented in a computing system that includes a back-end component (for example, a data server), or that includes a middleware component (for example, an application server), or that includes a front-end component (for example, a user computer having a graphical user interface or a web browser through which a user can interact with an implementation of the systems and techniques described here), or any combination of such back-end, middleware, or front-end components. The components of the system can be interconnected by any form or medium of digital data communication (for example, a communication network). Examples of communication networks include a local area network ("LAN"), a wide area network ("WAN"), and the Internet.
The computing system can include clients and servers. A client and server are generally remote from each other and typically interact through a communication network. The relationship of client and server arises by virtue of computer programs running on the respective computers and having a client-server relationship to each other.
Although this disclosure contains certain specifics, these should not be construed as limitations on the scope of the disclosure or of what may be claimed, but rather as descriptions of features of example embodiments of the disclosure. Certain features that are described in this specification in the context of separate embodiments can also be implemented in combination in a single embodiment. Conversely, various features that are described in the context of a single embodiment can also be implemented in multiple embodiments separately or in any suitable subcombination. Moreover, although features may be described as acting in certain combinations, and even initially claimed as such, one or more features of a claimed combination can in some cases be removed from the combination, and the claimed combination may be directed to a subcombination or a variation of a subcombination.
Similarly, although operations are shown in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing can be advantageous. Moreover, the separation of various system components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Specific embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. For example, the actions recited in the claims can be performed in a different order and still achieve desirable results. Other steps may be provided, or steps may be removed, from the described flows, and other components may be added to, or removed from, the described systems. In some cases, multitasking and parallel processing can be advantageous.
Claims (15)
1. A computer-implemented method, the method comprising:
obtaining a plurality of content items;
extracting a plurality of features from each content item of the plurality of content items;
generating a feature vector from each of the extracted features to create a search space;
generating a series of cell matrices based on the generated feature vectors, wherein each cell matrix in the series of cell matrices is associated with one or more relationships;
enhancing the search space at least in part by transforming the series of cell matrices into a structured matrix such that the transformation preserves the one or more relationships associated with each cell matrix in the series of cell matrices;
receiving a search object;
searching the enhanced search space based on the received search object; and
providing one or more links to one or more content items that are responsive to the search object.
2. The method according to claim 1, wherein the one or more relationships associated with the cell matrices include orthogonality.
3. The method according to claim 1 or 2, wherein the one or more relationships associated with the cell matrices include Euclidean distance.
4. The method according to any one of the preceding claims, wherein transforming the series of cell matrices into the structured matrix further comprises:
generating a Kronecker projection, the Kronecker projection being based at least in part on applying a Kronecker product to the series of cell matrices.
5. The method according to any one of the preceding claims, wherein searching the enhanced search space based on the received search object further comprises:
extracting one or more features associated with the search object;
generating a search object vector that represents the features of the search object;
comparing the search object vector with the enhanced search space that includes the structured matrix; and
identifying, based on the comparison, one or more content items that satisfy a predetermined relationship.
6. The method according to any one of the preceding claims, wherein the series of cell matrices is randomly generated based at least in part on the original Euclidean distance of a particular snapshot of the feature vector search space.
7. The method according to any one of the preceding claims, wherein transforming the series of cell matrices into the structured matrix, such that the transformation preserves the one or more relationships associated with each cell matrix in the series of cell matrices, is achieved with a storage space complexity of O(log d) for d-dimensional data.
8. A system, the system comprising:
one or more computers and one or more storage devices storing instructions that are operable, when executed by the one or more computers, to cause the one or more computers to perform operations comprising:
obtaining a plurality of content items;
extracting a plurality of features from each content item of the plurality of content items;
generating a feature vector from each of the extracted features to create a search space;
generating a series of cell matrices based on the generated feature vectors, wherein each cell matrix in the series of cell matrices is associated with one or more relationships;
enhancing the search space at least in part by transforming the series of cell matrices into a structured matrix such that the transformation preserves the one or more relationships associated with each cell matrix in the series of cell matrices;
receiving a search object;
searching the enhanced search space based on the received search object; and
providing one or more links to one or more content items that are responsive to the search object.
9. The system according to claim 8, wherein the one or more relationships associated with the cell matrices include orthogonality.
10. system according to claim 8 or claim 9, wherein, the one or more of passes associated with the cell matrix
System includes Euclidean distance.
11. The system according to any one of claims 8 to 10, wherein transforming the series of cell matrices into the structured matrix further comprises:
generating a Kronecker projection, the Kronecker projection being based at least in part on applying a Kronecker product to the series of cell matrices.
12. The system according to any one of claims 8 to 11, wherein searching the enhanced search space based on the received search object further comprises:
extracting one or more features associated with the search object;
generating a search object vector that represents the features of the search object;
comparing the search object vector with the enhanced search space that includes the structured matrix; and
identifying, based on the comparison, one or more content items that satisfy a predetermined relationship.
13. The system according to any one of claims 8 to 12, wherein the series of cell matrices is randomly generated based at least in part on the original Euclidean distance of a particular snapshot of the feature vector search space.
14. The system according to any one of claims 8 to 13, wherein transforming the series of cell matrices into the structured matrix, such that the transformation preserves the one or more relationships associated with each cell matrix in the series of cell matrices, is achieved with a storage space complexity of O(log d) for d-dimensional data.
15. A non-transitory computer-readable medium storing software, the software comprising instructions executable by one or more computers which, upon such execution, cause the one or more computers to perform operations comprising the method according to any one of claims 1 to 7.
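The structured-matrix and search steps recited in the claims can be illustrated with a minimal sketch. It is not the patented implementation: it assumes each small matrix in the series is a random 2x2 rotation (one possible orthogonal matrix), uses random vectors in place of extracted features, and returns item indices in place of links. The helper names `random_rotation`, `kron`, `kronecker_projection`, `project`, and `search` are hypothetical.

```python
import math
import random


def random_rotation(rng):
    """Return a random 2x2 rotation matrix (orthogonal by construction)."""
    theta = rng.uniform(0.0, 2.0 * math.pi)
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s], [s, c]]


def kron(a, b):
    """Kronecker product of two dense matrices given as lists of rows."""
    return [
        [a_ij * b_kl for a_ij in a_row for b_kl in b_row]
        for a_row in a
        for b_row in b
    ]


def kronecker_projection(factors):
    """Combine a series of small matrices into one structured matrix."""
    result = [[1.0]]
    for factor in factors:
        result = kron(result, factor)
    return result


def project(matrix, vector):
    """Multiply a dense matrix by a vector."""
    return [sum(m * v for m, v in zip(row, vector)) for row in matrix]


def distance(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


def search(matrix, projected_items, query, top_k=1):
    """Return indices of the top_k items nearest to the projected query."""
    q = project(matrix, query)
    ranked = sorted(
        range(len(projected_items)),
        key=lambda i: distance(projected_items[i], q),
    )
    return ranked[:top_k]


rng = random.Random(0)
k = 3                    # number of 2x2 matrices in the series (assumption)
d = 2 ** k               # dimension of the feature vectors
factors = [random_rotation(rng) for _ in range(k)]
structured = kronecker_projection(factors)

# Stand-in feature vectors; a real system would extract these from content.
items = [[rng.gauss(0.0, 1.0) for _ in range(d)] for _ in range(20)]
projected = [project(structured, x) for x in items]

# A Kronecker product of orthogonal matrices is orthogonal, so Euclidean
# distances between items are unchanged by the projection.
distance_preserved = math.isclose(
    distance(items[0], items[1]),
    distance(projected[0], projected[1]),
    rel_tol=1e-9,
)

# Only the small factors need to be stored: 4 * log2(d) numbers, not d * d.
factor_entries = sum(len(f) * len(f[0]) for f in factors)
dense_entries = d * d

# Query with a slightly perturbed copy of item 7 and look up its neighbor.
query = [x + 0.01 for x in items[7]]
best = search(structured, projected, query)
```

For clarity the sketch materializes the full d x d matrix; the storage figure in claims 7 and 14 corresponds to keeping only the small factors, of which there are log2(d) when each factor is 2x2.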
Applications Claiming Priority (7)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
US201562232238P | 2015-09-24 | 2015-09-24 | |
US201562232258P | 2015-09-24 | 2015-09-24 | |
US62/232,238 | 2015-09-24 | ||
US62/232,258 | 2015-09-24 | ||
US14/951,909 | 2015-11-25 | ||
US14/951,909 US10394777B2 (en) | 2015-09-24 | 2015-11-25 | Fast orthogonal projection |
PCT/US2016/047965 WO2017052874A1 (en) | 2015-09-24 | 2016-08-22 | Fast orthogonal projection |
Publications (2)
Publication Number | Publication Date |
---|---|
CN107636639A (en) | 2018-01-26 |
CN107636639B (en) | 2021-01-08 |
Family
ID=60808567
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201680028711.7A Active CN107636639B (en) | 2015-09-24 | 2016-08-22 | Fast orthogonal projection |
Country Status (4)
Country | Link |
---|---|
EP (1) | EP3278238A1 (en) |
JP (2) | JP6469890B2 (en) |
KR (1) | KR102002573B1 (en) |
CN (1) | CN107636639B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
WO2020173334A1 (en) * | 2019-02-25 | 2020-09-03 | 阿里巴巴集团控股有限公司 | Data storage method and data query method |
Families Citing this family (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR102217219B1 (en) * | 2020-07-30 | 2021-02-18 | 주식회사 파란헤움 | Emergency notification system apparatus capable of notifying emergency about indoor space and operating method thereof |
KR102417839B1 (en) * | 2020-08-13 | 2022-07-06 | 주식회사 한컴위드 | Cloud-based offline commerce platform server that enables offline commerce based on gold and digital gold token, and operating method thereof |
KR102302948B1 (en) * | 2020-08-13 | 2021-09-16 | 주식회사 한컴위드 | Gold bar genuine product certification server to perform genuine product certification for gold bar and operating method thereof |
KR102302949B1 (en) * | 2020-08-13 | 2021-09-16 | 주식회사 한컴위드 | Digital content provision service server supporting the provision of digital limited content through linkage with gold bar and operating method thereof |
KR102639404B1 (en) | 2020-10-30 | 2024-02-21 | 가부시키가이샤 페닉스 솔루션 | RFID tag for rubber products and manufacturing method of RFID tag for rubber products |
CN112380494B (en) * | 2020-11-17 | 2023-09-01 | 中国银联股份有限公司 | Method and device for determining object characteristics |
Citations (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US5479523A (en) * | 1994-03-16 | 1995-12-26 | Eastman Kodak Company | Constructing classification weights matrices for pattern recognition systems using reduced element feature subsets |
CN1823334A (en) * | 2003-05-14 | 2006-08-23 | 塞利布罗斯有限公司 | Search engine method and apparatus |
CN101281545A (en) * | 2008-05-30 | 2008-10-08 | 清华大学 | Three-dimensional model search method based on multiple characteristic related feedback |
CN103279578A (en) * | 2013-06-24 | 2013-09-04 | 魏骁勇 | Video retrieving method based on context space |
CN103389966A (en) * | 2012-05-09 | 2013-11-13 | 阿里巴巴集团控股有限公司 | Massive data processing, searching and recommendation methods and devices |
CN103440280A (en) * | 2013-08-13 | 2013-12-11 | 江苏华大天益电力科技有限公司 | Retrieval method and device applied to massive spatial data retrieval |
CN103984675A (en) * | 2014-05-06 | 2014-08-13 | 大连理工大学 | Orthogonal successive approximation method for solving global optimization problem |
US20140280428A1 (en) * | 2013-03-13 | 2014-09-18 | International Business Machines Corporation | Information retrieval using sparse matrix sketching |
CN104794733A (en) * | 2014-01-20 | 2015-07-22 | 株式会社理光 | Object tracking method and device |
Family Cites Families (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US6859802B1 (en) * | 1999-09-13 | 2005-02-22 | Microsoft Corporation | Image retrieval based on relevance feedback |
US6826300B2 (en) * | 2001-05-31 | 2004-11-30 | George Mason University | Feature based classification |
US8015350B2 (en) | 2006-10-10 | 2011-09-06 | Seagate Technology Llc | Block level quality of service data in a data storage device |
US7941442B2 (en) | 2007-04-18 | 2011-05-10 | Microsoft Corporation | Object similarity search in high-dimensional vector spaces |
US8457409B2 (en) * | 2008-05-22 | 2013-06-04 | James Ting-Ho Lo | Cortex-like learning machine for temporal and hierarchical pattern recognition |
JP5375676B2 (en) * | 2010-03-04 | 2013-12-25 | 富士通株式会社 | Image processing apparatus, image processing method, and image processing program |
JP5563494B2 (en) * | 2011-02-01 | 2014-07-30 | 株式会社デンソーアイティーラボラトリ | Corresponding reference image search device and method, content superimposing device, system and method, and computer program |
JP5258915B2 (en) * | 2011-02-28 | 2013-08-07 | 株式会社デンソーアイティーラボラトリ | Feature conversion device, similar information search device including the same, coding parameter generation method, and computer program |
US8891878B2 (en) * | 2012-06-15 | 2014-11-18 | Mitsubishi Electric Research Laboratories, Inc. | Method for representing images using quantized embeddings of scale-invariant image features |
JP5563016B2 (en) * | 2012-05-30 | 2014-07-30 | 株式会社デンソーアイティーラボラトリ | Information search device, information search method and program |
JP5959446B2 (en) * | 2013-01-30 | 2016-08-02 | Kddi株式会社 | Retrieval device, program, and method for high-speed retrieval by expressing contents as a set of binary feature vectors |
JP6195365B2 (en) * | 2013-10-18 | 2017-09-13 | Kddi株式会社 | Vector encoding program, apparatus and method |
Family Application Filings
Filing Date | Country | Application Number | Publication | Status |
---|---|---|---|---|
2016-08-22 | CN | CN201680028711.7A | CN107636639B | Active |
2016-08-22 | JP | JP2017556909A | JP6469890B2 | Active |
2016-08-22 | KR | KR1020177031376A | KR102002573B1 | IP Right Grant |
2016-08-22 | EP | EP16760273.9A | EP3278238A1 | Withdrawn |
2019-01-15 | JP | JP2019004165A | JP2019057329A | Pending |
Non-Patent Citations (3)
Title |
---|
CHARLES F et al.: "The ubiquitous Kronecker product", 《JOURNAL OF COMPUTATIONAL AND APPLIED MATHEMATICS》 * |
YUNCHAO GONG et al.: "Learning Binary Codes for High-Dimensional Data Using Bilinear Projections", 《IEEE COMPUTER SOCIETY CONFERENCE ON COMPUTER VISION AND PATTERN RECOGNITION》 * |
KONG SHU: "Statistical Sparse Learning: Feature Extraction, Clustering, Classification and Multi-Feature Fusion", 《China Master's Theses Full-text Database, Information Science and Technology》 * |
Also Published As
Publication number | Publication date |
---|---|
JP2018524660A (en) | 2018-08-30 |
KR20170132291A (en) | 2017-12-01 |
CN107636639B (en) | 2021-01-08 |
JP6469890B2 (en) | 2019-02-13 |
EP3278238A1 (en) | 2018-02-07 |
JP2019057329A (en) | 2019-04-11 |
KR102002573B1 (en) | 2019-07-22 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107636639A (en) | Quick rectangular projection | |
CN107229757B (en) | Video retrieval method based on deep learning and Hash coding | |
CN111755078B (en) | Drug molecule attribute determination method, device and storage medium | |
Zhang et al. | Information fusion in visual question answering: A survey | |
CN109471945B (en) | Deep learning-based medical text classification method and device and storage medium | |
CN111401406B (en) | Neural network training method, video frame processing method and related equipment | |
CN109816009A (en) | Multi-tag image classification method, device and equipment based on picture scroll product | |
US20160350649A1 (en) | Method and apparatus of learning neural network via hierarchical ensemble learning | |
Oloulade et al. | Graph neural architecture search: A survey | |
CN109993102B (en) | Similar face retrieval method, device and storage medium | |
CN104616029B (en) | Data classification method and device | |
US10394777B2 (en) | Fast orthogonal projection | |
WO2022161380A1 (en) | Model training method and apparatus, and image retrieval method and apparatus | |
CN109948735B (en) | Multi-label classification method, system, device and storage medium | |
CN112164002B (en) | Training method and device of face correction model, electronic equipment and storage medium | |
WO2023011382A1 (en) | Recommendation method, recommendation model training method, and related product | |
CN113254654B (en) | Model training method, text recognition method, device, equipment and medium | |
WO2022170569A1 (en) | Data processing method and apparatus | |
TWI741877B (en) | Network model quantization method, device, and electronic apparatus | |
CN115062134A (en) | Knowledge question-answering model training and knowledge question-answering method, device and computer equipment | |
CN104572930B (en) | Data classification method and device | |
CN112528039A (en) | Word processing method, device, equipment and storage medium | |
US20230153085A1 (en) | Systems and methods for source code understanding using spatial representations | |
CN115879508A (en) | Data processing method and related device | |
CN115457332A (en) | Image multi-label classification method based on graph convolution neural network and class activation mapping |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||