CN113297454A - Retrieval method, query method, device, system, electronic equipment and computer storage medium - Google Patents

Retrieval method, query method, device, system, electronic equipment and computer storage medium

Info

Publication number
CN113297454A
Authority
CN
China
Prior art keywords
index
result
retrieval condition
unstructured data
indexing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202010291995.7A
Other languages
Chinese (zh)
Inventor
楼仁杰
魏闯先
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Alibaba Group Holding Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd
Priority to CN202010291995.7A
Publication of CN113297454A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G06F16/9532 Query formulation
    • G06F16/951 Indexing; Web crawling techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

Embodiments of the invention provide a retrieval method, a query method, a device, a system, electronic equipment and a computer storage medium. The retrieval method comprises: acquiring a retrieval condition, wherein the retrieval condition comprises a structured field retrieval condition and an unstructured data retrieval condition based on the same object; performing structured field indexing based on the structured field retrieval condition to obtain a preamble index result; and performing similarity indexing based on the unstructured data retrieval condition, and determining an unstructured data index result from the preamble index result to generate a retrieval result. Because the preamble index result is obtained through structured field indexing, which has a lower computational cost, the index range is narrowed and the computational cost of the subsequent similarity indexing on unstructured data is reduced, thereby saving computing resources.

Description

Retrieval method, query method, device, system, electronic equipment and computer storage medium
Technical Field
The embodiment of the invention relates to the technical field of communication, in particular to a retrieval method, a query method, a device, a system, electronic equipment and a computer storage medium.
Background
With the development of multimedia devices for images, video and the like, a large amount of data such as images and videos is generated on the internet every day. In order to analyze the content of images and videos in depth, feature vectors are extracted from them by deep learning, and a vector analysis engine is then used for vector retrieval. For example, the most common vector search is Approximate Nearest Neighbor (ANN) search. The purpose of ANN retrieval is to find similar vectors in a high-dimensional space; at the service level, this enables similar-picture search. A vector server builds a dedicated ANN index for ANN retrieval. However, in many services, there is room for improvement.
Disclosure of Invention
Embodiments of the present invention provide a retrieval method, a query method, an apparatus, a system, an electronic device, and a computer storage medium, so as to solve or alleviate the above problems.
According to a first aspect of the embodiments of the present invention, there is provided a retrieval method, including: acquiring a retrieval condition, wherein the retrieval condition comprises a structured field retrieval condition and an unstructured data retrieval condition based on the same object; performing structured field indexing based on the structured field retrieval condition to obtain a preamble index result; and performing similarity indexing based on the unstructured data retrieval condition, and determining an unstructured data index result from the preamble index result to generate a retrieval result.
According to a second aspect of the embodiments of the present invention, there is provided a query method, including: acquiring a query request, wherein the query request comprises an object to be queried; responding to the query request, and generating a structured field retrieval condition and an unstructured data retrieval condition of the object to be queried; sending a structured field retrieval condition and an unstructured data retrieval condition to a server, so that the server performs structured field indexing based on the structured field retrieval condition to obtain a preamble index result, performs similarity indexing based on the unstructured data retrieval condition, and determines an unstructured data index result from the preamble index result; and receiving a query result based on the unstructured data index result returned by the server.
According to a third aspect of the embodiments of the present invention, there is provided a query method, including: acquiring a display picture query request, wherein the display picture query request comprises a picture to be queried; responding to the display picture query request, and generating a structured field retrieval condition and an unstructured data retrieval condition of the picture to be queried; sending a structured field retrieval condition and an unstructured data retrieval condition to a server, so that the server performs structured field indexing in a display picture data set based on the structured field retrieval condition to obtain a preamble index result, performs similarity indexing based on the unstructured data retrieval condition, and determines an unstructured data index result from the preamble index result; and receiving a query result based on the unstructured data index result returned by the server, wherein the query result comprises at least one display picture meeting the display picture query request in the display picture data set.
According to a fourth aspect of the embodiments of the present invention, there is provided a query apparatus, including: an acquisition module, configured to acquire a display picture query request, wherein the display picture query request comprises a picture to be queried; a retrieval condition generation module, configured to respond to the display picture query request and generate a structured field retrieval condition and an unstructured data retrieval condition of the picture to be queried; a sending module, configured to send the structured field retrieval condition and the unstructured data retrieval condition to a server, so that the server performs structured field indexing in a display picture data set based on the structured field retrieval condition to obtain a preamble index result, performs similarity indexing based on the unstructured data retrieval condition, and determines an unstructured data index result from the preamble index result; and a receiving module, configured to receive a query result that is returned by the server and is based on the unstructured data index result, wherein the query result comprises at least one display picture in the display picture data set that meets the display picture query request.
According to a fifth aspect of the embodiments of the present invention, there is provided a retrieval apparatus, including: an acquisition module, configured to acquire a retrieval condition, wherein the retrieval condition comprises a structured field retrieval condition and an unstructured data retrieval condition based on the same object; a structured field index module, configured to perform structured field indexing based on the structured field retrieval condition to obtain a preamble index result; and a similarity index module, configured to perform similarity indexing based on the unstructured data retrieval condition and determine an unstructured data index result from the preamble index result to generate a retrieval result.
According to a sixth aspect of the embodiments of the present invention, there is provided a query apparatus, including: an acquisition module, configured to acquire a query request, wherein the query request comprises an object to be queried; a retrieval condition generation module, configured to respond to the query request and generate a structured field retrieval condition and an unstructured data retrieval condition of the object to be queried; a sending module, configured to send the structured field retrieval condition and the unstructured data retrieval condition to a server, so that the server performs structured field indexing based on the structured field retrieval condition to obtain a preamble index result, performs similarity indexing based on the unstructured data retrieval condition, and determines an unstructured data index result from the preamble index result; and a receiving module, configured to receive a query result that is returned by the server and is based on the unstructured data index result.
According to a seventh aspect of the embodiments of the present invention, there is provided a query system including a front-end node and at least one computing node. The front-end node is configured to: acquire a query request, wherein the query request comprises an object to be queried; respond to the query request and generate a structured field retrieval condition and an unstructured data retrieval condition of the object to be queried; send the structured field retrieval condition and the unstructured data retrieval condition to the at least one computing node; and receive a query result that is sent by the at least one computing node and is based on the unstructured data index result. The at least one computing node is configured to: perform structured field indexing based on the structured field retrieval condition to obtain a preamble index result; perform similarity indexing based on the unstructured data retrieval condition, and determine an unstructured data index result from the preamble index result; and generate a query result based on the unstructured data index result.
According to an eighth aspect of embodiments of the present invention, there is provided an electronic apparatus, including: one or more processors; a computer readable medium configured to store one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the retrieval method of the first aspect or the query method of the second or third aspects.
According to a ninth aspect of embodiments of the present invention, there is provided a computer readable medium, on which a computer program is stored, which when executed by a processor, implements the retrieval method according to the first aspect or the query method according to the second or third aspect.
According to the scheme of the embodiment of the invention, the structured field indexing is carried out based on the structured field retrieval condition to obtain a preamble indexing result, then the similarity indexing is carried out based on the unstructured data retrieval condition, and the unstructured data indexing result is determined from the preamble indexing result. Because the preamble index result is obtained through the structured field index with lower calculation cost, the index range is reduced, and the subsequent calculation cost of similarity index based on unstructured data is reduced, thereby saving the calculation resource.
Drawings
In order to illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some of the embodiments of the present invention, and a person skilled in the art may obtain other drawings based on them.
FIG. 1 is a schematic block diagram of a network architecture in which a retrieval method according to an embodiment of the present invention is located;
FIG. 2A is a schematic block diagram of a retrieval method according to another embodiment of the present invention;
FIGS. 2B-2D are schematic diagrams of a retrieval method according to another embodiment of the invention;
FIG. 3 is a schematic block diagram of a network architecture in which a query method according to another embodiment of the present invention is located;
FIG. 4A is a schematic block diagram of a query method according to another embodiment of the invention;
FIG. 4B is a schematic block diagram of a query method according to another embodiment of the invention;
FIG. 5 is a schematic block diagram of a retrieval device according to another embodiment of the present invention;
FIG. 6A is a schematic block diagram of a querying device of another embodiment of the present invention;
FIG. 6B is a schematic block diagram of a querying device of another embodiment of the present invention;
FIG. 6C is a schematic block diagram of a querying device of another embodiment of the present invention;
FIG. 7 is a schematic block diagram of an electronic device of another embodiment of the present invention;
FIG. 8 shows a hardware configuration of an electronic device according to another embodiment of the present invention.
Detailed Description
In order to make those skilled in the art better understand the technical solutions in the embodiments of the present invention, the technical solutions in the embodiments of the present invention will be described clearly and completely with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all embodiments. All other embodiments obtained by a person skilled in the art based on the embodiments of the present invention shall fall within the scope of the protection of the embodiments of the present invention.
The following further describes specific implementation of the embodiments of the present invention with reference to the drawings.
FIG. 1 is a schematic block diagram of a network architecture in which a retrieval method according to an embodiment of the present invention is located. For example, a network architecture according to one example of an embodiment of the present invention may include a client and a server. For example, the user device 150 used by the user 140 is connected to the server via a network or the like. The server side may comprise a server cluster of at least one of a web server, an application server and a database server. The server may include a query engine 110 and a compute node. For example, the compute node may be a single server or a cluster of servers, e.g., distributed servers such as cloud servers. FIG. 1 illustrates a logical block diagram of executing a query on structured data and unstructured data, according to an embodiment of the invention. In one embodiment, the query engine 110 may be a query engine that can access and process queries on different types of data. For example, as shown in FIG. 1, query engine 110 may access unstructured data 160. In embodiments of the present invention, query engine 110 may implement an interface that supports queries according to a common or standard query specification language (e.g., SQL), or any other type of formatted query. In embodiments of the present invention, the query engine 110 can also translate a query received in one query language format into another query language format for execution (e.g., translate SPL queries into SQL queries).
In embodiments of the present invention, query engine 110 may send index requests to perform queries on unstructured data 160. For example, in one embodiment, an SQL query can be received that specifies another query predicate for unstructured data 160. To execute such an SQL query, the index request may generate an execution plan with structured data operations 131 and unstructured data operations 132. In embodiments of the present invention, other received queries may be directed to unstructured data 160, and the appropriate structured data operations 131 or unstructured data operations 132 may be determined and included in the execution plan generated by the index request.
In embodiments of the present invention, query engine 110 may implement structured data query execution to perform structured data operations 131. The query engine 110 may also implement unstructured data query execution to perform unstructured data operations 132.
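As a rough illustration of the execution plan described above, the following Python sketch splits one incoming query into a structured data operation and an unstructured data operation. The names `Operation`, `ExecutionPlan`, `build_plan`, and the `filters`/`vector` keys of the query are hypothetical and are not taken from the embodiments; they only show one way such a plan could be represented.

```python
from dataclasses import dataclass, field


@dataclass
class Operation:
    kind: str        # "structured" (cf. operation 131) or "unstructured" (cf. operation 132)
    condition: dict  # the retrieval condition this operation evaluates


@dataclass
class ExecutionPlan:
    operations: list = field(default_factory=list)


def build_plan(query: dict) -> ExecutionPlan:
    """Build a plan from a query dict (hypothetical shape: 'filters' and 'vector')."""
    plan = ExecutionPlan()
    if query.get("filters"):
        # Structured predicates become a structured data operation.
        plan.operations.append(Operation("structured", query["filters"]))
    if query.get("vector") is not None:
        # A feature vector becomes an unstructured data operation.
        plan.operations.append(Operation("unstructured", {"vector": query["vector"]}))
    return plan


plan = build_plan({"filters": {"color": "red"}, "vector": [0.1, 0.9]})
kinds = [op.kind for op in plan.operations]
```

A query that carries only structured predicates yields a plan with a single structured operation, which mirrors the statement that the appropriate operations are determined per query.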
It should be appreciated that in embodiments of the present invention, structured data can be data stored according to a predefined data model (e.g., a relational data model), and in one embodiment, the unstructured output store can pre-label the structured data. In one embodiment, structured data may be stored in different types of formats, such as a row-oriented format (e.g., where rows of data are stored together in a single persistent storage block) or a column-oriented format (e.g., where columns of data are stored together in a single persistent storage block).
It should also be understood that the previous description of executing queries against structured data and unstructured data is a logical illustration and, therefore, should not be construed as limiting the implementation of query engines or other illustrated features (such as index requests, structured data query execution, unstructured data query execution, or data stores for structured data or unstructured data).
The specification begins with a general description of a provider network that implements structured data processing that performs queries on structured data and unstructured data. Various examples of structured data processing services are discussed next, including different components/modules or arrangements of components/modules that can be employed as part of implementing the techniques. Many different methods and techniques for implementing queries against structured and unstructured data are then discussed, some of which are illustrated in the accompanying flow charts. Finally, a description of an example computing system is provided on which various components, modules, systems, devices, and/or nodes may be implemented. Various examples are provided throughout the specification.
FIG. 2A is a schematic block diagram of a retrieval method according to another embodiment of the present invention. The retrieval method of FIG. 2A includes:
210: Acquire a retrieval condition, wherein the retrieval condition comprises a structured field retrieval condition and an unstructured data retrieval condition based on the same object.
For example, the object may be processed into structured data, semi-structured data, or unstructured data. For example, the object may be a picture, a sound, or a video. For example, the unstructured data may be labeled with structured data. For example, the unstructured data may be unlabeled, but its structured data may be determined through recognition processing. The structured field retrieval condition is a retrieval condition on the structured data of the object. The structured data may be, for example, attribute information of the object. For example, the structured data may include the size of a picture, the color of a picture, the acquisition time of a picture, the modification time of a picture, the size of a video, the resolution of a video, the encoding mode of a video, and author information of a video. For example, the structured data may also include information about the client, such as the time at which the client initiated the query, the location from which the client initiated the query, user behavior records in the client, and the like. For example, the structured data may also include information input from the client, such as a time, a location, the price of a particular item, the style of a particular item, the sales of a particular item, the brand of a particular item, and the like. For example, the unstructured data retrieval condition may be a feature vector retrieval condition. For example, the retrieval condition may be a sub-query generated by the client based on the query. For example, the structured field retrieval condition may cover some or all of the characteristic fields of the structured data, such as keywords.
230: Perform structured field indexing based on the structured field retrieval condition to obtain a preamble index result.
For example, a structured search data set is determined, and a search is performed based on all data objects in the structured search data set. Or, determining a structured retrieval data set, and retrieving based on a part of the structured retrieval data set to obtain a preamble index result. For example, a first structured search data set and a second structured search data set are determined, wherein the first structured search data set and the second structured search data set may overlap each other. Alternatively, the first and second structured search data sets may be intersecting sets. For example, a search is performed based on data objects in the first structured search dataset to obtain a preamble index result. In other words, the structured search dataset may include multiple sub datasets, e.g., each dataset including different types of structured data. For example, structured indexing is performed based on the sub data set, resulting in a preamble index result. For example, the above-described case is also applicable when there is a query condition for different types of structured data in the structured field retrieval condition. In other words, the structured field index may be performed based on only a portion of the query conditions, resulting in a preamble index result. For example, a structured data set may include unstructured data.
240: Perform similarity indexing based on the unstructured data retrieval condition, and determine an unstructured data index result from the preamble index result to generate a retrieval result.
It should be appreciated that the similarity index described herein may be calculated for similarity in any manner. For example, vector similarity calculation is employed. For example, a clustering or classification method is adopted to perform feature vector retrieval based on the preamble index result. For example, feature vector retrieval is performed based on the entire preamble index results. For example, feature vector retrieval is performed based on partial preamble index results. For example, feature vector search is performed based on all or part of the structured data set described above, and then a feature vector search result is obtained based on the preamble index result. For example, the vector similarity index may be performed based on the first feature vector search condition to obtain a feature vector preliminary index result. For example, a second feature vector retrieval condition is generated based on the preamble result, and a second feature vector similarity index is performed on the feature vector preliminary index result based on the second feature vector retrieval condition to obtain the feature vector index result. It should also be understood that the above-mentioned various indexing methods are only exemplary, and the embodiments of the present invention do not limit this.
Because the preamble index result is obtained through the structured field index with lower calculation cost, the index range is reduced, and the subsequent calculation cost of similarity index based on unstructured data is reduced, thereby saving the calculation resource.
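Steps 210 to 240 can be sketched as a two-stage pipeline: a cheap structured filter first, then a similarity ranking restricted to the surviving candidates. The sketch below is illustrative only. The `Record` type, the field names, cosine similarity as the similarity measure, and the brute-force ranking are assumptions chosen for brevity; the embodiments do not prescribe any of them.

```python
import math
from dataclasses import dataclass


@dataclass
class Record:
    rec_id: int
    fields: dict   # structured fields, e.g. {"color": "red", "price": 30}
    vector: list   # feature vector extracted from the unstructured data


def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def structured_index(records, predicate):
    """Step 230: structured field filter producing the preamble index result."""
    return [r for r in records if predicate(r.fields)]


def similarity_index(candidates, query_vector, top_k):
    """Step 240: rank only the preamble candidates by vector similarity."""
    ranked = sorted(candidates, key=lambda r: cosine(r.vector, query_vector), reverse=True)
    return ranked[:top_k]


def retrieve(records, predicate, query_vector, top_k=2):
    preamble = structured_index(records, predicate)
    return similarity_index(preamble, query_vector, top_k)


records = [
    Record(1, {"color": "red", "price": 30}, [1.0, 0.0]),
    Record(2, {"color": "red", "price": 80}, [0.9, 0.1]),
    Record(3, {"color": "blue", "price": 20}, [1.0, 0.0]),
    Record(4, {"color": "red", "price": 25}, [0.0, 1.0]),
]
cheap_red = lambda f: f["color"] == "red" and f["price"] < 50
result = retrieve(records, cheap_red, [1.0, 0.0], top_k=2)
result_ids = [r.rec_id for r in result]
```

In this toy data set, similarity is computed for only two of the four records, which is the saving the paragraph above describes.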
It should be appreciated that ANN retrieval is an approximation process that performs similarity computation over the global set of vectors, sorts the results, and truncates them. For example, when processing an ANN search without a dedicated ANN index, the underlying layer of a Massively Parallel Processing (MPP) analytic database needs to pull all vector data up to the computation layer for vector computation, and then sort the results and keep only the top portion, so the computational overhead is very large. The scheme of the embodiments of the invention saves computing resources. In addition, the scheme of the invention supports structured queries and remains compatible with existing equipment.
In another implementation of the present invention, performing similarity indexing based on the unstructured data retrieval condition and determining an unstructured data index result from the preamble index result includes: performing vector similarity indexing on the preamble index result based on the feature vector retrieval condition to obtain a first vector similarity index result; and determining a feature vector index result based on the first vector similarity index result.
For example, as shown in FIG. 2B, in each compute node, vector similarity indexing is performed on the preamble index result based on the feature vector retrieval condition to obtain a first vector similarity index result. For example, in response to a feature vector retrieval condition, the main service node sends the same retrieval condition to multiple compute nodes so that the corresponding query engines perform vector similarity indexing. Alternatively, in response to the feature vector retrieval condition, different feature vector retrieval conditions are sent to different compute nodes so that the corresponding query engines perform the corresponding vector similarity indexing, which gives the system flexibility in configuring computing resources. For example, when the same query condition is sent to different compute nodes, the query engines of the different compute nodes may generate different sub-queries, and each compute node processes its sub-query to obtain a sub-query processing result. The main service node then processes the sub-query processing results to obtain the vector similarity index result. In this way, the computational load of the main service node is reduced, and functionality is separated from the main service node to the compute nodes.
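The division of work between the main service node and the compute nodes can be sketched as a scatter-gather step. In the sketch below, each "node" is just a local function call over its own shard, the similarity measure is a plain dot product, and the names `node_search` and `master_search` are hypothetical; a real deployment would issue remote calls, which are not modeled here.

```python
import heapq


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def node_search(shard, query_vector, top_k):
    """Compute node: local top-k over that node's share of the preamble index result."""
    scored = [(dot(vec, query_vector), rec_id) for rec_id, vec in shard]
    return heapq.nlargest(top_k, scored)


def master_search(shards, query_vector, top_k):
    """Main service node: send the same condition to every node, then merge the partial results."""
    partials = [node_search(shard, query_vector, top_k) for shard in shards]
    merged = heapq.nlargest(top_k, (hit for part in partials for hit in part))
    return [rec_id for _, rec_id in merged]


# Two shards standing in for two compute nodes (assumed toy data).
shards = [
    [(1, [0.9, 0.1]), (2, [0.2, 0.8])],
    [(3, [1.0, 0.0]), (4, [0.5, 0.5])],
]
top = master_search(shards, [1.0, 0.0], top_k=2)
```

Because every node returns at most `top_k` hits, the main service node merges a small number of candidates instead of scoring every vector itself.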
In another implementation of the present invention, performing similarity indexing based on the unstructured data retrieval condition and determining an unstructured data index result from the preamble index result includes: performing vector similarity indexing based on the feature vector retrieval condition to obtain a second vector similarity index result; and filtering the second vector similarity index result using the preamble index result to obtain a feature vector index result.
For example, a first preamble index result and a second preamble index result are obtained in response to a feature vector retrieval condition. For example, vector similarity indexing is performed on the first preamble index result to obtain a second vector similarity initial index result. For example, the second vector similarity initial index result is filtered using the second preamble index result to obtain the second vector similarity index result. Optionally, the preamble index result further includes a third preamble index result. For example, vector similarity indexing may be performed on the first preamble index result to obtain a first initial result. For example, the first initial result may be filtered using the second preamble index result to obtain a second initial result. For example, vector similarity indexing may then be performed using the third preamble index result. In other words, the preamble index result may be subjected to a sequence of vector similarity index processing and filtering processing, and the number of times each of the two kinds of processing is applied is not limited. It should be understood that the similarity indexing may be performed in any manner, and the embodiments of the present invention are not limited in this respect.
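The filter-after-search variant can be sketched as follows: vector similarity indexing runs first, and the preamble index result is then used as a filter on its output. The function names, the dot-product similarity, and the `fetch_k` over-fetch parameter are assumptions made for this illustration and do not come from the embodiments.

```python
def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def vector_index(vectors, query_vector, fetch_k):
    """Vector similarity indexing over the whole set: the second vector similarity index result."""
    ranked = sorted(vectors, key=lambda rid: dot(vectors[rid], query_vector), reverse=True)
    return ranked[:fetch_k]


def post_filter(vectors, preamble_ids, query_vector, top_k, fetch_k):
    """Keep only those similarity hits that also appear in the preamble index result."""
    second_result = vector_index(vectors, query_vector, fetch_k)
    kept = [rid for rid in second_result if rid in preamble_ids]
    return kept[:top_k]


vectors = {1: [1.0, 0.0], 2: [0.9, 0.1], 3: [0.8, 0.2], 4: [0.0, 1.0]}
preamble_ids = {2, 4}   # assumed output of the structured field indexing
hits = post_filter(vectors, preamble_ids, [1.0, 0.0], top_k=1, fetch_k=3)
```

One consequence of this ordering is that the similarity stage has to fetch more than `top_k` candidates: if `fetch_k` is too small, the filter may leave fewer than `top_k` results.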
For example, the processing is performed using the nodes shown in FIG. 2C. It should be understood that the computing nodes shown in FIG. 2C may be distributed computing nodes or a single computing node. For example, the processing may also be performed using the computing nodes shown in FIG. 2B or FIG. 2D.
In another implementation of the present invention, performing structured field indexing based on the structured field retrieval condition to obtain a preamble index result includes: determining an original similarity index data set; and indexing in the original similarity index data set based on the structured field retrieval condition to obtain the preamble index result. The method further includes: determining the computational cost of the vector similarity index based on the original similarity index data set.
For example, a calculation overhead calculation module may be implemented in the query engine. It should be understood that the calculation overhead calculation module may also be an independent module, which is not limited in the embodiments of the present invention. For example, the calculation overhead calculation module is configured to calculate the computational overhead of the filtering process described above. For example, the calculation overhead calculation module is configured to calculate the computational overhead of the vector similarity calculation. For example, the calculation overhead calculation module is configured to calculate the computational overhead of the structured field retrieval described above. For example, the calculation overhead calculation module may compare the computational overhead of each of the filtering process, the structured field retrieval process, and the vector similarity calculation process. For example, the query engine is instructed to process based on the comparison results described above.
For example, the computational overhead of each of the above three processes may be calculated. For example, the sum of the computational overhead of two of the three processes may be calculated. For example, the summed overhead of two of the processes may be compared with the overhead of the third.
In another implementation manner of the present invention, the vector similarity index includes a first vector similarity index and a second vector similarity index, wherein performing the similarity index based on the unstructured data retrieval condition and determining the unstructured data index result from the preamble index result includes: determining that the computational cost of the first vector similarity index is greater than the computational cost of the second vector similarity index; and performing the second vector similarity index based on the feature vector retrieval condition, and determining a feature vector index result from the preamble index result.
In another implementation manner of the present invention, performing similarity indexing based on an unstructured data retrieval condition, and determining an unstructured data indexing result from the pre-order indexing result further includes: determining a computation cost of a first vector similarity index based on a first index cost, wherein the first index cost indicates a cost for obtaining a preamble index result by indexing in an original similarity index data set; and determining the calculation cost of the second vector similarity index based on a second index cost and a filtering processing cost, wherein the second index cost indicates the cost for obtaining the feature vector index result by indexing in the preamble index result, and the filtering processing cost is the cost for filtering the second vector similarity index result based on the preamble index result.
In another implementation manner of the present invention, performing similarity indexing based on an unstructured data retrieval condition, and determining an unstructured data indexing result from the pre-order indexing result further includes: determining the calculation cost of a second vector similarity index based on the first index cost, wherein the first index cost indicates the cost for obtaining a preamble index result by indexing in the original similarity index data set; and determining the calculation cost of the first vector similarity index based on a second index cost and a filtering processing cost, wherein the second index cost indicates the cost for obtaining the feature vector index result by indexing in the preamble index result, and the filtering processing cost is the cost for filtering the second vector similarity index result based on the preamble index result.
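The cost comparison described in the preceding implementations can be sketched as follows. This is a minimal Python sketch that follows the first of the two assignments above (the first vector similarity index is costed by the first index cost, the second by the second index cost plus the filtering processing cost); the other assignment simply swaps the two formulas. The class name, the function name, and the numeric values are hypothetical, and how each cost is estimated is not specified here.

```python
from dataclasses import dataclass

@dataclass
class IndexCosts:
    first_index_cost: float    # hypothetical estimate
    second_index_cost: float   # hypothetical estimate
    filtering_cost: float      # hypothetical estimate

def choose_vector_similarity_index(costs: IndexCosts) -> str:
    """Return which vector similarity index to perform, by estimated cost."""
    first_cost = costs.first_index_cost
    second_cost = costs.second_index_cost + costs.filtering_cost
    # The second index is performed only when the first is more expensive.
    return "second" if first_cost > second_cost else "first"

choice_a = choose_vector_similarity_index(IndexCosts(10.0, 3.0, 2.0))
choice_b = choose_vector_similarity_index(IndexCosts(4.0, 3.0, 2.0))
```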
For example, the structured field index and the vector similarity index may be forward indexes or reverse indexes. For example, the structured field index is a forward index and the vector similarity index may be an inverted index. For example, the structured field index may be an inverted index and the vector similarity index may be a forward index. For example, the structured field index is divided into a first structured field index and a second structured field index, the first structured field index being a forward index and the second structured field index being an inverted index. For example, the vector similarity index includes a first vector similarity index and a second vector similarity index. For example, the first vector similarity index is a forward index. For example, the second vector similarity index is an inverted index.
In another implementation manner of the present invention, the structured field retrieval condition includes a first retrieval condition for a first structured field and a second retrieval condition for a second structured field, where performing a structured field index based on the structured field retrieval condition to obtain a preamble index result includes: based on the first retrieval condition, performing first inverted indexing on the first structured field to obtain a first inverted indexing result; and performing second inverted indexing on the first inverted indexing result based on the second retrieval condition to obtain the preamble index result.
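The two-stage inverted indexing can be illustrated with a short Python sketch. It assumes equality conditions only and an in-memory table; the helper names are hypothetical, and the field names Date and CityId are borrowed from the query example used elsewhere in this description.

```python
from collections import defaultdict

def build_inverted_index(rows, field):
    """Map each value of `field` to the set of row numbers holding that value."""
    index = defaultdict(set)
    for row_no, row in enumerate(rows):
        index[row[field]].add(row_no)
    return index

def preamble_index_result(indexes, first_cond, second_cond):
    """Apply the first condition, then restrict its result with the second."""
    field1, value1 = first_cond
    field2, value2 = second_cond
    first_result = indexes[field1].get(value1, set())
    second_postings = indexes[field2].get(value2, set())
    return {r for r in first_result if r in second_postings}

rows = [
    {"Date": 20180910, "CityId": 101},
    {"Date": 20180910, "CityId": 102},
    {"Date": 20180911, "CityId": 101},
    {"Date": 20180910, "CityId": 101},
]
indexes = {f: build_inverted_index(rows, f) for f in ("Date", "CityId")}
result = preamble_index_result(indexes, ("Date", 20180910), ("CityId", 101))
```

The resulting set of row numbers is what the subsequent similarity index would take as its candidate range.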
In another implementation of the invention, the method further comprises: performing row number indexing on the unstructured data indexing result to obtain at least one object corresponding to the unstructured data indexing result; a search result is generated that includes at least one object.
Fig. 3 is a schematic block diagram of a network architecture to which a query method according to another embodiment of the present invention is applied. For example, a network architecture according to one example of an embodiment of the present invention may include a client and a server. For example, the user device 150 used by the user 140 is connected to the server via a network or the like. The server side may include a server cluster of at least one of a Web server, an application server, and a database server. The server may include a query engine 110 and a compute node. For example, the computing nodes may be a single server or a cluster of servers, e.g., distributed servers such as cloud servers. FIG. 3 illustrates a logical block diagram of executing a query on structured data and unstructured data, according to an embodiment of the invention. In one embodiment, the query engine 110 may be a query engine that can access and process queries on different types of data. For example, as shown in FIG. 3, query engine 110 may access unstructured data 160. In embodiments of the present invention, query engine 110 may implement an interface that supports queries according to a common or standard query language (e.g., SQL), or any other type of formatted query. In embodiments of the present invention, the query engine 110 can also translate a query received in one query language format into another query language format for execution (e.g., translate SPL queries into SQL queries).
In embodiments of the present invention, query engine 110 may send index requests to perform queries on unstructured data 160. For example, in one embodiment, an SQL query can be received that specifies another query predicate for unstructured data 160. The index request may generate an execution plan with structured data operations 131 and unstructured data operations 132 in order to execute such SQL queries. In embodiments of the present invention, other queries received may be directed to unstructured data 160, and the appropriate structured data operations 131 or unstructured data operations 132 may be determined and included in the execution plan generated by the index request.
In embodiments of the present invention, query engine 110 may implement structured data query execution to perform structured data operations 131. The query engine 110 may also implement unstructured data query execution to perform unstructured data operations 132.
Fig. 4A is a schematic block diagram of a query method according to another embodiment of the present invention. The query method of fig. 4A, comprising:
410: Acquire a query request, wherein the query request comprises an object to be queried.
420: In response to the query request, generate a structured field retrieval condition and an unstructured data retrieval condition of the object to be queried.
430: Send the structured field retrieval condition and the unstructured data retrieval condition to the server, so that the server performs structured field indexing based on the structured field retrieval condition to obtain a preamble index result, performs similarity indexing based on the unstructured data retrieval condition, and determines an unstructured data index result from the preamble index result.
440: Receive a query result, returned by the server, based on the feature vector index result.
Because the preamble index result is obtained through the structured field index, which has a lower calculation cost, the index range is reduced, and the calculation cost of the subsequent similarity index based on unstructured data is reduced, thereby saving computing resources. Accordingly, the query efficiency is improved for the front end.
For example, the query condition sent by the client includes a hybrid query intent. For example, the query condition is to search for the 10 vector records closest to the query vector among the records that appear in city number 101 on the date 20180910. For example, the query vector is derived from a target picture. For example, the target object in the target picture is labeled blue. For example, the query conditions may also include other structured data.
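As an illustration of how such a hybrid query could be split into a structured field retrieval condition and an unstructured data retrieval condition, consider the minimal Python sketch below. The request layout, the class names, and the function `split_query` are assumptions made for this sketch; the values mirror the example above, and the query vector is a made-up placeholder.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

@dataclass
class StructuredCondition:
    equals: Dict[str, object]  # field name -> required value

@dataclass
class UnstructuredCondition:
    query_vector: List[float]  # feature vector of the object to be queried
    top_k: int                 # number of closest records requested

def split_query(request: dict) -> Tuple[StructuredCondition, UnstructuredCondition]:
    """Split one hybrid query request into its two retrieval conditions."""
    structured = StructuredCondition(equals=dict(request["filters"]))
    unstructured = UnstructuredCondition(
        query_vector=list(request["query_vector"]),
        top_k=request["top_k"],
    )
    return structured, unstructured

# Hypothetical request: 10 nearest records in city 101 on 20180910.
request = {
    "filters": {"Date": 20180910, "CityId": 101},
    "query_vector": [0.1, 0.7, 0.2],
    "top_k": 10,
}
structured, unstructured = split_query(request)
```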
It should be understood that the client may include a corresponding method or response module. For example, a module included in the client, such as ann_distance, may compute the result from the query vector and the recorded vectors of the client. For example, when the front-end node receives the query, it disassembles and rewrites the query on the custom table into sub-queries. For example, the front-end node may issue the sub-queries to one or more computing nodes. For example, the query received by the one or more compute nodes may be a query with three filter conditions. For example, the three filter conditions described above may be evaluated in parallel or in series. For example, the queries may be performed serially. For example, a structured query is performed first, followed by an ANN query. For example, an ANN query may be performed first, followed by a structured query, and then by another ANN query. For example, in this example, the compute node queries the inverted index of the Date field and the inverted index of the CityId field in parallel. For example, the results of these two inverted indexes are merged by intersection at an inverted index collector (reverse Search Collector). For example, the reverse Search Collector transmits the intermediate result of the inverted index to the ANN index collector (Ann Search Collector). For example, the Ann Search Collector performs an ANN search using the index on the basis of the preamble result set. For example, the ANN search process of the Ann Search Collector can proceed in two ways, Feeder and Filter. For example, the Feeder performs calculations driven by the preamble results. For example, the Filter performs index-driven calculations and filters them using the preamble results. The Ann Search Collector uses the calculation overhead calculation module to determine the better retrieval mode. The calculation overhead calculation module may be implemented as a Cost Based Optimizer.
For example, the preferred retrieval mode may be the one with the least computational overhead. For example, a mixed index search (Mixed Index Search) combining structured search and vector similarity search may be performed. For example, the mixed index search may include the alternating search patterns described above. For example, after the retrieval process is completed, the row numbers of the retrieval result are passed to a detail manager (Detail Manager). For example, the detail manager described above may provide row number indexing. For example, the detail data of the result rows is acquired. For example, a front-end node, such as the client that initiated the query, may collect query results from all or a portion of the compute nodes. For example, a front-end node, such as a client, can truncate the collected records again according to ann_distance. It should be understood that the above flow description is only one example of a hybrid retrieval flow of an embodiment of the present invention. For example, embodiments of the present invention may also handle queries with only a single type of condition. The corresponding operations and flows are as described above; for example, the computing node may skip part of the retrieval flow, and the process is not described again.
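The final collection step at the front-end node can be sketched in Python as follows. This is an assumption-laden illustration: the record layout (a dictionary with `row` and `ann_distance` keys), the function name, and the choice of `heapq.nsmallest` for truncation are not taken from the embodiment; only the idea of truncating the collected records by ann_distance comes from the description above.

```python
import heapq

def merge_node_results(node_results, k):
    """Collect per-node records and keep the k with the smallest ann_distance."""
    all_records = (rec for results in node_results for rec in results)
    return heapq.nsmallest(k, all_records, key=lambda rec: rec["ann_distance"])

# Hypothetical partial results returned by two compute nodes.
node_results = [
    [{"row": 1, "ann_distance": 0.2}, {"row": 5, "ann_distance": 0.9}],
    [{"row": 7, "ann_distance": 0.1}],
]
top = merge_node_results(node_results, k=2)
top_rows = [rec["row"] for rec in top]
```

Each compute node only needs to return its own best candidates, so the front-end node handles at most the number of nodes times `k` records.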
Fig. 4B is a schematic block diagram of a query method according to another embodiment of the present invention. It should be understood that the query method of fig. 4B may be applied to the network architecture shown in fig. 3, and other network architectures may also be used. The embodiment of the present invention is not limited thereto. The query method of fig. 4B, comprising:
450: Acquire a display picture query request, wherein the display picture query request comprises a picture to be queried.
460: In response to the display picture query request, generate a structured field retrieval condition and an unstructured data retrieval condition of the picture to be queried.
470: Send the structured field retrieval condition and the unstructured data retrieval condition to the server, so that the server performs structured field indexing in the display picture data set based on the structured field retrieval condition to obtain a preamble index result, performs similarity indexing based on the unstructured data retrieval condition, and determines an unstructured data index result from the preamble index result.
480: Receive a query result, returned by the server, based on the feature vector index result, wherein the query result comprises at least one display picture in the display picture data set that meets the display picture query request.
Because the preamble index result is obtained through the structured field index, which has a lower calculation cost, the index range is reduced, and the calculation cost of the subsequent similarity index based on unstructured data is reduced, thereby saving computing resources. Accordingly, for the front-end node, picture query efficiency is improved.
Fig. 5 is a schematic block diagram of a retrieval apparatus according to another embodiment of the present invention. The search device of fig. 5 includes:
an obtaining module 510, configured to obtain a retrieval condition, wherein the retrieval condition comprises a structured field retrieval condition and an unstructured data retrieval condition based on the same object;
a structured field indexing module 520, configured to perform structured field indexing based on the structured field retrieval condition to obtain a preamble index result;
a similarity indexing module 530, configured to perform similarity indexing based on the unstructured data retrieval condition and determine an unstructured data index result from the preamble index result, so as to generate a retrieval result.
Because the preamble index result is obtained through the structured field index with lower calculation cost, the index range is reduced, and the subsequent calculation cost of similarity index based on unstructured data is reduced, thereby saving the calculation resource.
The method of the present embodiment may be performed by any suitable electronic device having data processing capabilities, including but not limited to: server, mobile terminal (such as mobile phone, PAD, etc.), PC, etc.
In another implementation manner of the present invention, the similarity indexing module is specifically configured to: based on the characteristic vector retrieval condition, carrying out vector similarity indexing on the preorder indexing result to obtain a first vector similarity indexing result; based on the first vector similarity index result, a feature vector index result is determined.
In another implementation manner of the present invention, the similarity indexing module is specifically configured to: based on the feature vector retrieval condition, performing vector similarity index to obtain a second vector similarity index result; and filtering the second vector similarity index result by adopting the preamble index result to obtain a feature vector index result.
In another implementation manner of the present invention, the structured field indexing module is specifically configured to: determining an original similarity index data set; indexing in the original similarity index data set based on the structured field retrieval condition to obtain a preamble index result, wherein the device further comprises a calculation overhead determining module: based on the original similarity index dataset, the computational cost of the vector similarity index is determined.
In another implementation manner of the present invention, the vector similarity index includes a first vector similarity index and a second vector similarity index, where the similarity indexing module is specifically configured to: determine that the computational cost of the first vector similarity index is greater than the computational cost of the second vector similarity index; and perform the second vector similarity index based on the feature vector retrieval condition, and determine a feature vector index result from the preamble index result.
In another implementation manner of the present invention, the similarity indexing module is further configured to: determining a computation cost of a first vector similarity index based on a first index cost, wherein the first index cost indicates a cost for obtaining a preamble index result by indexing in an original similarity index data set; and determining the calculation cost of the second vector similarity index based on a second index cost and a filtering processing cost, wherein the second index cost indicates the cost for obtaining the feature vector index result by indexing in the preamble index result, and the filtering processing cost is the cost for filtering the second vector similarity index result based on the preamble index result.
In another implementation manner of the present invention, the similarity indexing module is further configured to: determining the calculation cost of a second vector similarity index based on the first index cost, wherein the first index cost indicates the cost for obtaining a preamble index result by indexing in the original similarity index data set; and determining the calculation cost of the first vector similarity index based on a second index cost and a filtering processing cost, wherein the second index cost indicates the cost for obtaining the feature vector index result by indexing in the preamble index result, and the filtering processing cost is the cost for filtering the second vector similarity index result based on the preamble index result.
In another implementation manner of the present invention, the structured field retrieval condition includes a first retrieval condition for a first structured field and a second retrieval condition for a second structured field, where the structured field indexing module is specifically configured to: based on the first retrieval condition, perform first inverted indexing on the first structured field to obtain a first inverted indexing result; and perform second inverted indexing on the first inverted indexing result based on the second retrieval condition to obtain the preamble index result.
In another implementation of the present invention, the apparatus further comprises: a retrieval result generation module: performing row number indexing on the unstructured data indexing result to obtain at least one object corresponding to the unstructured data indexing result; a search result is generated that includes at least one object.
The apparatus of this embodiment is used to implement the corresponding method in the foregoing method embodiments, and has the beneficial effects of the corresponding method embodiments, which are not described herein again. In addition, the functional implementation of each module in the apparatus of this embodiment can refer to the description of the corresponding part in the foregoing method embodiment, and is not described herein again.
FIG. 6A is a schematic block diagram of a querying device of another embodiment of the present invention; the querying device of fig. 6A, comprising:
the obtaining module 610 obtains a query request, where the query request includes an object to be queried;
the retrieval condition generating module 620 responds to the query request and generates a structured field retrieval condition and an unstructured data retrieval condition of the object to be queried;
a sending module 630, configured to send the structured field search condition and the unstructured data search condition to the server, so that the server performs structured field indexing based on the structured field search condition to obtain a preamble index result, performs similarity indexing based on the unstructured data search condition, and determines an unstructured data index result from the preamble index result;
and the receiving module 640 receives the query result based on the feature vector index result returned by the server.
Because the preamble index result is obtained through the structured field index, which has a lower calculation cost, the index range is reduced, and the calculation cost of the subsequent similarity index based on unstructured data is reduced, thereby saving computing resources. Accordingly, query efficiency is improved for the front-end node.
The method of the present embodiment may be performed by any suitable electronic device having data processing capabilities, including but not limited to: server, mobile terminal (such as mobile phone, PAD, etc.), PC, etc.
The apparatus of this embodiment is used to implement the corresponding method in the foregoing method embodiments, and has the beneficial effects of the corresponding method embodiments, which are not described herein again. In addition, the functional implementation of each module in the apparatus of this embodiment can refer to the description of the corresponding part in the foregoing method embodiment, and is not described herein again.
Fig. 6B is a schematic block diagram of a query device according to another embodiment of the present invention. The querying device of fig. 6B, comprising:
the acquisition module 650 acquires a display picture query request, wherein the display picture query request includes a picture to be queried;
the retrieval condition generation module 660 responds to the displayed picture query request and generates a structured field retrieval condition and an unstructured data retrieval condition of the picture to be queried;
the sending module 670 sends the structured field retrieval condition and the unstructured data retrieval condition to the server, so that the server performs structured field indexing in the display picture data set based on the structured field retrieval condition to obtain a preamble index result, performs similarity indexing based on the unstructured data retrieval condition, and determines an unstructured data index result from the preamble index result;
the receiving module 680 receives a query result based on the feature vector index result returned by the server, where the query result includes at least one display picture in the display picture data set that meets the display picture query request.
It should be understood that an object may be converted into structured data, semi-structured data, or unstructured data via processing. For example, the object may be a picture, sound, or video. For example, the unstructured data may be labeled with structured data. For example, the unstructured data may be unlabeled, but its structured data may be determined through recognition processing. The structured field retrieval condition is a retrieval condition for the structured data of the object. The structured data may be, for example, attribute information of the object. For example, the structured data may include the size of a picture, the color of a picture, the acquisition time of a picture, the modification time of a picture, the size of a video, the resolution of a video, the encoding mode of a video, and author information of a video. For example, the structured data may also include information about the client, such as the time at which the client initiated the query, the location from which the client initiated the query, user behavior records in the client, and the like. For example, the structured data may also include information input from the client, such as the time, the location, the price of a particular item, the style of a particular item, the sales of a particular item, the brand of a particular item, etc. For example, the feature vector retrieval condition is a retrieval condition for unstructured data. For example, the retrieval condition may be a sub-query generated by the client based on the query. For example, the structured field retrieval condition may cover some or all of the characteristic fields of the structured data, such as keywords.
Because the preamble index result is obtained through the structured field index with lower calculation cost, the index range is reduced, and the subsequent calculation cost of similarity index based on unstructured data is reduced, thereby saving the calculation resource. Accordingly, for the front-end node, the picture query efficiency is improved.
The method of the present embodiment may be performed by any suitable electronic device having data processing capabilities, including but not limited to: server, mobile terminal (such as mobile phone, PAD, etc.), PC, etc.
The apparatus of this embodiment is used to implement the corresponding method in the foregoing method embodiments, and has the beneficial effects of the corresponding method embodiments, which are not described herein again. In addition, the functional implementation of each module in the apparatus of this embodiment can refer to the description of the corresponding part in the foregoing method embodiment, and is not described herein again.
Fig. 6C is a schematic block diagram of a query system according to another embodiment of the present invention. The query system of FIG. 6C, comprising:
a front-end node 6100 and at least one computing node 6200,
front-end node 6100 to: acquiring a query request, wherein the query request comprises an object to be queried; responding to the query request, and generating a structured field retrieval condition and an unstructured data retrieval condition of an object to be queried; sending a structured field retrieval condition and an unstructured data retrieval condition to at least one computing node; receiving a query result sent by at least one computing node based on the unstructured data index result,
the at least one computing node 6200 is to: based on the structured field retrieval condition, perform structured field indexing to obtain a preamble index result; based on the unstructured data retrieval condition, perform similarity indexing and determine an unstructured data index result from the preamble index result; and generate a query result based on the unstructured data index result.
Because the preamble index result is obtained through the structured field index with lower calculation cost, the index range is reduced, and the subsequent calculation cost of similarity index based on unstructured data is reduced, thereby saving the calculation resource.
It should be understood that in each compute node, the vector similarity index result may be obtained by performing a vector similarity index on the preamble index result based on the feature vector search condition. For example, in response to a feature vector retrieval condition, the same retrieval condition is sent, either via the master computing node or directly to at least one computing node (e.g., multiple computing nodes), for vector similarity indexing by the respective query engine. For example, in response to the feature vector retrieval condition, different feature vector retrieval conditions are sent to different computing nodes so that corresponding query engines perform corresponding vector similarity indexes, thereby providing flexibility of the system for computing resource configuration. For example, the same or different query conditions are sent to different computing nodes in response to the feature vector retrieval conditions. In one example, the query engines of different compute nodes generate different sub-queries. For example, different compute nodes process according to the sub-query to obtain a processing result of the sub-query. In another example, the main service node processes the processing result of the sub-query to obtain a vector similarity index result. In this way, the computational load of the master service node is reduced. Thereby, a separation of functionality from the main service node to the compute node is achieved.
In addition, the respective described embodiments in fig. 2B-2D are equally applicable to this example of fig. 6C.
In another implementation manner of the present invention, the at least one computing node includes a first computing node and a second computing node, the unstructured data retrieval condition is a feature vector retrieval condition, and the front-end node is specifically configured to: in response to the query request, generate a structured field retrieval condition and a feature vector retrieval condition of the object to be queried; and send the structured field retrieval condition to the first computing node and the feature vector retrieval condition to the second computing node. The first computing node is specifically configured to: based on the structured field retrieval condition, perform structured field indexing to obtain a preamble index result; and send the preamble index result to the second computing node. The second computing node is specifically configured to: based on the feature vector retrieval condition, perform vector similarity indexing and determine an unstructured data index result from the preamble index result; generate a query result based on the unstructured data index result; and return the query result to the front-end node.
FIG. 7 is a schematic structural diagram of an electronic device according to another embodiment of the invention; the electronic device may include:
one or more processors 701;
a computer-readable medium 702, which may be configured to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a retrieval method as described in the above embodiments or a query method as described in the above embodiments.
Fig. 8 is a hardware configuration of an electronic apparatus according to another embodiment of the present invention; as shown in fig. 8, the hardware structure of the electronic device may include: a processor 801, a communication interface 802, a computer-readable medium 803, and a communication bus 804.
Wherein the processor 801, the communication interface 802, and the computer-readable medium 803 communicate with each other via a communication bus 804.
Optionally, the communication interface 802 may be an interface of a communication module.
The processor 801 may be specifically configured to: acquire a retrieval condition, wherein the retrieval condition comprises a structured field retrieval condition and an unstructured data retrieval condition based on the same object; based on the structured field retrieval condition, perform structured field indexing to obtain a preamble index result; and based on the unstructured data retrieval condition, perform similarity indexing and determine an unstructured data index result from the preamble index result; or,
acquiring a query request, wherein the query request comprises an object to be queried; responding to the query request, and generating a structured field retrieval condition and an unstructured data retrieval condition of the object to be queried; sending a structured field retrieval condition and an unstructured data retrieval condition to a server so that the server performs structured field indexing based on the structured field retrieval condition to obtain a preamble index result, performs similarity indexing based on the unstructured data retrieval condition, and determines an unstructured data index result from the preamble index result; and receiving a query result based on the unstructured data index result returned by the server.
The processor 801 may be a general-purpose processor, including a Central Processing Unit (CPU), a Network Processor (NP), and the like; it may also be a Digital Signal Processor (DSP), an Application Specific Integrated Circuit (ASIC), a Field-Programmable Gate Array (FPGA) or other programmable logic device, discrete gate or transistor logic, or discrete hardware components. The processor may implement or perform the various methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor or any conventional processor or the like.
The computer-readable medium 803 may be, but is not limited to, a Random Access Memory (RAM), a Read-Only Memory (ROM), a Programmable Read-Only Memory (PROM), an Erasable Programmable Read-Only Memory (EPROM), an Electrically Erasable Programmable Read-Only Memory (EEPROM), and the like.
In particular, according to an embodiment of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code configured to perform the method illustrated by the flowchart. In such an embodiment, the computer program may be downloaded and installed from a network via the communication section, and/or installed from a removable medium. The computer program, when executed by a Central Processing Unit (CPU), performs the above-described functions defined in the method of the present application. It should be noted that the computer readable medium described herein can be a computer readable signal medium or a computer readable storage medium or any combination of the two. The computer readable medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present application, a computer readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
In this application, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
Computer program code configured to carry out operations for the present application may be written in any combination of one or more programming languages, including object-oriented programming languages such as Java, Smalltalk, and C++, and conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on the remote computer or server. In the case of a remote computer, the remote computer may be connected to the user's computer through any type of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computer (for example, through the Internet using an Internet service provider).
The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present application. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions configured to implement the specified logical function(s). In the above embodiments, specific precedence relationships are provided, but these precedence relationships are only exemplary, and in particular implementations, the steps may be fewer, more, or the execution order may be modified. That is, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams and/or flowchart illustration, and combinations of blocks in the block diagrams and/or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The modules described in the embodiments of the present application may be implemented by software or hardware. The names of these modules do not in some cases constitute a limitation of the module itself.
As another aspect, the present application also provides a computer-readable medium, on which a computer program is stored, which when executed by a processor implements the method as described in the embodiments above.
As another aspect, the present application also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being assembled into the apparatus. The computer readable medium carries one or more programs which, when executed by the apparatus, cause the apparatus to: acquire a retrieval condition, wherein the retrieval condition comprises a structured field retrieval condition and an unstructured data retrieval condition based on the same object; perform structured field indexing based on the structured field retrieval condition to obtain a preamble index result; and perform similarity indexing based on the unstructured data retrieval condition to determine an unstructured data index result from the preamble index result; or,
acquire a query request, wherein the query request comprises an object to be queried; in response to the query request, generate a structured field retrieval condition and an unstructured data retrieval condition of the object to be queried; send the structured field retrieval condition and the unstructured data retrieval condition to a server, so that the server performs structured field indexing based on the structured field retrieval condition to obtain a preamble index result, performs similarity indexing based on the unstructured data retrieval condition, and determines an unstructured data index result from the preamble index result; and receive a query result, returned by the server, based on the unstructured data index result.
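The client-side query flow can be sketched in the same hedged spirit. Everything named here is hypothetical: `QueryObject`, the toy `extract_feature_vector` normalisation, the request dictionary layout, and `fake_server`, which is an in-process stand-in for a real server and network transport. The sketch only shows the order of operations: generate both retrieval conditions from one object, send them together, and receive the result.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QueryObject:
    category: str   # hypothetical structured attribute of the object
    pixels: tuple   # hypothetical raw unstructured content


def extract_feature_vector(pixels):
    """Toy feature extractor (assumption): scale values so they sum to 1."""
    total = sum(pixels) or 1
    return tuple(p / total for p in pixels)


def generate_conditions(obj):
    """Derive both retrieval conditions from the same object to be queried."""
    structured = {"category": obj.category}
    unstructured = {"vector": extract_feature_vector(obj.pixels), "top_k": 1}
    return structured, unstructured


def query(obj, send):
    """Send both conditions in one request and return the server's result."""
    structured, unstructured = generate_conditions(obj)
    return send({"structured": structured, "unstructured": unstructured})


def fake_server(request):
    """In-process stand-in for the server side of the exchange."""
    catalogue = [
        ("shoes", (0.5, 0.5), "sneaker.jpg"),
        ("shoes", (0.9, 0.1), "boot.jpg"),
        ("bags", (0.9, 0.1), "tote.jpg"),
    ]
    wanted = request["structured"]["category"]
    preamble = [item for item in catalogue if item[0] == wanted]
    query_vec = request["unstructured"]["vector"]
    ranked = sorted(
        preamble,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[1], query_vec)),
    )
    return [item[2] for item in ranked[: request["unstructured"]["top_k"]]]


ANSWER = query(QueryObject("shoes", (9, 1)), fake_server)
```

Passing the transport as the `send` callable keeps the sketch testable without a network; a real client would replace it with an RPC or HTTP call.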
The expressions "first", "second", "said first" or "said second" used in various embodiments of the present disclosure may modify various components regardless of order and/or importance, but these expressions do not limit the respective components. They are used only to distinguish one element from another. For example, the first user equipment and the second user equipment represent different user equipment, although both are user equipment. For example, a first element could be termed a second element, and, similarly, a second element could be termed a first element, without departing from the scope of the present disclosure.
When an element (e.g., a first element) is referred to as being "(operably or communicatively) coupled" or "connected" to another element (e.g., a second element), it is understood that the element is either directly connected to the other element or indirectly connected to it via yet another element (e.g., a third element). In contrast, when an element (e.g., a first element) is referred to as being "directly connected" or "directly coupled" to another element (e.g., a second element), it is understood that no element (e.g., a third element) is interposed between them.
The above description is only a preferred embodiment of the application and is illustrative of the principles of the technology employed. It will be appreciated by those skilled in the art that the scope of the invention herein disclosed is not limited to the particular combination of features described above, but also encompasses other arrangements formed by any combination of the above features or their equivalents without departing from the spirit of the invention. For example, the above features may be replaced with (but not limited to) features having similar functions disclosed in the present application.

Claims (18)

1. A retrieval method, comprising:
acquiring a retrieval condition, wherein the retrieval condition comprises a structured field retrieval condition and an unstructured data retrieval condition based on the same object;
performing structured field indexing based on the structured field retrieval condition to obtain a preamble index result;
and performing similarity indexing based on the unstructured data retrieval condition, and determining an unstructured data index result from the preamble index result to generate a retrieval result.
2. The method of claim 1, wherein the performing similarity indexing based on the unstructured data retrieval condition, determining an unstructured data index result from the preamble index results, comprises:
based on the feature vector retrieval condition, performing vector similarity indexing on the preamble index result to obtain a first vector similarity index result;
determining a feature vector index result based on the first vector similarity index result.
3. The method of claim 1, wherein the performing similarity indexing based on the unstructured data retrieval condition, determining an unstructured data index result from the preamble index results, comprises:
based on the feature vector retrieval condition, performing vector similarity index to obtain a second vector similarity index result;
and filtering the second vector similarity index result by adopting the preamble index result to obtain a feature vector index result.
4. The method of claim 1, wherein the performing structured field indexing based on the structured field retrieval condition to obtain a preamble index result comprises:
determining an original similarity index data set;
indexing in the original similarity index dataset based on the structured field retrieval condition to obtain the preamble index result, wherein the method further comprises:
determining a computational cost of the vector similarity index based on the original similarity index dataset.
5. The method of claim 4, wherein the vector similarity index comprises a first vector similarity index and a second vector similarity index, wherein the performing similarity indexing based on the unstructured data retrieval condition to determine an unstructured data index result from the preamble index results comprises:
determining that a computational cost of the first vector similarity index is greater than a computational cost of the second vector similarity index;
and performing second vector similarity index based on the feature vector retrieval condition, and determining a feature vector index result from the preamble index results.
6. The method of claim 5, wherein the performing similarity indexing based on the unstructured data retrieval condition, determining an unstructured data indexing result from the preamble indexing results, further comprises:
determining a computation cost of the first vector similarity index based on a first index cost, wherein the first index cost indicates a cost of indexing in the original similarity index dataset to obtain a preamble index result;
determining a calculation cost of the second vector similarity index based on a second index cost and a filtering processing cost, wherein the second index cost indicates a cost for indexing the feature vector index result in the preamble index result, and the filtering processing cost is a cost for filtering the second vector similarity index result based on the preamble index result.
7. The method of claim 5, wherein the performing similarity indexing based on the unstructured data retrieval condition, determining an unstructured data indexing result from the preamble indexing results, further comprises:
determining a computation cost of the second vector similarity index based on a first index cost, wherein the first index cost indicates a cost of indexing in the original similarity index dataset to obtain a preamble index result;
determining a computation cost of the first vector similarity index based on a second index cost and a filtering processing cost, wherein the second index cost indicates a cost for indexing the feature vector index result in the preamble index result, and the filtering processing cost is a cost for filtering the second vector similarity index result based on the preamble index result.
8. The method of claim 1, wherein the structured field retrieval condition comprises a first retrieval condition for a first structured field and a second retrieval condition for a second structured field, wherein,
the performing structured field indexing based on the structured field retrieval condition to obtain a preamble index result includes:
based on the first retrieval condition, performing first inverted indexing on the first structured field to obtain a first inverted index result;
and performing second inverted indexing on the first inverted index result based on the second retrieval condition to obtain the preamble index result.
9. The method of claim 1, wherein the method further comprises:
performing row number indexing on the unstructured data indexing result to obtain at least one object corresponding to the unstructured data indexing result;
generating a retrieval result including the at least one object.
10. A method of querying, comprising:
acquiring a query request, wherein the query request comprises an object to be queried;
responding to the query request, and generating a structured field retrieval condition and an unstructured data retrieval condition of the object to be queried;
sending the structured field retrieval condition and the unstructured data retrieval condition to a server, so that the server performs structured field indexing based on the structured field retrieval condition to obtain a preamble index result, performs similarity indexing based on the unstructured data retrieval condition, and determines an unstructured data index result from the preamble index result;
and receiving a query result based on the unstructured data index result returned by the server.
11. A method of querying, comprising:
acquiring a display picture query request, wherein the display picture query request comprises a picture to be queried;
responding to the display picture query request, and generating a structured field retrieval condition and an unstructured data retrieval condition of the picture to be queried;
sending the structured field retrieval condition and the unstructured data retrieval condition to a server, so that the server performs structured field indexing in a display picture data set based on the structured field retrieval condition to obtain a preamble index result, performs similarity indexing based on the unstructured data retrieval condition, and determines an unstructured data index result from the preamble index result;
and receiving a query result based on the unstructured data index result returned by the server, wherein the query result comprises at least one display picture meeting the display picture query request in the display picture data set.
12. A query apparatus, comprising:
an acquisition module, configured to acquire a display picture query request, wherein the display picture query request comprises a picture to be queried;
a retrieval condition generation module, configured to generate, in response to the display picture query request, a structured field retrieval condition and an unstructured data retrieval condition of the picture to be queried;
a sending module, configured to send the structured field retrieval condition and the unstructured data retrieval condition to a server, so that the server performs structured field indexing in a display picture data set based on the structured field retrieval condition to obtain a preamble index result, performs similarity indexing based on the unstructured data retrieval condition, and determines an unstructured data index result from the preamble index result;
and a receiving module, configured to receive a query result, returned by the server, based on the unstructured data index result, wherein the query result comprises at least one display picture in the display picture data set that satisfies the display picture query request.
13. A retrieval apparatus, comprising:
an acquisition module, configured to acquire a retrieval condition, wherein the retrieval condition comprises a structured field retrieval condition and an unstructured data retrieval condition based on the same object;
a structured field index module, configured to perform structured field indexing based on the structured field retrieval condition to obtain a preamble index result;
and a similarity index module, configured to perform similarity indexing based on the unstructured data retrieval condition and determine an unstructured data index result from the preamble index result, so as to generate a retrieval result.
14. A query apparatus, comprising:
an acquisition module, configured to acquire a query request, wherein the query request comprises an object to be queried;
a retrieval condition generation module, configured to generate, in response to the query request, a structured field retrieval condition and an unstructured data retrieval condition of the object to be queried;
a sending module, configured to send the structured field retrieval condition and the unstructured data retrieval condition to a server, so that the server performs structured field indexing based on the structured field retrieval condition to obtain a preamble index result, performs similarity indexing based on the unstructured data retrieval condition, and determines an unstructured data index result from the preamble index result;
and a receiving module, configured to receive a query result, returned by the server, based on the unstructured data index result.
15. A query system, comprising:
a front-end node and at least one computing node,
the front-end node is configured to: acquiring a query request, wherein the query request comprises an object to be queried;
responding to the query request, and generating a structured field retrieval condition and an unstructured data retrieval condition of the object to be queried;
sending the structured field retrieval condition and the unstructured data retrieval condition to the at least one computing node;
receiving a query result sent by the at least one computing node based on the unstructured data index result,
the at least one computing node is configured to: based on the structured field retrieval condition, carrying out structured field indexing to obtain a preamble index result;
based on the unstructured data retrieval condition, carrying out similarity indexing, and determining an unstructured data index result from the preamble index result;
and generating a query result based on the unstructured data index result.
16. The query system of claim 15, wherein the at least one computing node comprises a first computing node and a second computing node, and the unstructured data retrieval condition is a feature vector retrieval condition,
the front-end node is specifically configured to: responding to the query request, and generating a structured field retrieval condition and a feature vector retrieval condition of the object to be queried;
sending the structured field retrieval condition to the first computing node and the feature vector retrieval condition to the second computing node,
the first computing node is specifically configured to: based on the structured field retrieval condition, carrying out structured field indexing to obtain a preamble index result;
sending the preamble index result to the second computing node,
the second computing node is specifically configured to: based on the feature vector retrieval condition, carrying out vector similarity indexing, and determining an unstructured data index result from the preamble index result;
generating a query result based on the unstructured data index result;
returning the query result to the front-end node.
17. An electronic device, the device comprising:
one or more processors;
a computer readable medium configured to store one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement a retrieval method as recited in any one of claims 1-9 or a query method as recited in claim 10 or 11.
18. A computer-readable medium, on which a computer program is stored which, when being executed by a processor, carries out the retrieval method according to any one of claims 1 to 9 or the query method according to claim 10 or 11.
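As an illustration of the cost-based choice recited in claims 2 to 7, the following sketch contrasts the two vector similarity strategies and a simple selector between them. It is an assumption-laden toy, not the claimed cost model: the function names, the `overfetch` factor, and the constants `ann_cost_per_row` and `filter_cost` are hypothetical, and a sorted full scan stands in for the prebuilt similarity index a real system would use for the second strategy.

```python
def squared_distance(a, b):
    """Squared Euclidean distance between two equal-length vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def prefilter_search(vectors, preamble, query, top_k):
    """First strategy: rank only the rows in the preamble index result."""
    ranked = sorted(preamble, key=lambda r: (squared_distance(vectors[r], query), r))
    return ranked[:top_k]


def postfilter_search(vectors, preamble, query, top_k, overfetch):
    """Second strategy: rank all rows, then filter by the preamble index result."""
    ranked = sorted(
        range(len(vectors)),
        key=lambda r: (squared_distance(vectors[r], query), r),
    )
    candidates = ranked[: top_k * overfetch]   # over-fetch before filtering
    return [r for r in candidates if r in preamble][:top_k]


def choose_strategy(n_total, n_preamble, top_k, overfetch,
                    ann_cost_per_row=0.01, filter_cost=0.1):
    """Pick the cheaper strategy under a hypothetical cost model."""
    # Assumed cost of the first strategy: one comparison per preamble row.
    cost_first = float(n_preamble)
    # Assumed cost of the second strategy: index search plus filtering.
    cost_second = ann_cost_per_row * n_total + filter_cost * top_k * overfetch
    return "postfilter" if cost_first > cost_second else "prefilter"


VECTORS = [(0, 0), (1, 0), (0, 1), (1, 1), (2, 2)]
PREAMBLE = {1, 3, 4}
PRE = prefilter_search(VECTORS, PREAMBLE, (0, 0), top_k=2)
POST = postfilter_search(VECTORS, PREAMBLE, (0, 0), top_k=2, overfetch=2)
```

Under this model, a structured condition that keeps most rows favours searching first and filtering afterwards, while a highly selective condition favours ranking only the preamble index result. Note that the second strategy can return fewer than `top_k` rows when too few over-fetched candidates survive the filter.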
CN202010291995.7A 2020-04-14 2020-04-14 Retrieval method, query method, device, system, electronic equipment and computer storage medium Pending CN113297454A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010291995.7A CN113297454A (en) 2020-04-14 2020-04-14 Retrieval method, query method, device, system, electronic equipment and computer storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010291995.7A CN113297454A (en) 2020-04-14 2020-04-14 Retrieval method, query method, device, system, electronic equipment and computer storage medium

Publications (1)

Publication Number Publication Date
CN113297454A true CN113297454A (en) 2021-08-24

Family

ID=77317958

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010291995.7A Pending CN113297454A (en) 2020-04-14 2020-04-14 Retrieval method, query method, device, system, electronic equipment and computer storage medium

Country Status (1)

Country Link
CN (1) CN113297454A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116150304A (en) * 2023-03-28 2023-05-23 阿里云计算有限公司 Data query method, electronic device and storage medium

Similar Documents

Publication Publication Date Title
Kang et al. Blazeit: Optimizing declarative aggregation and limit queries for neural network-based video analytics
KR102627690B1 (en) Dimensional context propagation techniques for optimizing SKB query plans
Konrath et al. Schemex—efficient construction of a data catalogue by stream-based indexing of linked data
WO2021135323A1 (en) Method and apparatus for fusion processing of municipal multi-source heterogeneous data, and computer device
US10311055B2 (en) Global query hint specification
Kang et al. Challenges and Opportunities in DNN-Based Video Analytics: A Demonstration of the BlazeIt Video Query Engine.
US11720606B1 (en) Automated geospatial data analysis
US20170337229A1 (en) Spatial indexing for distributed storage using local indexes
US20120117054A1 (en) Query Analysis in a Database
Alarabi et al. TAREEG: A MapReduce-based system for extracting spatial data from OpenStreetMap
CN112395303A (en) Query execution method and device, electronic equipment and computer readable medium
US20200320045A1 (en) Sytems and methods for context-independent database search paths
US10152509B2 (en) Query hint learning in a database management system
US20160342652A1 (en) Database query cursor management
Quoc et al. An elastic and scalable spatiotemporal query processing for linked sensor data
Giangreco et al. ADAM pro: Database support for big multimedia retrieval
US11354313B2 (en) Transforming a user-defined table function to a derived table in a database management system
JPWO2012173267A1 (en) VIDEO PROCESSING SYSTEM, VIDEO PROCESSING METHOD, VIDEO PROCESSING DATABASE GENERATION METHOD AND ITS DATABASE, VIDEO PROCESSING DEVICE, ITS CONTROL METHOD AND CONTROL PROGRAM
US10628421B2 (en) Managing a single database management system
Alsubaiee et al. Asterix: scalable warehouse-style web data integration
CN113297454A (en) Retrieval method, query method, device, system, electronic equipment and computer storage medium
CN113779349A (en) Data retrieval system, apparatus, electronic device, and readable storage medium
US10248702B2 (en) Integration management for structured and unstructured data
Wang et al. Geo-store: a spatially-augmented sparql query evaluation system
CN107368477B (en) HBase coprocessor-based SQL-like query method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
REG Reference to a national code Ref country code: HK; Ref legal event code: DE; Ref document number: 40059128; Country of ref document: HK
Country of ref document: HK