CN111414527B - Query method, device and storage medium for similar items


Info

Publication number
CN111414527B
CN111414527B (application number CN202010182774.6A)
Authority
CN
China
Prior art keywords
algorithm model
item
query
bytes
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010182774.6A
Other languages
Chinese (zh)
Other versions
CN111414527A (en)
Inventor
栗波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Music Entertainment Technology Shenzhen Co Ltd
Original Assignee
Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Music Entertainment Technology Shenzhen Co Ltd filed Critical Tencent Music Entertainment Technology Shenzhen Co Ltd
Priority to CN202010182774.6A priority Critical patent/CN111414527B/en
Publication of CN111414527A publication Critical patent/CN111414527A/en
Application granted granted Critical
Publication of CN111414527B publication Critical patent/CN111414527B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/95 - Retrieval from the web
    • G06F16/953 - Querying, e.g. by the use of web search engines
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06Q - INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 - Commerce
    • G06Q30/06 - Buying, selling or leasing transactions
    • G06Q30/0601 - Electronic shopping [e-shopping]
    • G06Q30/0631 - Item recommendations
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 - Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the application discloses a query method, a query device and a storage medium for similar items. In the scheme, a similar-item query instruction of a business module for a target item is received, the instruction comprising an identification of the target item; a pre-generated algorithm model file is loaded into memory, the file comprising query algorithm model data, which represents a query algorithm model, and the mapping relations between the indexes of items and the identifications of items; the algorithm model file is parsed to obtain the query algorithm model and each mapping relation; the index corresponding to the identification of the target item is obtained from the mapping relation; the query algorithm model is invoked with the index of the target item to obtain the indexes of items similar to the target item; the identifications corresponding to those indexes are obtained from the mapping relation; and the identifications of the similar items are returned to the business module. The application can improve the efficiency and accuracy of nearest-neighbor queries.

Description

Query method, device and storage medium for similar items
Technical Field
The application relates to the technical field of data processing, in particular to a similar item query method, a similar item query device and a storage medium.
Background
Personalized recommendation has become a standard feature of internet products, giving rise to recommendation systems. The core task of a recommendation system is to select suitable items from a very large item library and finally present them to users. Because the item library is so large, the architecture of a recommendation system is usually composed of three stages: recall, ranking and re-ranking. In the recall stage, index services that use vectors to query for similar items have become an important single-path recall method among recall algorithms.
When the original nearest-neighbor algorithm is used for nearest-neighbor queries, the service has to maintain the mapping relation from indexes to item IDs. Especially when multiple nearest-neighbor algorithm models are deployed online, the mappings belonging to the different model versions are easily confused, which leads to wrong query results and low query-feedback efficiency.
Disclosure of Invention
The embodiment of the invention provides a query method, a query device and a storage medium for similar items, aiming at improving the query efficiency and the query accuracy of neighbor queries.
The embodiment of the invention provides a query method for similar items, which comprises the following steps:
receiving a similar item query instruction of a business module for a target item, wherein the similar item query instruction comprises an identification of the target item;
Loading a pre-generated algorithm model file into a memory, wherein the algorithm model file comprises query algorithm model data, a mapping relation between an index of an item and an identifier of the item, and the query algorithm model data is used for representing a query algorithm model;
analyzing the algorithm model file to obtain a query algorithm model and each mapping relation;
obtaining an index corresponding to the identification of the target item according to the mapping relation;
invoking the query algorithm model using the index of the target item to obtain an index of a similar item similar to the target item;
obtaining an identifier corresponding to the index of the similar item according to the mapping relation;
and returning the identification of the similar items to the service module.
The embodiment of the invention also provides a query device for similar items, which comprises:
the receiving unit is used for receiving a similar item inquiry instruction of the business module on the target item, wherein the similar item inquiry instruction comprises an identification of the target item;
the loading unit is used for loading a pre-generated algorithm model file into the memory, wherein the algorithm model file comprises query algorithm model data, a mapping relation between indexes of items and identifications of the items, and the query algorithm model data is used for representing a query algorithm model;
The analysis unit is used for analyzing the algorithm model file to obtain a query algorithm model and each mapping relation;
the calling unit is used for obtaining an index corresponding to the identification of the target item according to the mapping relation, and calling the query algorithm model by using the index of the target item so as to obtain an index of a similar item similar to the target item;
and the return unit is used for obtaining the identification corresponding to the index of the similar item according to the mapping relation and returning the identification of the similar item to the service module.
The embodiment of the application also provides a storage medium which stores a plurality of instructions, where the instructions are suitable for being loaded by a processor to execute any of the query methods for similar items provided by the embodiments of the application.
In this optimization scheme for the approximate nearest-neighbor algorithm, a similar-item query instruction of a business module for a target item is received, the instruction comprising an identification of the target item; a pre-generated algorithm model file is loaded into memory, the file comprising query algorithm model data, which represents a query algorithm model, and the mapping relations between the indexes of items and the identifications of items; the algorithm model file is parsed to obtain the query algorithm model and each mapping relation; the index corresponding to the identification of the target item is obtained from the mapping relation; the query algorithm model is invoked with the index of the target item to obtain the indexes of items similar to the target item; the identifications corresponding to those indexes are obtained from the mapping relation; and the identifications of the similar items are returned to the business module. In the scheme provided by the embodiment of the application, when the approximate nearest-neighbor algorithm model is trained, the index-to-item-identification mapping data is stored in the final model, and the similar results returned are directly the item identifications, with no further mapping required, so that the efficiency and accuracy of nearest-neighbor queries are improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present invention, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1a is a schematic diagram of a first flow chart of a query method for similar items according to an embodiment of the present invention;
FIG. 1b is a second flow chart of a query method for similar items according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an algorithm model file storage format provided by an embodiment of the present invention;
FIG. 3a is a schematic diagram of a first structure of a query device for similar items according to an embodiment of the present invention;
FIG. 3b is a schematic diagram of a second structure of a query device for similar items according to an embodiment of the present invention;
fig. 4 is a schematic structural diagram of a server according to an embodiment of the present invention.
Detailed Description
The following description of the embodiments of the present invention will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present invention, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to fall within the scope of the invention.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment may be included in at least one embodiment of the invention. The appearances of such phrases in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. Those of skill in the art will explicitly and implicitly appreciate that the embodiments described herein may be combined with other embodiments.
The embodiment of the invention provides a query method of similar items, and an execution subject of the query method of similar items can be a query device of similar items provided by the embodiment of the invention or a server integrated with the query device of similar items, wherein the query device of similar items can be realized in a hardware or software mode.
Currently, when an approximate nearest-neighbor algorithm model (hereinafter referred to as the algorithm model or query algorithm model) is used for neighbor queries, taking the Annoy algorithm as an example, the input of the query algorithm model is the index of the item to be queried. The business module therefore has to maintain a set of index-to-item-identifier mapping conversions for the Annoy model: after obtaining the identifier of the item to be queried, the business module must first convert the identifier into the index of that item. After the query algorithm model finds items similar to the item to be queried, it outputs the indexes of the similar items to the business module, and the business module maps those indexes back into the final item identifications through the mapping relation.
Therefore, the model file stored by the original Annoy implementation used by the query module is a binary file that only contains data related to the index trees. The results returned by an Annoy approximate nearest-neighbor query are all integer indexes; the query module hands these indexes to the business module, and the business side is required to convert the integer indexes into its own item IDs.
For example, suppose there are 5 item identifiers a, b, c, d and e, and the business module has pre-established and stored a mapping between the item identifiers a, b, c, d, e and the indexes 0, 1, 2, 3, 4. When the caller wants to query for items similar to a, the following steps are needed (a code sketch of this flow follows the numbered steps below):
1. The query module finds the index corresponding to a, namely 0.
2. The query module invokes the get_nns_by_id method of the Annoy algorithm with 0 as a parameter, and the method returns the indexes of similar items; if b and e are similar to a, the list returned by the method is [1, 4].
3. The query module hands the returned [1, 4] to the business module, and the business module looks up the corresponding item IDs, b and e, from the mapping relation it stores.
4. The item IDs are finally returned to the caller.
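A minimal Java sketch of this legacy flow, purely for illustration; the class, the annoyQuery stand-in for Annoy's get_nns_by_id call, and the hard-coded result are hypothetical:

import java.util.List;
import java.util.Map;
import java.util.stream.Collectors;

public class LegacyFlowSketch {
    /** Stand-in for Annoy's get_nns_by_id call (hypothetical wrapper). */
    static List<Integer> annoyQuery(int index, int topN) {
        return List.of(1, 4);   // pretend b and e are the nearest neighbours of a
    }

    public static void main(String[] args) {
        // the business module owns both directions of the Index <-> Item ID mapping
        Map<String, Integer> itemIdToIndex = Map.of("a", 0, "b", 1, "c", 2, "d", 3, "e", 4);
        Map<Integer, String> indexToItemId = Map.of(0, "a", 1, "b", 2, "c", 3, "d", 4, "e");

        int targetIndex = itemIdToIndex.get("a");                    // step 1: a -> 0
        List<Integer> similarIdx = annoyQuery(targetIndex, 2);       // step 2: [1, 4]
        List<String> similarIds = similarIdx.stream()
                .map(indexToItemId::get)                             // step 3: [b, e]
                .collect(Collectors.toList());                       // step 4: give back to the caller
        System.out.println(similarIds);                              // prints [b, e]
    }
}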
In this way, when the algorithm is integrated into an online service and models of multiple versions are generated, the data that each version needs to maintain includes three items: the vector storage of the items, the Annoy model itself, and the Index-to-Item-ID mapping. Here Index is the index of the item in the Annoy algorithm (a consecutive integer), and Item ID is the item identification. The more data each version has to maintain, the more likely problems are to occur online; in particular, when Annoy's query results are matched against the wrong Index-to-Item-ID mapping among multiple versions, the returned similar results are entirely wrong. Therefore, the embodiment of the application provides a query method for similar items that avoids the separate steps of first querying results and then mapping them.
As shown in fig. 1a, fig. 1a is a first flow chart of a query method of a similar item according to an embodiment of the present invention, where a specific flow of the query method of a similar item may be as follows:
101. Adding mapping data between indexes and item identifiers to the approximate nearest-neighbor algorithm model file, and storing the file according to a preset storage format.
In one embodiment, the Annoy algorithm is taken as the above-mentioned approximate nearest-neighbor algorithm. The goal of Annoy is to build a data structure such that the time complexity of querying the nearest neighbors of a point is sub-linear; by building binary trees, Annoy makes the lookup time for each point O(log n). For example, two points are randomly selected and used as initial center nodes, a k-means process with 2 clusters is executed to produce two converged cluster centers, and the space is divided according to these two centers. The procedure then recurses in each divided subspace, dividing it further, until at most K data nodes remain in each subspace. After multiple recursive divisions, the original data forms a binary tree structure: the bottom layer of the tree consists of leaf nodes recording the original data nodes, while the other, intermediate nodes record the information of the splitting hyperplanes. Annoy builds such a binary tree in the hope of satisfying the assumption that similar data nodes lie close together in the tree, and that a splitting hyperplane does not separate similar data nodes into different branches. The steps above constitute the indexing process.
After the index is established, queries can be performed. A query is the process of repeatedly checking which side of a splitting hyperplane the query node lies on, which, viewed on the binary tree structure, is a traversal from the root node down to a leaf node. At each intermediate node of the binary tree (which holds the information of a splitting hyperplane), a calculation against the query data node determines whether the traversal proceeds to the left child or to the right child of that node; the query is completed in this manner.
Finally, the neighbor nodes are returned after the query: the neighbor points returned by all the trees are inserted into a priority queue, their union is taken to eliminate duplicates, the distance between each candidate and the query point is calculated, the candidates are sorted from the shortest distance to the longest, and the Top N neighbor node set is returned. A minimal sketch of this final merge step is given below.
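The following Java outline is purely illustrative and not Annoy's actual implementation; the per-tree candidate lists, the vectors array and the Euclidean distance are assumed inputs, and a plain set is used in place of the priority queue for brevity:

import java.util.Comparator;
import java.util.HashSet;
import java.util.List;
import java.util.Set;
import java.util.stream.Collectors;

public class CandidateMergeSketch {
    /** Euclidean distance; Annoy also supports other metrics, this is just one assumption. */
    static double distance(float[] a, float[] b) {
        double s = 0;
        for (int i = 0; i < a.length; i++) s += (a[i] - b[i]) * (a[i] - b[i]);
        return Math.sqrt(s);
    }

    static List<Integer> topN(List<List<Integer>> candidatesPerTree,
                              float[][] vectors, float[] query, int n) {
        Set<Integer> union = new HashSet<>();          // union of all trees' candidates, de-duplicated
        candidatesPerTree.forEach(union::addAll);
        return union.stream()
                .sorted(Comparator.comparingDouble((Integer i) -> distance(query, vectors[i])))
                .limit(n)                              // keep the Top N nearest, shortest distance first
                .collect(Collectors.toList());
    }
}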
In an embodiment, the approximate nearest neighbor algorithm model may be trained first, then the mapping data of the index and the item identifier may be added after the trained algorithm model, and then the mapping data may be stored according to a preset storage format.
For example, the Annoy model itself is stored first in the model file, then the Index-to-Item-ID mappings are stored after it, and the end of the model file records the number of items and the length of the Annoy model itself, respectively.
102. Loading the extended algorithm model file to obtain the length information of the algorithm model and the quantity information of the items.
In an embodiment, the extended algorithm model file may be read according to its storage format: the last n bytes are read first, where these n bytes contain the quantity information of the item identifiers added in step 101 and the length information of the Annoy model itself, so that the number of item identifiers and the length of the Annoy model can be obtained once the n bytes are read. It should be noted that, since each item has one identifier and each identifier maps to one index, the number of items can also be regarded as the number of mapping relations.
Specifically, for example, the last 4 bytes of the model file represent the length of the Annoy model, and the 5th to 8th bytes from the end represent the number of stored items; that is, the last 8 bytes of the model file together record the length of the Annoy model and the number of stored items. Therefore, when the extended algorithm model file is read, the last 8 bytes are read and converted into two integers M and N: the last 4 bytes are converted into M, representing the length of the Annoy model, and the 5th to 8th bytes from the end are converted into N, representing the number of items. A minimal sketch of reading these trailing bytes is given below.
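The following Java sketch shows one way the trailing 8 bytes might be read back; the file name is hypothetical and the byte order (big-endian here) is an assumption that must match whatever wrote the file:

import java.nio.ByteBuffer;
import java.nio.file.Files;
import java.nio.file.Paths;

public class TrailerReadSketch {
    public static void main(String[] args) throws Exception {
        byte[] all = Files.readAllBytes(Paths.get("annoy_with_mapping.model")); // hypothetical name
        ByteBuffer tail = ByteBuffer.wrap(all, all.length - 8, 8);  // big-endian by default; must match the writer
        int n = tail.getInt();   // 5th to 8th bytes from the end: number of items / mappings
        int m = tail.getInt();   // last 4 bytes: length M of the Annoy model data itself
        System.out.println("model length M=" + m + ", item count N=" + n);
    }
}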
103. Reading the mapping data between indexes and item identifiers according to the length information and the quantity information, to obtain the mapping relation between the indexes and the item identifiers.
In an embodiment, before the mapping data between indexes and item identifiers is read according to the length information and the quantity information, the source code of Annoy may be modified so that, when the model file is mapped using a preset method, the mapped length is specified as the byte length M. That is, after the length information of the algorithm model and the quantity information of the items are obtained, the mapped length used by the preset method is restricted to the first M bytes of the file.
further, after the added algorithm model file is shifted by M, the mapping data of the index and the item identifier can be read, and the mapping data is read for a plurality of times according to the number of the item identifiers, so that a plurality of indexes and a plurality of corresponding item identifiers are obtained, and a mapping relation is established and stored. That is, the step of reading the mapping data of the index and the item identifier according to the length information and the quantity information includes:
setting an offset according to the length information and setting the reading times according to the quantity information;
and reading the mapping data of the index and the item identifier in the added algorithm model file according to the offset and the reading times.
Further, after the mapping relation between indexes and item identifiers is obtained, it may be stored in a HashMap. A HashMap is a hash-table-based implementation of the Map interface; the hash table is the underlying implementation of the HashMap, and the data structure implementing the hash table in Java is an array plus linked lists. The HashMap treats each <key, value> pair as a Node object (which implements the Map.Entry<K, V> interface) containing hash, key, value and next attributes. On put(key, value), the HashMap determines the position of the Node object in the array from the hash value hash(key.hashCode()); if different keys hash to the same array position, the Node object is appended to the linked list at that position by tail insertion. On get(key), the array position is determined from hash(key); the linked list at that position may contain several entries, so key.equals(k) is used to locate the matching node, and the value of that node object is returned. In the embodiment of the application, the key of the HashMap can be set to the index and the value to the item identifier.
104. Calling the approximate nearest-neighbor algorithm to perform the similarity search, and converting the indexes in the search result into the corresponding item identifiers according to the mapping relation.
In an embodiment, when the approximate nearest-neighbor algorithm is called to perform a similarity query on an item to be queried in the item library, the method returns indexes; the item identifiers corresponding to those indexes are then looked up in the stored mapping relation and can be returned to the caller. For example, if the indexes returned by the method are 1 and 4, they are converted into item IDs according to the mapping relation in the HashMap, for example 1 into b and 4 into e.
It can be appreciated that when the algorithm model searches the item library for items similar to a given item, similarity calculation needs to be performed using the feature vector of each item in the item library. The feature vectors of the items in the item library are calculated over the full set: the feature vector of each item is computed from the features of many items in the library, so if the items in the library change, for example items are added, deleted or modified, the feature vector of every item needs to be recalculated. It should be noted that when the feature vectors of the items differ, their indexes in the algorithm model also differ. Therefore, if the item library changes, it is necessary to rebuild the algorithm model for the item library and to re-establish and save the mapping between the indexes the algorithm model depends on and the item identifiers. The existing similarity query schemes have at least the following disadvantages:
1. Too much data to maintain, which is error-prone. In application scenarios where the item library changes frequently, multiple versions of the algorithm model need to be built, and a mapping between indexes and item identifiers has to be maintained independently for each algorithm model. Each algorithm model thus requires at least two levels of mapping: the first level maps the algorithm model to its mapping relation, and the second level maps the indexes to the item identifications.
2. Low query efficiency. The mapping relations are stored on an external storage device, so when a query is actually performed they have to be read from the external storage device into memory through a read-write interface, which increases the pressure on that interface and adds query steps, thereby reducing query efficiency. This disadvantage is especially evident in scenarios with a large number of items, for example millions of items, and strict latency requirements.
In the manner described above, the query method for similar items provided by the embodiment of the present application adds the mapping data between indexes and item identifiers to the approximate nearest-neighbor algorithm model file and stores it according to a preset storage format; loads the extended algorithm model file to obtain the length information of the algorithm model and the quantity information of the items; reads the mapping data between indexes and item identifiers according to the length and quantity information to obtain the mapping relation between them; and calls the approximate nearest-neighbor algorithm to perform the similarity search, converting the indexes in the search result into the corresponding item identifiers according to the mapping relation. With the scheme provided by the embodiment of the application, the index-to-item-identifier mapping data is saved into the final algorithm model file when the approximate nearest-neighbor algorithm model is built. At query time the algorithm model file is loaded into memory, and since the file itself contains the index-to-item-identifier mapping, loading it already brings the mapping relation along; no separate file holding the mapping has to be read from an external storage device, which reduces the pressure on the read-write interface, shortens the query service flow, and improves the efficiency of nearest-neighbor queries. In addition, only the improved algorithm model file needs to be maintained, which lowers the maintenance cost, improves the accuracy of data maintenance, and thus further improves query accuracy.
The method according to the previous embodiments will be described in further detail below.
Referring to fig. 1b, fig. 1b is a second flow chart of a query method for similar items according to an embodiment of the invention. The method comprises the following steps:
201. Adding mapping data between indexes and item identifiers to the algorithm model file, where each mapping occupies a bytes.
It should be noted that each mapping data occupies a bytes, and each mapping data refers to a mapping relationship between an index and an item identifier.
202. Adding the quantity information of the items to the algorithm model file, where the quantity information occupies b bytes.
203. Adding the length information of the algorithm model to the algorithm model file, where the length information occupies c bytes.
In one embodiment, the algorithm model may be trained first to obtain a trained algorithm model. When the algorithm model file is generated, the model data of the trained algorithm model is stored according to a preset storage format, and the mapping data of the index and the item identification, the number of the item identifications and the length of the algorithm model data are sequentially stored after the algorithm model data. It should be noted that the algorithm model includes, but is not limited to, an Annoy algorithm model.
For example, the storage format of the improved algorithm model file is shown in fig. 2, which is a schematic diagram of the storage format of the algorithm model file provided by the embodiment of the present application: the Index-to-Item-ID mappings are stored after the Annoy algorithm model data, each mapping occupies 36 bytes, the last 4 bytes of the algorithm model file represent the length of the algorithm model data, and the 5th to 8th bytes from the end store the number of items.
Further, each a-byte mapping between an index and an item identifier can be split into d bytes of index data and e bytes of item-identifier data. With continued reference to fig. 2, if each Index-to-Item-ID mapping takes 36 bytes, the index is represented by 4 bytes, representing the index of the item in Annoy, and the Item ID itself is of String type and takes 32 bytes; these byte lengths can be modified according to the requirements of the actual scenario, which is not further limited by the present application. A minimal sketch of the writer side of this format is given below.
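The following Java sketch illustrates the writer side under the 36-byte layout of fig. 2 (4-byte index plus 32-byte item ID); the class and method names are hypothetical, and DataOutputStream's big-endian integers are an assumed byte order that the reader must match:

import java.io.BufferedOutputStream;
import java.io.DataOutputStream;
import java.io.FileOutputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;
import java.util.Map;

public class ModelFileWriterSketch {
    static void write(byte[] annoyModelBytes, Map<Integer, String> indexToItemId,
                      String path) throws IOException {
        try (DataOutputStream out = new DataOutputStream(
                new BufferedOutputStream(new FileOutputStream(path)))) {
            out.write(annoyModelBytes);                       // the trained Annoy model data comes first
            for (Map.Entry<Integer, String> e : indexToItemId.entrySet()) {
                out.writeInt(e.getKey());                     // d = 4 bytes: index
                byte[] id = e.getValue().getBytes(StandardCharsets.UTF_8);
                out.write(Arrays.copyOf(id, 32));             // e = 32 bytes: item ID, padded/truncated
            }
            out.writeInt(indexToItemId.size());               // b = 4 bytes: number of items
            out.writeInt(annoyModelBytes.length);             // c = 4 bytes: length of the model data
        }
    }
}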
204. Loading the algorithm model file and reading the data of its last b+c bytes.
205. Converting the last c bytes of the b+c bytes of data into the length information of the algorithm model.
206. Converting the remaining b bytes of the b+c bytes of data into the quantity information of the items.
In one embodiment, the algorithm model file with the predetermined storage format is read first, and the last b+c bytes of data are read from it. For example, if b=4 and c=4, then b+c=8, i.e. the last 8 bytes are read. They are then converted into two integers M and N: the last 4 bytes are converted into M, representing the length of the Annoy algorithm model data, and the 5th to 8th bytes from the end are converted into N, representing the number of items.
In an embodiment, the source code of Annoy itself may be modified so that, when the model file is mapped using a preset method, the mapped length is specified as the byte length M.
In an embodiment, the preset method may be mmap, a method of memory-mapping a file: a file or other object is mapped into the address space of a process, establishing a one-to-one correspondence between the file's disk address and a segment of virtual addresses in the process's virtual address space. Once this mapping exists, the process can read and write the memory segment through pointers, and the system automatically writes the changes back to the corresponding file on disk, so file operations are completed without invoking system-call functions. Conversely, modifications made by kernel space to this region are directly visible in user space, so files can be shared between different processes.
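The patent itself modifies Annoy's own mmap call in its C++ source; purely to illustrate the idea of mapping only the first M bytes, a Java equivalent might look like the following (hypothetical names):

import java.nio.MappedByteBuffer;
import java.nio.channels.FileChannel;
import java.nio.file.Paths;
import java.nio.file.StandardOpenOption;

public class MmapLengthSketch {
    static MappedByteBuffer mapAnnoyOnly(String path, long m) throws Exception {
        try (FileChannel ch = FileChannel.open(Paths.get(path), StandardOpenOption.READ)) {
            // map only the first M bytes: the Annoy model itself, excluding the appended
            // mapping data and the trailing count/length fields
            return ch.map(FileChannel.MapMode.READ_ONLY, 0, m);
        }
    }
}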
207. Reading the mapping data between indexes and item identifiers according to the length information and the quantity information, to obtain the mapping relation between the indexes and the item identifiers.
For example, after offsetting by the length M into the algorithm model file, the Index-to-Item mapping data can be read: 36 bytes are read each time, N times in total (the number of items), where the first 4 bytes represent the index and the last 32 bytes represent the ID of the item. The item ID may be of String type, with a maximum length of 32 bytes. That is, the step of reading the mapping data between the indexes and the item identifiers to obtain the mapping relation between them includes the following (a minimal sketch of the read loop is given after the list):
reading the data of the first d bytes in the data of the a bytes to obtain an index;
reading the data of the last e bytes in the data of the a bytes to obtain an item identifier corresponding to the index;
performing multiple reads to establish the mapping relations between the multiple indexes and the multiple item identifiers.
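The following Java sketch of this read loop assumes d = 4 and e = 32 as in fig. 2 and reuses the file bytes, M and N from the trailer sketch above; names are hypothetical:

import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.Map;

public class MappingReadSketch {
    static Map<Integer, String> readMappings(byte[] all, int m, int n) {
        Map<Integer, String> indexToItemId = new HashMap<>();
        ByteBuffer buf = ByteBuffer.wrap(all);      // big-endian by default, matching the writer sketch
        buf.position(m);                            // skip the Annoy model data (offset = M)
        byte[] idBytes = new byte[32];
        for (int i = 0; i < n; i++) {               // N mappings of 36 bytes each
            int index = buf.getInt();               // first 4 bytes: index
            buf.get(idBytes);                       // last 32 bytes: item ID (String, zero-padded)
            indexToItemId.put(index, new String(idBytes, StandardCharsets.UTF_8).trim());
        }
        return indexToItemId;
    }
}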
In an embodiment, after the above mapping relation between indexes and item identifiers is obtained, it may be stored in the HashMap, with the key set to the index and the value set to the item identifier. When the mapping relation needs to be used later, the keys and values can be retrieved simply by traversing the HashMap. Traversal is typically performed with keySet or entrySet; with keySet, an extra get(key) call is needed to fetch each value, as in the following sketch:
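A minimal illustration of the two traversal styles, using a small HashMap keyed by index with item identifiers as values:

import java.util.HashMap;
import java.util.Map;

public class TraversalSketch {
    public static void main(String[] args) {
        Map<Integer, String> indexToItemId = new HashMap<>();
        indexToItemId.put(1, "b");
        indexToItemId.put(4, "e");

        // entrySet traversal: key and value come out together
        for (Map.Entry<Integer, String> e : indexToItemId.entrySet()) {
            System.out.println(e.getKey() + " -> " + e.getValue());
        }
        // keySet traversal: an extra get(key) is needed to fetch each value
        for (Integer key : indexToItemId.keySet()) {
            String value = indexToItemId.get(key);
            System.out.println(key + " -> " + value);
        }
    }
}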
208. Calling the approximate nearest-neighbor algorithm to perform the similarity search, and converting the indexes in the search result into the corresponding item identifiers according to the mapping relation.
In an embodiment, the algorithm model data is obtained by reading data of the corresponding length from the algorithm model file according to the length information of the algorithm model data. When the approximate nearest-neighbor algorithm is called to perform the similarity query, the method returns indexes; the item identifiers corresponding to those indexes are then looked up in the stored mapping relation and returned to the caller, i.e. the business module. For example, if the indexes returned by the method are 1 and 4, they are converted into item IDs according to the mapping relation in the HashMap, for example 1 into b and 4 into e.
It should be noted that the Annoy algorithm may be replaced by the Faiss algorithm. Although Faiss itself supports attaching a map to an index, the API it provides only supports item IDs that are 64-bit integers, not item IDs that are strings. When Faiss is used, the same method can therefore be applied: the mapping relation is appended after the model itself, and the length of the model itself is then recorded so that the loading algorithm can identify it.
In summary, the query method for similar items provided by the embodiment of the application appends the mapping data between indexes and item identifiers after the approximate nearest-neighbor algorithm model, each mapping occupying a bytes; appends the quantity information of the items after the mapping data, occupying b bytes; appends the length information of the algorithm model after the quantity information, occupying c bytes; loads the extended algorithm model file and reads its last b+c bytes, converting the last c bytes into the length information of the algorithm model and the remaining b bytes into the quantity information of the items; reads the mapping data between indexes and item identifiers according to the length and quantity information to obtain the mapping relation between them; and calls the approximate nearest-neighbor algorithm to perform the similarity search, converting the indexes in the search result into the corresponding item identifiers according to the mapping relation. With the scheme provided by the embodiment of the application, the index-to-item-identification mapping data is stored in the final model when the approximate nearest-neighbor algorithm model is trained, the similar results returned to the business module are directly the item identifications, the business module no longer has to perform any mapping, and the step of reading mapping data from an external storage device is eliminated, thereby improving the efficiency and accuracy of nearest-neighbor queries.
In order to implement the above method, the embodiment of the invention also provides a query device for similar items, which can be integrated in terminal equipment such as mobile phones, tablet computers and the like.
For example, as shown in fig. 3a, a first structural schematic diagram of a query device for similar items according to an embodiment of the present invention is shown. The query device of the similar items may include:
the receiving unit 301 is configured to receive a similar item query instruction of a service module for a target item, where the similar item query instruction includes an identifier of the target item.
In an embodiment, the above-mentioned approximate nearest-neighbor algorithm model may be trained first, the mapping data between indexes and item identifiers is then appended after the trained algorithm model, and the result is stored according to a preset storage format.
For example, in the storage format of the extended algorithm model file, the Annoy model itself is stored first, then the Index-to-Item-ID mappings are stored, and the end of the model file records the number of items and the length of the Annoy model itself.
The loading unit 302 is configured to load a pre-generated algorithm model file into a memory, where the algorithm model file includes query algorithm model data, a mapping relationship between an index of an item and an identifier of the item, and the query algorithm model data is used to represent a query algorithm model.
In an embodiment, the loading unit 302 may first read the storage format of the added algorithm model file, and read the last n bytes, where the n bytes include the number information of the item identifiers added in step 101 and the length information of the model of Annoy itself, so that the number of item identifiers and the length of the model of Annoy itself can be obtained after the n bytes are read.
And the parsing unit 303 is configured to parse the algorithm model file to obtain a query algorithm model and each mapping relationship.
After offsetting into the extended algorithm model file, the parsing unit 303 may start to read the mapping data between indexes and item identifiers, reading it multiple times according to the number of items, to obtain the indexes and their corresponding item identifiers, and then establish and store the mapping relations.
In an embodiment, the mapping relationship may be stored in HashMap, where Key is set as an index and Value is set as an item identifier.
A calling unit 304, configured to obtain an index corresponding to the identifier of the target item according to the mapping relationship, and call the query algorithm model by using the index of the target item to obtain an index of a similar item similar to the target item;
And a returning unit 305, configured to obtain, according to the mapping relationship, an identifier corresponding to the index of the similar item, and return, to the service module, the identifier of the similar item.
In an embodiment, when the approximate nearest neighbor algorithm is called to perform similar query, the method returns an index, and then the item identifier corresponding to the index is searched in the stored mapping relation, so that the method can be returned to the calling party.
With this scheme online, the online Annoy query service flow is shortened and the maintenance cost is reduced, because only the item vectors and the improved Annoy model itself need to be maintained; the Index-to-Item-ID mapping, which is the most likely source of errors, no longer has to be maintained, and the final item ID is given directly by the model.
In one embodiment, referring to FIG. 3b, the loading unit 302 may include:
a first adding subunit 3021, configured to add model data of a query algorithm model to the algorithm model file;
a second adding subunit 3022, configured to add, after the model data of the query algorithm model, the mapping relations between indexes of items and identifiers of the items, where each mapping relation occupies a bytes; to add, after the mapping relations, the number of mapping relations, where the number occupies b bytes; and to add, after the number, the length occupied by the query algorithm model data in the algorithm model file, where the length occupies c bytes.
In an embodiment, the parsing unit 303 may include:
a first reading subunit 3031, configured to read data of the last b+c bytes of the algorithm model file, so as to obtain the length occupied by the query algorithm model and the number of mapping relationships respectively;
a second reading subunit 3032, configured to offset by the length from the starting position of the algorithm model file and cyclically read the mapping relations, where each read covers a bytes of mapping relation and the number of reads equals the number of mapping relations.
In one embodiment, the mapping relationship of the a bytes includes an index of d bytes and an identification of e bytes;
the second reading subunit 3032 is specifically configured to, for each a bytes of data read, read the first d bytes to obtain the index of an item, read the last e bytes to obtain the identifier of the item, and establish the mapping relation between the index of the item and the identifier of the item.
In the implementation, each unit may be implemented as an independent entity, or may be implemented as the same entity or several entities in any combination, and the implementation of each unit may be referred to the foregoing method embodiment, which is not described herein again.
It should be noted that, the query device for similar items provided in the embodiment of the present application belongs to the same concept as the query method for similar items in the above embodiment, and any method provided in the query method embodiment for similar items may be run on the query device for similar items, and detailed implementation processes of the method provided in the query method embodiment for similar items are shown in the query method embodiment for similar items, which is not described herein again.
The query device for similar items provided by the embodiment of the application receives a similar-item query instruction of a business module for a target item, the instruction comprising an identification of the target item; loads a pre-generated algorithm model file into memory, the file comprising query algorithm model data, which represents a query algorithm model, and the mapping relations between the indexes of items and the identifications of items; parses the algorithm model file to obtain the query algorithm model and each mapping relation; obtains the index corresponding to the identification of the target item from the mapping relation; invokes the query algorithm model with the index of the target item to obtain the indexes of items similar to the target item; obtains the identifications corresponding to those indexes from the mapping relation; and returns the identifications of the similar items to the business module. With the scheme provided by the embodiment of the application, the index-to-item-identifier mapping data is saved into the final algorithm model file when the approximate nearest-neighbor algorithm model is built. At query time the algorithm model file is loaded into memory, and since the file itself contains the index-to-item-identifier mapping, loading it already brings the mapping relation along; no separate file holding the mapping has to be read from an external storage device, which reduces the pressure on the read-write interface, shortens the query service flow, and improves the efficiency of nearest-neighbor queries. In addition, only the improved algorithm model file needs to be maintained, which lowers the maintenance cost, improves the accuracy of data maintenance, and thus further improves query accuracy.
The embodiment of the invention also provides a server, as shown in fig. 4, which shows a schematic structural diagram of the server according to the embodiment of the invention, specifically:
the server may include one or more processors 401 of a processing core, memory 402 of one or more computer readable storage media, a power supply 403, and an input unit 404, among other components. Those skilled in the art will appreciate that the server architecture shown in fig. 4 is not limiting of the server and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
Wherein:
the processor 401 is a control center of the electronic device, connects various parts of the entire electronic device using various interfaces and lines, and performs various functions of the electronic device and processes data by running or executing software programs and/or units stored in the memory 402, and calling data stored in the memory 402, thereby performing overall monitoring of the electronic device. Optionally, processor 401 may include one or more processing cores; preferably, the processor 401 may integrate an application processor and a modem processor, wherein the application processor mainly processes an operating system, a user interface, an application program, etc., and the modem processor mainly processes wireless communication. It will be appreciated that the modem processor described above may not be integrated into the processor 401.
The memory 402 may be used to store software programs and units, and the processor 401 executes various functional applications and data processing by running the software programs and units stored in the memory 402. The memory 402 may mainly include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program (such as a sound playing function, an image playing function, etc.) required for at least one function, and the like; the storage data area may store data created according to the use of the electronic device, etc. In addition, memory 402 may include high-speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other volatile solid-state storage device. Accordingly, the memory 402 may also include a memory controller to provide the processor 401 with access to the memory 402.
The server also includes a power supply 403 for powering the various components, and preferably, the power supply 403 may be logically connected to the processor 401 by a power management system so as to implement functions such as charge, discharge, and power consumption management by the power management system. The power supply 403 may also include one or more of any of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
The server may also include an input unit 404, which input unit 404 may be used to receive entered numeric or character information and to generate keyboard, mouse, joystick, optical or trackball signal inputs related to user settings and function control.
Although not shown, the server may further include a display unit or the like, which is not described herein. In this embodiment, the processor 401 in the server loads executable files corresponding to the processes of one or more application programs into the memory 402 according to the following instructions, and the processor 401 executes the application programs stored in the memory 402, so as to implement various functions as follows:
receiving a similar item query instruction of a business module for a target item, wherein the similar item query instruction comprises an identification of the target item;
loading a pre-generated algorithm model file into a memory, wherein the algorithm model file comprises query algorithm model data, a mapping relation between an index of an item and an identifier of the item, and the query algorithm model data is used for representing a query algorithm model;
analyzing the algorithm model file to obtain a query algorithm model and each mapping relation;
Obtaining an index corresponding to the identification of the target item according to the mapping relation;
invoking the query algorithm model using the index of the target item to obtain an index of a similar item similar to the target item;
obtaining an identifier corresponding to the index of the similar item according to the mapping relation;
and returning the identification of the similar items to the service module.
In some embodiments, when generating the algorithm model file, the processor 401 runs an application program stored in the memory 402, and may also implement the following functions:
adding model data of a query algorithm model into the algorithm model file;
adding mapping relations between indexes of items and identifications of the items after model data of the query algorithm model, wherein each mapping relation occupies a bytes;
adding the number of the mapping relations after the mapping relations, wherein the number occupies b bytes;
and adding the length occupied by the query algorithm model data in the algorithm model file after the number, wherein the length occupies c bytes.
In some embodiments, when parsing the algorithm model file to obtain the query algorithm model and each of the mapping relationships, the processor 401 runs an application program stored in the memory 402, and may further implement the following functions:
Reading data of the last b+c bytes of the algorithm model file to obtain the length occupied by the query algorithm model and the number of mapping relations respectively;
shifting the length from the initial position of the algorithm model file, and circularly reading the mapping relation; wherein the mapping relation of each reading is a bytes, and the reading times are the number of the mapping relation.
In some embodiments, the mapping relationship of the a bytes includes an index of d bytes and an identifier of e bytes, and when the mapping relationship is circularly read, the processor 401 executes an application program stored in the memory 402, and may further implement the following functions:
for a bytes of data read each time, reading the data of the first d bytes in the a bytes to obtain an index of an item, reading the data of the last e bytes in the a bytes to obtain an identification of the item, and establishing a mapping relation between the index of the item and the identification of the item.
In some embodiments, each of the mappings in memory is stored in a hash map.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions, or by instructions controlling associated hardware, which may be stored in a computer-readable storage medium and loaded and executed by a processor.
In view of the foregoing, the server provided in the embodiment of the present invention receives a similar-item query instruction of a business module for a target item, the instruction comprising an identification of the target item; loads a pre-generated algorithm model file into memory, the file comprising query algorithm model data, which represents a query algorithm model, and the mapping relations between the indexes of items and the identifications of items; parses the algorithm model file to obtain the query algorithm model and each mapping relation; obtains the index corresponding to the identification of the target item from the mapping relation; invokes the query algorithm model with the index of the target item to obtain the indexes of items similar to the target item; obtains the identifications corresponding to those indexes from the mapping relation; and returns the identifications of the similar items to the business module. In this way, the efficiency and accuracy of nearest-neighbor queries can be improved.
In addition, the embodiment of the invention also provides a storage medium, wherein a plurality of instructions are stored, and the instructions can be loaded by a processor to execute any similar item query method provided by the embodiment of the invention. For example, the instructions may perform:
Receiving a similar item query instruction of a business module for a target item, wherein the similar item query instruction comprises an identification of the target item;
loading a pre-generated algorithm model file into a memory, wherein the algorithm model file comprises query algorithm model data, a mapping relation between an index of an item and an identifier of the item, and the query algorithm model data is used for representing a query algorithm model;
analyzing the algorithm model file to obtain a query algorithm model and each mapping relation;
obtaining an index corresponding to the identification of the target item according to the mapping relation;
invoking the query algorithm model using the index of the target item to obtain an index of a similar item similar to the target item;
obtaining an identifier corresponding to the index of the similar item according to the mapping relation;
and returning the identification of the similar items to the service module.
The specific implementation of the above operations may be referred to the previous embodiments, and will not be described herein.
Wherein the storage medium may include: Read-Only Memory (ROM), Random Access Memory (RAM), magnetic disk, optical disc, and the like.
Because the instructions stored in the storage medium can execute any of the similar-item query methods provided by the embodiments of the present invention, they can achieve the beneficial effects of any such method; see the detailed descriptions in the previous embodiments, which are not repeated here. The foregoing describes in detail the query method, device and storage medium for similar items provided in the embodiments of the present invention; specific examples are used to illustrate the principles and implementations of the invention, and the above embodiments are only intended to help understand its method and core idea. Meanwhile, those skilled in the art may vary the specific implementation and application scope according to the ideas of the present invention; in summary, the contents of this description should not be construed as limiting the present invention.

Claims (10)

1. A method for querying similar items, comprising:
receiving a similar item query instruction of a service module for a target item, wherein the similar item query instruction comprises an identification of the target item;
loading a pre-generated algorithm model file into a memory to obtain length information of a query algorithm model and quantity information of items, wherein the algorithm model file comprises query algorithm model data and a mapping relation between an index of an item and an identification of the item, and the query algorithm model data is used for representing the query algorithm model;
parsing the algorithm model file, setting an offset according to the length information and a number of reads according to the quantity information, reading the mapping relation multiple times according to the offset and the number of reads to obtain mapping relations between a plurality of indexes and a corresponding plurality of item identifications, and obtaining the query algorithm model according to the length information;
obtaining an index corresponding to the identification of the target item according to the mapping relation;
invoking the query algorithm model using the index of the target item to obtain an index of a similar item similar to the target item;
obtaining an identifier corresponding to the index of the similar item according to the mapping relation;
and returning the identification of the similar items to the service module.
2. The method for querying similar items according to claim 1, wherein the algorithm model file is generated in a manner comprising:
adding model data of a query algorithm model into the algorithm model file;
adding, after the model data of the query algorithm model, mapping relations between indexes of items and identifications of the items, wherein each mapping relation occupies a bytes;
adding, after the mapping relations, the number of the mapping relations, wherein the number occupies b bytes;
and adding, after the number, the length occupied by the query algorithm model data in the algorithm model file, wherein the length occupies c bytes.
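Purely as a non-limiting illustration of the layout recited in claim 2 above, the following Python sketch appends the model data, the mapping relations, the mapping count and the model-data length in that order; the concrete byte widths (a = 12 bytes per mapping, split as a 4-byte index plus an 8-byte identification; b = 4 bytes for the count; c = 8 bytes for the length) and all names are assumptions made for the example.

```python
import struct

# Illustrative serialization of the algorithm model file; byte widths are assumed:
# each mapping = 12 bytes (4-byte index + 8-byte identification),
# mapping count = 4 bytes, model-data length = 8 bytes.

def write_model_file(path, model_bytes, index_to_id):
    with open(path, "wb") as f:
        f.write(model_bytes)                           # query algorithm model data
        for index, item_id in index_to_id.items():     # one 12-byte record per mapping
            f.write(struct.pack("<IQ", index, item_id))
        f.write(struct.pack("<I", len(index_to_id)))   # number of mappings (b bytes)
        f.write(struct.pack("<Q", len(model_bytes)))   # length of model data (c bytes)


# Toy usage: 128 bytes of dummy model data and three mappings.
write_model_file("model.bin", b"\x00" * 128, {0: 1001, 1: 1002, 2: 1003})
```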
3. The method of claim 2, wherein parsing the algorithm model file to obtain the query algorithm model and each of the mapping relations comprises:
reading the data of the last b+c bytes of the algorithm model file to obtain, respectively, the length occupied by the query algorithm model data and the number of the mapping relations;
offsetting by the length from the start position of the algorithm model file, and cyclically reading the mapping relations, wherein a bytes are read for each mapping relation, and the number of reads is the number of the mapping relations.
4. The method for querying similar items according to claim 3, wherein each a-byte mapping relation comprises an index of d bytes and an identification of e bytes, and the cyclically reading the mapping relations comprises:
for the a bytes of data read each time, reading the data of the first d bytes of the a bytes to obtain an index of an item, reading the data of the last e bytes of the a bytes to obtain an identification of the item, and establishing a mapping relation between the index of the item and the identification of the item.
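Under the same assumed byte widths (a = 12, b = 4, c = 8, d = 4, e = 8), the following sketch illustrates the parsing recited in claims 3 and 4 above: the last b+c bytes give the mapping count and the model-data length, the offset skips the model data, and each a-byte record is split into a d-byte index and an e-byte identification; the recovered mappings are kept in ordinary dictionaries (hash maps) for lookup in both directions. It assumes the model.bin file produced by the sketch after claim 2.

```python
import struct

# Illustrative parser for the assumed layout: [model data][mappings][count:4][length:8].

def read_model_file(path):
    with open(path, "rb") as f:
        data = f.read()
    count, model_len = struct.unpack("<IQ", data[-12:])   # last b + c bytes
    index_to_id, id_to_index = {}, {}
    offset = model_len                                    # skip the model data
    for _ in range(count):                                # read `count` mappings
        index, item_id = struct.unpack("<IQ", data[offset:offset + 12])
        index_to_id[index] = item_id                      # first d bytes -> index
        id_to_index[item_id] = index                      # last e bytes -> identification
        offset += 12
    model_bytes = data[:model_len]                        # bytes used to rebuild the model
    return model_bytes, index_to_id, id_to_index


# Round-trip check against the file written in the sketch after claim 2.
model_bytes, index_to_id, id_to_index = read_model_file("model.bin")
print(index_to_id)  # -> {0: 1001, 1: 1002, 2: 1003}
```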
5. The method of claim 1, wherein each of the mappings in the memory is stored in a hash map.
6. A query device for similar items, comprising:
a receiving unit, configured to receive a similar item query instruction of a service module for a target item, wherein the similar item query instruction comprises an identification of the target item;
a loading unit, configured to load a pre-generated algorithm model file into a memory to obtain length information of a query algorithm model and quantity information of items, wherein the algorithm model file comprises query algorithm model data and a mapping relation between an index of an item and an identification of the item, and the query algorithm model data is used for representing the query algorithm model;
a parsing unit, configured to parse the algorithm model file, set an offset according to the length information and a number of reads according to the quantity information, read the mapping relation multiple times according to the offset and the number of reads to obtain mapping relations between a plurality of indexes and a corresponding plurality of item identifications, and obtain the query algorithm model according to the length information;
a calling unit, configured to obtain an index corresponding to the identification of the target item according to the mapping relations, and call the query algorithm model using the index of the target item to obtain an index of a similar item similar to the target item;
and a return unit, configured to obtain the identification corresponding to the index of the similar item according to the mapping relations and return the identification of the similar item to the service module.
7. The query device for similar items according to claim 6, wherein said loading unit comprises:
a first adding subunit, configured to add model data of a query algorithm model into the algorithm model file;
a second adding subunit, configured to add, after the model data of the query algorithm model, mapping relations between indexes of items and identifications of the items, wherein each mapping relation occupies a bytes, add, after the mapping relations, the number of the mapping relations, the number occupying b bytes, and add, after the number, the length occupied by the query algorithm model data in the algorithm model file, the length occupying c bytes.
8. The query device for similar items according to claim 7, wherein said parsing unit includes:
a first reading subunit, configured to read the data of the last b+c bytes of the algorithm model file to obtain, respectively, the length occupied by the query algorithm model data and the number of the mapping relations;
a second reading subunit, configured to offset by the length from the start position of the algorithm model file and cyclically read the mapping relations, wherein a bytes are read for each mapping relation, and the number of reads is the number of the mapping relations.
9. The query device for similar items according to claim 8, wherein each a-byte mapping relation comprises an index of d bytes and an identification of e bytes;
the second reading subunit is specifically configured to, for the a bytes of data read each time, read the data of the first d bytes of the a bytes to obtain an index of an item, read the data of the last e bytes of the a bytes to obtain an identification of the item, and establish a mapping relation between the index of the item and the identification of the item.
10. A storage medium storing a plurality of instructions adapted to be loaded by a processor to perform the method of querying a similar item as claimed in any one of claims 1 to 5.
CN202010182774.6A 2020-03-16 2020-03-16 Query method, device and storage medium for similar items Active CN111414527B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010182774.6A CN111414527B (en) 2020-03-16 2020-03-16 Query method, device and storage medium for similar items

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010182774.6A CN111414527B (en) 2020-03-16 2020-03-16 Query method, device and storage medium for similar items

Publications (2)

Publication Number Publication Date
CN111414527A (en) 2020-07-14
CN111414527B (en) 2023-10-10

Family

ID=71491235

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010182774.6A Active CN111414527B (en) 2020-03-16 2020-03-16 Query method, device and storage medium for similar items

Country Status (1)

Country Link
CN (1) CN111414527B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112232195B (en) * 2020-10-15 2024-02-20 北京临近空间飞行器系统工程研究所 Handwritten Chinese character recognition method, device and storage medium
CN112632331A (en) * 2020-12-18 2021-04-09 上海电气集团股份有限公司 Information processing method, system, electronic equipment and storage medium

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20070127824A1 (en) * 2005-12-07 2007-06-07 Trw Automotive U.S. Llc Method and apparatus for classifying a vehicle occupant via a non-parametric learning algorithm

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102622366A (en) * 2011-01-28 2012-08-01 阿里巴巴集团控股有限公司 Similar picture identification method and similar picture identification device
KR20170090128A (en) * 2016-01-28 2017-08-07 한국전자통신연구원 Index construction and utilization method for processing data based on MapReduce in Hadoop environment
CN106777130A (en) * 2016-12-16 2017-05-31 西安电子科技大学 A kind of index generation method, data retrieval method and device
CN108460148A (en) * 2018-03-22 2018-08-28 腾讯科技(深圳)有限公司 A kind of method and relevant device obtaining commodity additional information
CN109710612A (en) * 2018-12-25 2019-05-03 百度在线网络技术(北京)有限公司 Vector index recalls method, apparatus, electronic equipment and storage medium
CN110413611A (en) * 2019-06-24 2019-11-05 腾讯科技(深圳)有限公司 Data storage, querying method and device

Also Published As

Publication number Publication date
CN111414527A (en) 2020-07-14

Similar Documents

Publication Publication Date Title
CN110399104B (en) Data storage method, data storage device, electronic apparatus, and storage medium
US11347787B2 (en) Image retrieval method and apparatus, system, server, and storage medium
CN107784044B (en) Table data query method and device
US20200210399A1 (en) Signature-based cache optimization for data preparation
CN111414527B (en) Query method, device and storage medium for similar items
Mohamed et al. MRO-MPI: MapReduce overlapping using MPI and an optimized data exchange policy
CN109460406B (en) Data processing method and device
US10642815B2 (en) Step editor for data preparation
CN111666468A (en) Method for searching personalized influence community in social network based on cluster attributes
CN111817722A (en) Data compression method and device and computer equipment
CN110134681A (en) Data storage and querying method, device, computer equipment and storage medium
CN115905630A (en) Graph database query method, device, equipment and storage medium
CN114817651A (en) Data storage method, data query method, device and equipment
CN112306957A (en) Method and device for acquiring index node number, computing equipment and storage medium
CN114372165A (en) Optimized path query method, device, equipment and storage medium for jump connection
CN112667636B (en) Index establishing method, device and storage medium
US8321429B2 (en) Accelerating queries using secondary semantic column enumeration
CN111666302A (en) User ranking query method, device, equipment and storage medium
CN111143373A (en) Data processing method and device, electronic equipment and storage medium
CN114238334A (en) Heterogeneous data encoding method and device, heterogeneous data decoding method and device, computer equipment and storage medium
CN116010345A (en) Method, device and equipment for realizing table service scheme of flow batch integrated data lake
CN114048219A (en) Graph database updating method and device
US20170031909A1 (en) Locality-sensitive hashing for algebraic expressions
CN112416966A (en) Ad hoc query method, apparatus, computer device and storage medium
CN114490095B (en) Request result determination method and device, storage medium and electronic device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant