CN110866127A - Method for establishing index and related device - Google Patents

Publication number
CN110866127A
Authority
CN
China
Prior art keywords
feature
storage area
address
memory
features
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810986041.0A
Other languages
Chinese (zh)
Inventor
徐昀
陆元飞
彭超
潘锋烽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Priority to CN201810986041.0A priority Critical patent/CN110866127A/en
Publication of CN110866127A publication Critical patent/CN110866127A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The application discloses a method for establishing an index and a related device. The method comprises: determining a plurality of first features to be stored in an internal memory, wherein the plurality of first features belong to a first index group, the first features are used for feature calculation by an arithmetic processor, and the first features are obtained by feature extraction and/or processing of multimedia data; storing the plurality of first features to the internal memory, wherein the plurality of first features are stored consecutively in a first storage area of the internal memory; establishing address index information of the first storage area, wherein the address index information indicates the address of the first storage area in the internal memory; and reading the first features to the arithmetic processor according to the address index information. This technical solution increases the speed at which features are read to the arithmetic processor and thereby increases the utilization rate of the arithmetic processor.

Description

Method for establishing index and related device
Technical Field
The present application relates to the field of computer technologies, and in particular, to a method for creating an index and a related apparatus.
Background
A feature is data that describes a content attribute of one or more aspects of multimedia data. In a scenario of performing feature calculation by using features, since different algorithms for feature calculation use different data formats, fields, field types, field lengths, and the like, it is necessary to establish an index for the features, so that features related to the algorithms can be called according to the index when feature calculation is performed according to the algorithms. Feature computation may be particularly applicable to data retrieval scenarios, data mining scenarios, and the like.
The feature calculation is completed by an arithmetic processor of the computer. Because the storage space of the arithmetic processor is limited, the features are generally stored in the internal memory of the computer, and when the arithmetic processor needs the features for feature calculation, the features are read into the arithmetic processor according to a pre-established index. In current feature calculation scenarios, a feature and the auxiliary information corresponding to the feature are generally stored in the internal memory together as one feature storage unit, and a pointer to the first storage address of a storage area in which a plurality of feature storage units are continuously stored is used as the address index of the storage area. The problem with this approach is as follows: to read a plurality of features in the internal memory (such as the features in the storage area) to the arithmetic processor at one time, the storage address of each feature must be calculated from the storage address pointed to by the pointer and the length of the storage space occupied by a feature storage unit, and the pointer must then jump to the storage address of each feature in order to read the features into the arithmetic processor. This involves multiple storage address calculations and multiple pointer jumps, so reading the features into the arithmetic processor takes a long time, which hinders improvement of the utilization rate of the arithmetic processor.
Disclosure of Invention
The application provides a method and a related device for establishing an index, which solve the problem of low utilization rate of an arithmetic processor caused by long time spent on reading a feature to the arithmetic processor.
In a first aspect, a method for creating an index is provided, including:
determining a plurality of first features to be stored in an internal memory, wherein the plurality of first features are all features of a first index group, the first features are features used for feature calculation by an arithmetic processor, and the first features are features obtained by feature extraction and/or processing of multimedia data; storing the plurality of first features into the internal memory, wherein the plurality of first features are continuously stored in a first storage area of the internal memory; establishing address index information of the first storage area, wherein the address index information is used for indicating the address of the first storage area in the internal memory; and reading the first features stored in the first storage area to the arithmetic processor according to the address index information of the first storage area.
In the technical scheme, the first features belonging to one index group are continuously stored in the internal memory and address index information of the storage area for continuously storing the plurality of first features is established, so that when the first features of the index group are read to the operation processor, the plurality of first features can be continuously read to the operation processor according to the address index information in a continuous storage mode, the storage address of each first feature does not need to be calculated respectively, and pointer skipping is not needed for multiple times, the time for reading the plurality of first features to the operation processor is reduced, and the reading efficiency is improved.
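The effect described above can be illustrated with a minimal Python sketch. It is not the claimed implementation: the `bytearray` standing in for the internal memory, the fixed `FEATURE_DIM`, and the function names `store_group` and `read_group` are all assumptions made for illustration. The point it shows is that once one index group's features sit back to back, a (start, length) pair is enough to fetch the whole group in one contiguous read.

```python
import struct

FEATURE_DIM = 4                 # assumed: number of float32 values per feature
FEATURE_SIZE = FEATURE_DIM * 4  # bytes occupied by one feature


def store_group(memory: bytearray, features):
    """Append the features of one index group back to back.

    Returns hypothetical address index information as (start, length).
    """
    start = len(memory)
    for vec in features:
        memory.extend(struct.pack(f"<{FEATURE_DIM}f", *vec))
    return start, len(memory) - start


def read_group(memory: bytearray, address_index):
    """Read the whole group with one contiguous slice.

    No per-feature address is computed and no per-feature jump is made.
    """
    start, length = address_index
    return bytes(memory[start:start + length])


internal_memory = bytearray()   # stand-in for the internal memory
group = [(1.0, 2.0, 3.0, 4.0), (5.0, 6.0, 7.0, 8.0)]
address_index = store_group(internal_memory, group)
block = read_group(internal_memory, address_index)
```

In a real device the slice would correspond to a single bulk transfer from the internal memory to the arithmetic processor.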
In one possible embodiment, the plurality of first features to be stored to the internal memory are features that are not currently stored in the internal memory.
In one possible embodiment, the plurality of first features to be stored to the internal memory are features that are persistently stored in the external memory. Further, the plurality of first features to be stored to the internal memory are features that are persistently stored in the external memory in a column storage manner. Because the first features are persistently stored in the external memory in a column storage manner, when the plurality of first features are stored to the internal memory they can be stored continuously without addressing each first feature individually, which reduces the time for storing the plurality of first features to the internal memory and can further reduce the time for reading the features from the external memory to the arithmetic processor in the device restart stage.
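As a rough illustration of why column storage helps, the following Python sketch persists only the feature column of one index group to a file and later loads it into a simulated internal memory with a single read. The file layout, the file name, and the helper names `persist_column` and `load_column` are assumptions for illustration only; the source does not specify an on-disk format.

```python
import os
import struct
import tempfile


def persist_column(path, features, dim):
    """Write only the feature column of one index group, back to back."""
    with open(path, "wb") as f:
        for vec in features:
            f.write(struct.pack(f"<{dim}f", *vec))


def load_column(path, internal_memory: bytearray):
    """Load the whole column with one read and store it contiguously.

    Returns hypothetical address index information as (start, length).
    """
    start = len(internal_memory)
    with open(path, "rb") as f:
        internal_memory.extend(f.read())
    return start, len(internal_memory) - start


with tempfile.TemporaryDirectory() as tmp:
    column_path = os.path.join(tmp, "group0.features")  # hypothetical name
    persist_column(column_path, [(1.0, 2.0), (3.0, 4.0)], dim=2)
    memory = bytearray()        # stand-in for the internal memory
    address_index = load_column(column_path, memory)
```

Because the file holds nothing but features, no per-feature addressing is needed while loading.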
In some possible embodiments, the address index information of the first storage area may be as follows:
1) The address index information of the first storage area comprises a first pointer and a first storage space length, the value of the first pointer is a first storage address of the first storage area, and the first storage space length is the storage space length of the first storage area.
2) The address index information of the first storage area comprises a first pointer and a second pointer, the value of the first pointer is the first storage address of the first storage area, and the value of the second pointer is the last storage address of the first storage area.
3) The address index information of the first storage area comprises a first pointer and a residual storage space length, the value of the first pointer is a first storage address of the first storage area, and the residual storage space length is a difference value between the maximum storage space length of the first storage area and the storage space length of the first storage area.
The address index information of the first storage area is not limited to these cases. For example, the address index information of the first storage area may also be content stored in a data table that already exists or is newly created, and the content may be the first storage address of the first storage area and the storage space length of the first storage area; or the first storage address and the last storage address of the first storage area; or the first storage address and the remaining storage space length of the first storage area, and so on. With the address index information of the first storage area, when the first features in the first storage area are read to the arithmetic processor, the addresses of the plurality of first features that are continuously stored in the internal memory can be determined directly from the address index information, and the plurality of first features can then be read continuously according to the address index information.
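The three forms of address index information listed above carry the same information, which the following Python sketch makes explicit by converting each to a (first address, length) span. The class names are hypothetical. Two further assumptions are made: the last storage address is treated as inclusive, and the maximum storage space length of the area is assumed to be known to whoever reads the third form.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PointerAndLength:
    """Form 1: first pointer plus storage space length."""
    first_address: int
    length: int

    def span(self):
        return self.first_address, self.length


@dataclass(frozen=True)
class FirstAndLastPointer:
    """Form 2: first pointer plus second pointer (last address, assumed inclusive)."""
    first_address: int
    last_address: int

    def span(self):
        return self.first_address, self.last_address - self.first_address + 1


@dataclass(frozen=True)
class PointerAndRemaining:
    """Form 3: first pointer plus remaining storage space length."""
    first_address: int
    remaining: int
    max_length: int  # assumed to be known, e.g. a fixed size per storage area

    def span(self):
        return self.first_address, self.max_length - self.remaining


forms = [
    PointerAndLength(1024, 256),
    FirstAndLastPointer(1024, 1279),
    PointerAndRemaining(1024, remaining=768, max_length=1024),
]
spans = [form.span() for form in forms]
```

Each form resolves to the same span, so the reader of the index can issue the same contiguous read regardless of which form is stored.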
In a possible implementation, while the plurality of first features are being stored to the internal memory, or after they have been stored, auxiliary information corresponding to the first features may be stored to the internal memory, where the auxiliary information is stored in at least one second storage area of the internal memory; address index information of the second storage area and an association relationship between the first storage area and the second storage area are then established, wherein the address index information of the second storage area is used for indicating the address of the second storage area in the internal memory, the address index information of the second storage area and the association relationship between the first storage area and the second storage area are used by the arithmetic processor to determine the auxiliary information corresponding to a second feature, and the second feature is a first feature determined by the arithmetic processor after feature calculation; the auxiliary information is descriptive information of the multimedia data corresponding to the first feature and/or descriptive information of the first feature.
By storing the auxiliary information corresponding to the first feature in the second storage area of the internal memory and establishing the address index information of the second storage area and the association relationship between the first storage area and the second storage area, when the first feature is the feature determined after the feature calculation by the arithmetic processor, the storage address of the auxiliary information corresponding to the first feature in the internal memory can be found according to the association relationship between the second storage area and the first storage area and the address index information of the second storage area, the auxiliary information corresponding to the first feature is obtained from the storage address, and then the multimedia data corresponding to the first feature can be found according to the auxiliary information.
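A minimal Python sketch of this lookup follows. It assumes, purely for illustration, that the i-th auxiliary record in the second storage area belongs to the i-th feature in the first storage area; the source only requires that some association relationship exists. The names `aux_memory`, `association`, `store_aux`, and `aux_for` are hypothetical.

```python
aux_memory = []    # stand-in for the second storage area(s) in internal memory
association = {}   # first-area address index -> second-area address index


def store_aux(first_index, records):
    """Store auxiliary records and associate them with a first storage area."""
    start = len(aux_memory)
    aux_memory.extend(records)
    association[first_index] = (start, len(records))


def aux_for(first_index, position):
    """Return the auxiliary record of the feature at `position` in the group.

    `position` would be the result of feature calculation, for example the
    rank of the best-matching feature within the first storage area.
    """
    start, count = association[first_index]
    if not 0 <= position < count:
        raise IndexError("feature position outside the index group")
    return aux_memory[start + position]


first_area = (0, 48)   # hypothetical (start, length) of a first storage area
store_aux(first_area, ["img_001.jpg", "img_002.jpg", "img_003.jpg"])
best_match = aux_for(first_area, 1)
```

Keeping the auxiliary records out of the first storage area is what allows the features themselves to stay densely packed.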
In a possible embodiment, after the first features are read to the arithmetic processor, the group flag of the first index group may further be set to a first flag, the first flag indicating that the first features have been stored in the arithmetic processor.
In one possible embodiment, after reading the first feature to the arithmetic processor, if the first feature is removed from the arithmetic processor, the group flag of the first index group is set to a second flag indicating that the first feature is not stored in the arithmetic processor. Whether the first features are stored in the operation processor or not is marked according to the index grouping, so that the first features can be marked according to batches, and compared with the method of marking the features respectively, the quantity of marks can be saved, and the storage space is saved to a certain extent.
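The per-group marking can be sketched in Python as below. The enum `GroupFlag`, the dictionary `group_flags`, and the helper names are hypothetical; the point illustrated is that one flag is kept per index group rather than one per feature.

```python
from enum import Enum


class GroupFlag(Enum):
    LOADED = 1       # plays the role of the "first flag"
    NOT_LOADED = 2   # plays the role of the "second flag"


group_flags = {}     # index group id -> GroupFlag, one entry per group


def mark_loaded(group_id):
    """Call after the group's features have been read to the arithmetic processor."""
    group_flags[group_id] = GroupFlag.LOADED


def mark_removed(group_id):
    """Call after the group's features have been removed from the arithmetic processor."""
    group_flags[group_id] = GroupFlag.NOT_LOADED


def is_loaded(group_id):
    return group_flags.get(group_id, GroupFlag.NOT_LOADED) is GroupFlag.LOADED


mark_loaded(("cam01", "2018-08-01"))
loaded_before = is_loaded(("cam01", "2018-08-01"))
mark_removed(("cam01", "2018-08-01"))
loaded_after = is_loaded(("cam01", "2018-08-01"))
```

With n features per group, this stores one flag where per-feature marking would store n.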
In a possible implementation, after the first features are read to the arithmetic processor, the first features may be kept resident in a third storage area of the arithmetic processor, and address index information of the third storage area and an association relationship between the first storage area and the third storage area are then established. The arithmetic processor uses the address index information of the third storage area and the association relationship between the first storage area and the third storage area to read the first features directly from within the arithmetic processor for feature calculation. Because the first features of the first index group are kept resident in the arithmetic processor, when the arithmetic processor performs feature calculation using the first features, it can determine the storage address of the first features in the arithmetic processor from the association relationship between the third storage area and the first storage area and the address index information of the third storage area, and then obtain the first features directly from the arithmetic processor without reading them from the internal memory, which saves the time otherwise spent reading the first features from the internal memory.
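The resident-feature idea can be sketched in Python as follows. Two `bytearray` objects stand in for the internal memory and the arithmetic processor's own memory, and the mapping `first_to_third` plays the role of the association relationship; all of these names, and the counter `internal_reads`, are assumptions for illustration.

```python
internal_memory = bytearray(b"ABCDEFGH")  # stand-in for the internal memory
processor_memory = bytearray()            # stand-in for processor-side storage
first_to_third = {}   # first-area (start, length) -> third-area (start, length)
internal_reads = 0    # counts transfers from the internal memory


def load_resident(first_index):
    """Copy one first storage area into the processor and record the association."""
    global internal_reads
    start, length = first_index
    third_start = len(processor_memory)
    processor_memory.extend(internal_memory[start:start + length])
    first_to_third[first_index] = (third_start, length)
    internal_reads += 1


def features_for_compute(first_index):
    """Return the group's features, touching internal memory only on a miss."""
    if first_index not in first_to_third:
        load_resident(first_index)
    third_start, length = first_to_third[first_index]
    return bytes(processor_memory[third_start:third_start + length])


first_call = features_for_compute((2, 4))
second_call = features_for_compute((2, 4))
```

The second call is served entirely from the third storage area, so the internal memory is read only once.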
In a second aspect, an apparatus for building an index is provided, which includes modules for performing the method for building an index in the first aspect or any one of the possible implementations of the first aspect.
In a third aspect, another apparatus for establishing an index is provided, which includes a processor and a memory, where the processor is connected to the memory, the processor includes an arithmetic processor, the memory includes an internal memory, the internal memory is a memory coupled to the arithmetic processor, the memory is used for storing program code and features, and the processor is used for calling the program code to execute the method of the first aspect or of any one of the possible implementations of the first aspect.
In a fourth aspect, a computer-readable storage medium is provided, which stores instructions that, when executed on a computer, cause the computer to perform the method of the first aspect or of any one of the possible implementations of the first aspect.
In a fifth aspect, there is provided a computer program product comprising instructions which, when run on a computer, cause the computer to perform the method of any one of the above-described first aspect and the various possible implementations of the first aspect.
Drawings
FIG. 1 is a schematic diagram of a system architecture for performing feature computation using features to achieve multimedia data retrieval;
FIG. 2 is a schematic diagram of a hierarchical index;
FIG. 3 is a schematic diagram of a feature storage and indexing design in the prior art;
FIG. 4 is a schematic diagram illustrating a prior art process for reading features into an arithmetic processor;
FIG. 5 is a block diagram illustrating an apparatus for creating an index according to an embodiment of the present disclosure;
FIG. 6 is a flowchart illustrating a method for creating an index according to an embodiment of the present application;
FIGS. 7A-7C are schematic diagrams of storing a first feature in an internal memory according to an embodiment of the present application;
FIG. 8 is a diagram illustrating a specific design of feature storage and indexing provided by an embodiment of the present application;
FIG. 9 is a schematic structural diagram of an apparatus for creating an index according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described below with reference to the drawings in the embodiments of the present application.
First, a system architecture for feature computation using features and some current designs related to feature storage and indexing are described.
Referring to fig. 1, fig. 1 is a schematic diagram of a system architecture for performing feature calculation using features to realize multimedia data retrieval, and as shown, the system architecture includes a feature database creation system 11 and a multimedia data retrieval system 12. The database establishing system 11 includes a feature extraction module 111 and a data storage engine 112, where the feature extraction module 111 is configured to perform feature extraction on multimedia data stored in a multimedia database to obtain one or more features corresponding to each multimedia data; the data storage engine 112 is used for storing one or more features obtained by the feature extraction module in a feature database, building and storing an index, and building and storing a correlation model of an algorithm. The multimedia data retrieval system 12 comprises a feature extraction module 121 and a retrieval module 122, wherein the feature extraction module 121 is configured to perform feature extraction on multimedia data input by a user to obtain one or more features corresponding to the multimedia data; the retrieval module 122 is configured to perform feature calculation on one or more features corresponding to the multimedia data and some or all of the features in the feature database to complete similarity retrieval of the multimedia data, and output a final retrieval result according to a result of the similarity retrieval.
In the system architecture, feature storage and indexing are implemented by the data storage engine, which comprises a feature storage module and an index module. The feature storage module is used for storing features; the index module is used for establishing and storing indexes of the features and the algorithm model. The index stored in the index module may be a hierarchical index, as shown in fig. 2, in which the top K layers of data (K is a positive integer) index to a feature index, a feature, and an algorithm model. When other modules in the retrieval architecture interact with the data storage engine, the data storage engine indexes downwards layer by layer according to the hierarchical index to reach the storage address of the required feature or algorithm model, and then finds the feature or algorithm model stored by the feature storage module according to that storage address.
In a specific implementation, for a scenario in which the multimedia data is a picture, the design of feature storage and indexing is as shown in fig. 3, and the hierarchical index is a three-layer index structure. The first-layer index 31a is an index of a camera identifier, which uniquely identifies the camera that took a picture; the first-layer index 31a indexes to a camera identifier storage area 31b for storing the camera identifier, and the camera identifier storage area 31b points to the second-layer index 32a by means of a pointer. The second-layer index 32a is an index of a shooting date, which represents the date on which a picture was taken; the second-layer index 32a indexes to a shooting date storage area 32b for storing the shooting date, and the shooting date storage area 32b points to the third-layer index by means of a pointer. The third-layer index includes a feature storage area 33a for storing features, a feature index storage area 33b for storing a feature index, and a model storage area 33c for storing an algorithm model. The feature storage area 33a may include a feature storage area 33a1 in the internal memory and a feature storage area 33a2 in the external memory, and the features stored in the feature storage area 33a1 are in one-to-one correspondence with the features stored in the feature storage area 33a2 through a mapping relationship between the feature storage area 33a1 and the feature storage area 33a2. In the feature storage area 33a1 and the feature storage area 33a2, a feature and the auxiliary information of the feature (such as the shooting time, the feature version, and the like) are stored together as one feature storage unit Fn. A plurality of feature storage units indexed by one camera identifier and one shooting date are continuously stored together to form a feature storage area.
A camera identifier and a shooting date are indexed to a feature storage area by pointer 1 and the length of the storage space occupied by one feature storage unit Fn. The feature storage area 33a further includes a feature storage area 33a3 in the arithmetic processor; the features stored in the feature storage area 33a3 are read from the feature storage area 33a1 and are stored continuously in the feature storage area 33a3, and the feature storage area 33a3 does not contain the auxiliary information of the features. The features stored in the feature storage area 33a1 may be associated with the features stored in the feature storage area 33a2 by the feature index 33b. The feature index 33b is composed of a pointer 2, a pointer 3, and the number n of features continuously stored in one feature storage area 33a1, wherein pointer 2 points to the first storage address of the feature storage area 33a1, pointer 3 points to the first storage address of the feature storage area 33a3, and the number n of features continuously stored in one feature storage area 33a1 is equal to the number of features indexed by one camera identifier and one shooting date. In some possible embodiments, pointer 1 and pointer 2 may be the same pointer.
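The three-layer structure of fig. 3 can be approximated by a nested mapping, as in the Python sketch below. The dictionary layout, the field names, and the functions `add_feature_area` and `locate` are hypothetical; the sketch only shows how a camera identifier and a shooting date lead to pointer 1 and the unit length, from which every feature address must then be derived.

```python
def add_feature_area(index, camera_id, shooting_date, pointer1, unit_size, count):
    """Register one feature storage area under camera identifier -> shooting date."""
    index.setdefault(camera_id, {})[shooting_date] = {
        "pointer1": pointer1,    # first storage address of the area
        "unit_size": unit_size,  # bytes per feature storage unit Fn
        "count": count,          # number of feature storage units in the area
    }


def locate(index, camera_id, shooting_date):
    """Walk the layers and derive the address of every feature storage unit."""
    leaf = index[camera_id][shooting_date]
    return [leaf["pointer1"] + i * leaf["unit_size"] for i in range(leaf["count"])]


hierarchical_index = {}
add_feature_area(hierarchical_index, "cam01", "2018-08-01",
                 pointer1=4096, unit_size=24, count=3)
addresses = locate(hierarchical_index, "cam01", "2018-08-01")
```

Because each unit also holds auxiliary information, consecutive features are `unit_size` bytes apart rather than adjacent.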
In the process of using features for feature comparison to complete picture retrieval, when the features are read from the feature storage area 33a1 of the internal memory to the feature storage area 33a3 of the arithmetic processor, the features stored in one feature storage area 33a1 can be read into the feature storage area 33a3 in a batch, so that the arithmetic processor can perform feature calculation on the features of one feature storage area 33a1 simultaneously, thereby making full use of the multithread parallel processing capability of the arithmetic processor and improving its utilization rate. The features stored in one feature storage area 33a1 may be referred to as one batch of features. The process of reading one batch of features into the arithmetic processor is shown in fig. 4 and includes the following steps:
S101, initializing a pointer offset.
Steps S102 to S103 are executed in a loop n times, where n is equal to the number of features in one batch:
S102, reading the feature at the storage address obtained by adding the pointer offset to the storage address pointed to by pointer 1 into the storage area of the arithmetic processor.
S103, taking the sum of the pointer offset and the length of the storage space occupied by one feature storage unit as the new pointer offset, and returning to step S102.
As can be seen from the flow shown in fig. 4, in the feature storage and indexing design shown in fig. 3, reading one batch of features into the arithmetic processor for feature calculation requires taking the storage address pointed to by pointer 1 plus the pointer offset as the feature storage address and repeating steps S102 and S103 n times, which amounts to n feature storage address calculations and n pointer jumps. Calculating the storage addresses and jumping the pointer take a long time, so reading one batch of features into the arithmetic processor takes a long time, which hinders improvement of the utilization rate of the arithmetic processor.
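The loop of fig. 4 can be written out as a short Python sketch that also counts the address calculations. The sizes `FEATURE_SIZE` and `UNIT_SIZE` and the function name `read_batch_prior_art` are assumptions for illustration; the step comments map to S101 through S103.

```python
FEATURE_SIZE = 16   # assumed bytes per feature
UNIT_SIZE = 24      # assumed bytes per feature storage unit (feature + auxiliary)


def read_batch_prior_art(memory: bytes, pointer1: int, n: int):
    """Read n features one by one, as in steps S101 to S103.

    Returns the gathered features and the number of address calculations.
    """
    gathered = bytearray()
    address_calculations = 0
    offset = 0                                  # S101: initialize pointer offset
    for _ in range(n):
        address = pointer1 + offset             # S102: compute feature address
        address_calculations += 1
        gathered += memory[address:address + FEATURE_SIZE]  # S102: read feature
        offset += UNIT_SIZE                     # S103: advance by one unit
    return bytes(gathered), address_calculations


# Three feature storage units: 16 feature bytes followed by 8 auxiliary bytes.
units = b"".join(bytes([i]) * FEATURE_SIZE + b"\xff" * 8 for i in range(3))
features, calculations = read_batch_prior_art(units, pointer1=0, n=3)
```

The number of address calculations grows linearly with the batch size, which is the cost the embodiments aim to avoid.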
The embodiment of the application provides a method and a related device for establishing a feature index, so as to solve the problem that the time required for reading a feature to an operation processor is long in the design scheme of feature storage and index shown in fig. 3.
The embodiment of the application can be applied to a scenario in which features are utilized to perform feature calculation to accomplish a certain purpose (such as data retrieval, data mining and the like). The embodiment of the present application is applicable to the system architecture shown in fig. 1, which uses features to perform feature calculation to implement multimedia data retrieval, and the solution of the embodiment of the present application may be specifically applied to a data storage engine in the architecture, where the data storage engine may include two modules, i.e., a feature storage module and an index module, as described above.
According to the embodiment of the application, through improvement of a scheme of feature storage and indexing, the features are continuously stored in the internal memory in groups according to the indexes, and the address indexes of the feature storage areas in which the features are continuously stored are established, so that when the features in the feature storage areas are read to the operation processor, the features in the feature storage areas can be read to the operation processor without address calculation, the time for reading the features to the operation processor is shortened, and the utilization rate of the operation processor is improved.
Before describing aspects of embodiments of the present application, some concepts related to the embodiments of the present application will be described first for ease of understanding.
1. Concept of features
In the embodiment of the present application, the feature refers to data obtained by performing feature extraction on multimedia data (such as pictures, audio streams, and the like) by using one or more feature extraction algorithms to describe content attributes of one or more aspects of the multimedia data. For example, if the multimedia data is a picture, the feature may be a texture feature for describing texture of content in the picture, a color feature for describing color of content in the picture, a shape feature for describing shape of content in the picture, and the like, and is not limited to the description herein.
Features may be divided into long features and short features. A long feature is the full feature obtained by performing feature extraction on the multimedia data, that is, a feature that has not undergone data compression or data dimension reduction. A short feature is obtained by first performing feature extraction on the multimedia data to obtain a long feature and then applying a compression algorithm or a dimension reduction algorithm to the long feature; for example, a short feature may be obtained by reducing the dimension of a long feature with a principal component analysis algorithm.
2. Concept of index grouping
In the embodiment of the application, the index groups are a set of features corresponding to the same one or more pieces of descriptive information, the one or more pieces of descriptive information are descriptive information of multimedia data corresponding to the features, the descriptive information of the multimedia data is used for describing objective attributes of the multimedia data, and the objective attributes are some attribute information except content attributes of the multimedia data. For example, the multimedia data is a picture, and the descriptive information of the multimedia data may be a picture size of the picture, a shooting date of the picture, a shooting time of the picture, and the like, and is not limited to the description herein; accordingly, the index grouping may be a set of features corresponding to the same picture size and/or the same shooting date and/or the same shooting time. The one or more descriptive information may be used to distinguish index packets.
In some possible implementations, if the number of features corresponding to the same single piece of descriptive information is less than or equal to a feature number threshold, the index group may be a set of features corresponding to that single piece of descriptive information, such as a set of features corresponding to the same shooting date; if the number of features corresponding to the same single piece of descriptive information is greater than the feature number threshold, the index group may be a set of features corresponding to the same multiple pieces of descriptive information, such as a set of features corresponding to the same camera and the same shooting date. The feature number threshold may be positively correlated with the capability of the arithmetic processor to perform feature calculation in a multithreaded manner, that is, the greater the number of features on which the arithmetic processor can perform feature calculation in parallel, the greater the feature number threshold. In one possible implementation, if hierarchical indexing is employed, the features in an index group may be the features indexed by the top K layers of data in the hierarchical index. For example, as shown in fig. 3, the top K layers of data are the index of the camera identifier and the index of the shooting date, the features in an index group are the features indexed by one camera identifier and one shooting date, and the index group is the set of features indexed by one camera identifier and one shooting date.
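One reading of the threshold rule above can be sketched in Python: group by a single piece of descriptive information first, and fall back to a finer key only for groups that would exceed the threshold. The record fields `date` and `camera`, the function name `build_index_groups`, and the choice of which descriptor to add second are assumptions for illustration.

```python
from collections import defaultdict


def build_index_groups(records, threshold):
    """Group feature records into index groups no larger than needed.

    Groups are keyed by (date,) when small enough, else by (date, camera).
    """
    by_date = defaultdict(list)
    for record in records:
        by_date[(record["date"],)].append(record)

    groups = {}
    for key, members in by_date.items():
        if len(members) <= threshold:
            groups[key] = members          # one descriptor is sufficient
        else:
            for record in members:         # refine with a second descriptor
                fine_key = (record["date"], record["camera"])
                groups.setdefault(fine_key, []).append(record)
    return groups


records = [
    {"date": "2018-08-01", "camera": "cam01", "feature": b"a"},
    {"date": "2018-08-01", "camera": "cam01", "feature": b"b"},
    {"date": "2018-08-01", "camera": "cam02", "feature": b"c"},
    {"date": "2018-08-02", "camera": "cam01", "feature": b"d"},
]
groups = build_index_groups(records, threshold=2)
group_sizes = {key: len(members) for key, members in groups.items()}
```

Note that a refined group can still exceed the threshold if one camera alone produces too many features on one date; a fuller implementation might refine further with additional descriptors.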
The scheme of the embodiment of the present application is described next. Referring to fig. 5, fig. 5 is a block diagram of an apparatus for establishing an index according to an embodiment of the present disclosure. As shown, the apparatus 20 may include a processor 201, a memory 202, an input/output interface 203, and any other similar or suitable components, which may communicate over one or more communication buses 204, where the buses 204 may be memory buses, peripheral buses, and the like.
The processor 201 may receive commands from the above-described other components (such as the memory 202, the input/output interface 203, and the like) through the bus 204, and perform calculation or data processing according to the received commands. The processor 201 may include one or more arithmetic processors. Arithmetic processors can be used to perform complex mathematical and geometric calculations. Optionally, the processor 201 may also include one or more communication processors. The communication processor can be called as a foreground processor and is used for processing all information between the communication processor and external equipment so as to prevent the operation or step being executed by the operation processor from being interrupted due to the information of the external equipment; alternatively, the communication processor may receive information of the external device through the communication interface. In some possible designs, the functions performed by the arithmetic processor and the communication processor may be integrated on a single integrated circuit or chip.
In the embodiment of the application, the arithmetic processor can be used for carrying out feature extraction and/or feature calculation on multimedia data; optionally, the arithmetic processor may further cooperate with the input/output interface 203 to execute the method steps of establishing the index in the embodiment of the present application. Specifically, the arithmetic processor may be one or more of a Central Processing Unit (CPU), a Graphics Processing Unit (GPU), a field-programmable gate array (FPGA), and the like.
The memory 202 may store commands or data received from the processor 201 or other components (e.g., the input/output interface 203). The memory 202 may include an internal memory. The internal memory is coupled to the arithmetic processor, and may include volatile memories such as a Dynamic Random Access Memory (DRAM), a Static Random Access Memory (SRAM), and a Synchronous Dynamic Random Access Memory (SDRAM); the internal memory may also include a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), and the like. Optionally, the memory 202 may further include an external memory, which may be coupled with the internal memory; the external memory is a nonvolatile memory and may include a Hard Disk Drive (HDD), a solid-state drive (SSD), and the like.
In the embodiment of the present application, the memory 202 may be used for storing the features and indexes of the multimedia data; optionally, the memory 202 may also be used for storing program code that supports the arithmetic processor in executing the method for establishing the index in the embodiment of the present application.
The input/output interface 203 may receive commands or data input via an input means (e.g., a sensor, a keyboard, a mouse, etc.) and may transmit the received commands or data to the processor 201 or the memory 202 through the bus 204. The input-output interface 203 may also be used to output various information (e.g., multimedia data, text data, etc.) received from the other components described above (e.g., the processor 201, the memory 202) to a user. Specifically, the input/output interface 203 may include external components such as a touch panel, a button, and a mouse, and further include a control circuit that functions to control the external components such as the touch panel, the button, and the mouse.
The method of the embodiment of the present application may be implemented based on the apparatus for creating an index described in the embodiment of fig. 5. Referring to fig. 6, fig. 6 is a schematic flowchart of a method for creating an index according to an embodiment of the present application, where as shown in the figure, the method includes:
S201, determining a plurality of first features to be stored in an internal memory, wherein the plurality of first features belong to a first index group.
In the embodiment of the application, the first feature is the feature used by the arithmetic processor for feature calculation: if the arithmetic processor performs feature calculation using long features, the first feature is a long feature; if the arithmetic processor performs feature calculation using short features, the first feature is a short feature. The definitions of the long and short features can be found in the foregoing description and are not repeated here.
The plurality of first features to be stored in the internal memory may fall into two cases:
1) The plurality of first features to be stored in the internal memory are newly added first features, i.e., first features that are currently stored neither in the internal memory nor persistently in the external memory.
For example, the first features stored in the external memory are first feature 1 to first feature 200, which belong to index group 1 to index group 5: first feature 1 to first feature 25 belong to index group 1, first feature 26 to first feature 70 belong to index group 2, first feature 71 to first feature 125 belong to index group 3, first feature 126 to first feature 170 belong to index group 4, and first feature 171 to first feature 200 belong to index group 5. The first features stored in the internal memory are first feature 1 to first feature 125, corresponding to first feature 1 to first feature 125 in the external memory. If the plurality of first features to be stored in the internal memory are first feature 201 to first feature 250, then first feature 201 to first feature 250 are newly added first features.
2) The plurality of first features to be stored to the internal memory are first features persistently stored in the external memory, i.e., the first features are currently stored in the external memory and are not yet stored in the internal memory.
For example, the first features stored in the external memory are first feature 1 to first feature 200, which belong to index group 1 to index group 5: first feature 1 to first feature 25 belong to index group 1, first feature 26 to first feature 70 belong to index group 2, first feature 71 to first feature 125 belong to index group 3, first feature 126 to first feature 170 belong to index group 4, and first feature 171 to first feature 200 belong to index group 5. The first features stored in the internal memory are first feature 1 to first feature 125, corresponding to first feature 1 to first feature 125 in the external memory. If the plurality of first features to be stored in the internal memory are first feature 126 to first feature 170, then first feature 126 to first feature 170 are first features persistently stored in the external memory.
The first index group is an index group; for the definition of an index group, refer to the foregoing description, which is not repeated here. The first index group may fall into the following two cases:
1) The first index group is an index group that exists in neither the internal memory nor the external memory, which may also be referred to as a newly added index group. If the first index group is a newly added index group, the plurality of first features are newly added first features.
For example, if the index groups included in the external memory are index group 1 to index group 10, the index groups included in the internal memory are index group 1 to index group 5 (corresponding to index group 1 to index group 5 in the external memory), and the first index group is index group 11, then index group 11 is a newly added index group, and the first features in index group 11 are newly added first features.
2) The first index group is an index group that exists in the internal memory and/or the external memory, which may also be referred to as an existing index group.
For example, if the index groups included in the external memory are index group 1 to index group 10, the index groups included in the internal memory are index group 1 to index group 5 (corresponding to index group 1 to index group 5 in the external memory), and the first index group is index group 5, then index group 5 is an existing index group.
S202, storing the plurality of first features into the internal memory, wherein the plurality of first features are consecutively stored in a first storage area of the internal memory.
In this embodiment of the application, the first storage area is a storage area corresponding to the first index group, and is used for continuously storing all the first features in the first index group. Several possible embodiments of storing the plurality of first features in the internal memory are as follows:
1) If the first index group falls under case 1) above, a storage area may be allocated in the internal memory and used as the first storage area, and the plurality of first features may be consecutively stored in the first storage area.
For example, assume that the storage space occupied by one first feature in the internal memory is the storage space corresponding to one storage address, and that the plurality of first features are first feature f1 to first feature fn. After first feature f1 to first feature fn are consecutively stored in the first storage area, as shown in fig. 7A, storage address 1 of first feature f1 is contiguous with storage address 2 of first feature f2, storage address 2 of first feature f2 is contiguous with storage address 3 of first feature f3, …, and storage address n-1 of first feature fn-1 is contiguous with storage address n of first feature fn; the storage area corresponding to storage address 1 to storage address n is the first storage area.
2) If the first index group falls under case 2) above and the plurality of first features fall under case 1) above, a fourth storage area may be determined, namely the storage area of the internal memory in which the other first features of the first index group (those that are not among the plurality of first features) are consecutively stored. A storage area whose storage addresses follow and are contiguous with the fourth storage area is then determined as a fifth storage area, and the plurality of first features are consecutively stored in the fifth storage area. The fourth storage area and the fifth storage area together constitute the first storage area.
For example, as shown in fig. 7B, assuming that the storage space occupied by one first feature in the internal memory is a storage space corresponding to one storage address, the first features in the first index group are the first features f1 to f10, which are continuously stored in the fourth storage area corresponding to the storage addresses 1 to 10 in the internal memory, and the plurality of first features are the first features f11 to f15, the first features f11 to f15 are stored in the fifth storage area having the storage address 11 as the starting storage address, that is, in the storage area corresponding to the storage addresses 11 to 15, and the storage area corresponding to the storage addresses 1 to 15 is the first storage area.
3) If the first index group falls under case 2) above and the plurality of first features fall under case 2) above, a storage area may be allocated in the internal memory and used as the first storage area, and the plurality of first features may be consecutively stored in the first storage area.
For example, as shown in fig. 7C, assume that the storage space occupied by one first feature in the internal memory is the storage space corresponding to one storage address, the index groups included in the external memory are index group 1 to index group 10, and the index groups included in the internal memory are index group 1 to index group 5 (corresponding to index group 1 to index group 5 in the external memory). The plurality of first features are the first features in index group 6, namely first feature f61 to first feature f6n, where n is the number of first features in index group 6. After the first features in index group 6 are consecutively stored in the first storage area, storage address 61 of first feature f61 is adjacent to storage address 62 of first feature f62, storage address 62 of first feature f62 is adjacent to storage address 63 of first feature f63, …, and storage address 6(n-1) of first feature f6(n-1) is adjacent to storage address 6n of first feature f6n.
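The storage embodiments above can be illustrated with a minimal sketch. It is not the implementation of the application: the internal memory is modeled as a Python bytearray, every first feature is assumed to have a fixed size FEATURE_SIZE, and, so that later features can always be appended contiguously, each index group is assumed to reserve room for a maximum number of features. The names InternalMemory, store_new_group and append_to_group are hypothetical.

```python
FEATURE_SIZE = 4  # assumed fixed size of one first feature, in bytes


class InternalMemory:
    """Toy model: the internal memory is a single bytearray."""

    def __init__(self, capacity):
        self.buf = bytearray(capacity)
        self.next_free = 0   # lowest address not yet reserved
        self.areas = {}      # group id -> [start, used bytes, reserved bytes]

    def store_new_group(self, group_id, features, max_features):
        """Reserve an area for a group and store its features back to back."""
        reserved = max_features * FEATURE_SIZE
        data = b"".join(features)
        if len(data) > reserved or self.next_free + reserved > len(self.buf):
            raise MemoryError("not enough space for this index group")
        start = self.next_free
        self.buf[start:start + len(data)] = data
        self.next_free += reserved
        self.areas[group_id] = [start, len(data), reserved]
        return start

    def append_to_group(self, group_id, features):
        """Store new features right after the group's existing features."""
        start, used, reserved = self.areas[group_id]
        data = b"".join(features)
        if used + len(data) > reserved:
            raise MemoryError("index group area is full")
        fifth_start = start + used  # first address after the existing features
        self.buf[fifth_start:fifth_start + len(data)] = data
        self.areas[group_id][1] = used + len(data)
        return fifth_start


mem = InternalMemory(64)
mem.store_new_group("g1", [b"f001", b"f002"], max_features=4)
mem.store_new_group("g2", [b"f101"], max_features=4)
appended_at = mem.append_to_group("g1", [b"f003"])
```

In this sketch store_new_group corresponds to embodiments 1) and 3), and append_to_group corresponds to embodiment 2), where the existing features play the role of the fourth storage area and the appended features the role of the fifth storage area.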
S203, establishing address index information of the first storage area, wherein the address index information is used for indicating the address of the first storage area in the internal memory.
In this embodiment of the application, the address index information of the first storage area may have the following several possible implementation forms:
1) the address index information comprises a first pointer and a first storage space length, the value of the first pointer is a first storage address of a first storage area, and the first storage space length is the storage space length of the first storage area, wherein the storage space length of the first storage area is equal to the length of a storage space occupied by all first features in the first index packet in the internal memory.
For example, as shown in fig. 7A, the value of the first pointer is storage address 1, and the storage space length of the first storage area is equal to the length of the storage space corresponding to a single storage address × n.
2) The address index information comprises a first pointer and a second pointer, the value of the first pointer is the first storage address of the first storage area, and the value of the second pointer is the last storage address of the first storage area.
For example, as shown in fig. 7A, the value of the first pointer is storage address 1, and the value of the second pointer is storage address n.
3) The address index information comprises a first pointer and a remaining storage space length, wherein the value of the first pointer is a first storage address of the first storage area, and the remaining storage space length is a difference value between the maximum storage space length of the first storage area and the storage space length of the first storage area, wherein the maximum storage space length of the first storage area is the length of a storage space occupied by all first features in the first index packet in the internal memory when the number of the first features in the first index packet reaches a maximum number, and the maximum number can be the maximum number of features read to the arithmetic processor at one time; the length of the storage space of the first storage area is equal to the length of the storage space occupied in the internal memory by all the first features in the first index packet.
For example, as shown in fig. 7A, if the maximum storage space length of the first storage area is m, the value of the first pointer is storage address 1, and the remaining storage space length is equal to m - (length of the storage space corresponding to a single storage address × n).
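As a worked illustration of the three forms above, the following sketch shows that each form describes the same first storage area. The class names, the unit length UNIT, and the example values n = 10 and m = 16 are assumptions made for illustration only; they do not come from the application.

```python
from dataclasses import dataclass

UNIT = 1  # assumed length of the storage space behind one storage address


@dataclass
class PointerAndLength:       # form 1)
    first: int                # value of the first pointer
    length: int               # storage space length of the first storage area


@dataclass
class FirstAndLast:           # form 2)
    first: int                # value of the first pointer
    last: int                 # value of the second pointer


@dataclass
class PointerAndRemaining:    # form 3)
    first: int                # value of the first pointer
    remaining: int            # maximum length minus used length


def used_length(info, max_length=None):
    """Return the used storage space length described by any of the forms."""
    if isinstance(info, PointerAndLength):
        return info.length
    if isinstance(info, FirstAndLast):
        return (info.last - info.first + 1) * UNIT
    if isinstance(info, PointerAndRemaining):
        if max_length is None:
            raise ValueError("form 3) needs the maximum storage space length")
        return max_length - info.remaining
    raise TypeError("unknown address index form")


n, m = 10, 16  # n stored features, maximum length m (example values)
forms = [
    PointerAndLength(first=1, length=n * UNIT),
    FirstAndLast(first=1, last=n),
    PointerAndRemaining(first=1, remaining=m - n * UNIT),
]
lengths = [used_length(f, max_length=m) for f in forms]
```

All three entries of lengths evaluate to the same used length, n × UNIT, which is why any of the forms is sufficient to locate the whole group.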
It should be understood that the foregoing descriptions of the address index information are merely examples of specific implementation forms of the address index information in the embodiments of the present application, and in alternative implementations, the address index information may also have other implementation forms, for example, the address index information may also be contents stored in a data table that has been created or has not been created, and the contents may be a first storage address of the first storage area and a storage space length of the first storage area; or, the first storage address and the last storage address of the first storage area; or the first storage address and the remaining storage space length of the first storage area.
For example, if the first storage area is shown in fig. 7A, and the maximum storage space length of the first storage area is m, the contents stored in the data table may be as shown in table 1, table 2, or table 3.
First storage address | Storage space length of the first storage area
Storage address 1 | (length of the address space corresponding to a single storage address) × n
TABLE 1
First storage address | Last storage address
Storage address 1 | Storage address n
TABLE 2
First storage address | Remaining storage space length
Storage address 1 | m - (length of the address space corresponding to a single storage address) × n
TABLE 3
Here, establishing the address index information of the first storage area may be implemented in the following possible ways:
1) If the first index group falls under case 1) above, or if the first index group falls under case 2) above and the plurality of first features fall under case 2) above, the address index information of the first storage area may be created; that is, resources such as a pointer or a data table are allocated to serve as the address index information of the first storage area.
2) If the first index grouping is the case 2) above, and the plurality of first features are the cases 1) above, the address index information of the first storage area may be modified, and in a specific implementation, the address index information of the first storage area may be modified by modifying a value of a pointer of the address index information, or modifying contents in a data table, or modifying a length of a storage space of the first storage area.
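The two ways of establishing the address index information can be sketched as follows, using the data table form with a first storage address and a storage space length. The table is assumed to hold one row per index group that is already in the internal memory; the function name establish_index and the dictionary layout are hypothetical.

```python
def establish_index(table, group_id, first, added_length):
    """Create the row for a group, or extend the length of an existing row.

    table maps a group id to {"first": first storage address,
    "length": storage space length of the first storage area}.
    """
    row = table.get(group_id)
    if row is None:
        # The group has no first storage area yet: create the index information.
        table[group_id] = {"first": first, "length": added_length}
    else:
        # Features were appended contiguously: only the length is modified.
        row["length"] += added_length
    return table[group_id]


address_table = {}
created = dict(establish_index(address_table, "group-6", first=61, added_length=40))
modified = dict(establish_index(address_table, "group-6", first=61, added_length=20))
```

When the row already exists, the first storage address is left untouched, since appended features follow the existing ones and do not move the start of the area.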
S204, reading the first features in the first index group to the arithmetic processor according to the address index information.
In the embodiment of the present application, the implementation manner of reading the first feature to the operation processor may include the following several implementations, corresponding to different forms of the address index information:
1) the first storage address of the first storage area can be determined according to the first pointer, and data with the length of the first storage space is read from the storage area corresponding to the first storage address and stored in the arithmetic processor.
For example, as shown in fig. 7A, if the value of the first pointer is storage address 1, and the storage space length of the first storage area is equal to the length of the address space corresponding to a single storage address × n, then data whose length is the length of the address space corresponding to a single storage address × n is read starting from the storage area corresponding to storage address 1 and stored in the arithmetic processor; that is, the data in the storage areas corresponding to storage addresses 1 to n is read and stored in the arithmetic processor.
2) The first storage address of the first storage area can be determined according to the first pointer, the last storage address of the first storage area is determined according to the second pointer, data in the storage areas corresponding to the plurality of storage addresses are continuously read from the data in the storage area corresponding to the first storage address until the data in the storage area corresponding to the last storage address are read, and the continuously read data are stored in the operation processor.
For example, as shown in fig. 7A, if the value of the first pointer is storage address 1 and the value of the second pointer is storage address n, the data in the storage areas corresponding to a plurality of storage addresses are read continuously from the data in the storage area corresponding to storage address 1 until the data in the storage area corresponding to storage address n is read, and the continuously read data are stored in the arithmetic processor, that is, the data in the storage areas corresponding to storage addresses 1 to n are read continuously and stored in the arithmetic processor.
3) The first storage address of the first storage area can be determined according to the first pointer, the total length of the first feature stored in the first storage area is determined according to the maximum storage space length and the remaining storage space length, and data with the total length is read and stored to the arithmetic processor.
For example, as shown in fig. 7A, if the value of the first pointer is storage address 1, the maximum storage space length is m, and the remaining storage space length is m - (length of the address space corresponding to a single storage address × n), it is determined that the total length of the first features stored in the first storage area is the length of the address space corresponding to a single storage address × n; data of that length is read starting from the storage area corresponding to storage address 1 and stored in the arithmetic processor, that is, the data in the storage areas corresponding to storage addresses 1 to n is read and stored in the arithmetic processor.
4) The first storage address and the storage space length of the first storage area, or the first storage address and the last storage address of the first storage area, or the first storage address and the remaining storage space length of the first storage area, may be obtained from the data table; data is then read according to the content obtained from the data table in the manner of 1) to 3) above and stored in the arithmetic processor.
In the embodiment of the application, the first features belonging to one index group are continuously stored in the internal memory, and the address index information of the storage area continuously storing the first features is established, so that when the first features of the index group are read into the operation processor, the plurality of first features in the index group can be directly read into the operation processor according to the address index information, the storage addresses of the first features in the index group do not need to be sequentially calculated, the pointer jump does not need to be carried out for a plurality of times, the time for reading the plurality of first features into the operation processor is saved, and the efficiency of the operation processor is improved.
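The difference between reading through address index information and locating each feature separately can be sketched as follows. This is an illustration under assumptions: the memory is a bytearray, features have an assumed fixed size FEATURE_SIZE, and the function names are hypothetical. The baseline read_scattered stands for a layout in which each feature has its own address, which is not the scheme of the application.

```python
FEATURE_SIZE = 4  # assumed fixed feature size in bytes


def read_scattered(memory, feature_addresses):
    """Baseline: one address lookup per feature (features stored anywhere)."""
    return [bytes(memory[a:a + FEATURE_SIZE]) for a in feature_addresses]


def read_contiguous(memory, first, length):
    """One (first storage address, length) pair covers the whole index group."""
    block = bytes(memory[first:first + length])  # single contiguous read
    return [block[i:i + FEATURE_SIZE] for i in range(0, length, FEATURE_SIZE)]


memory = bytearray(b"....f001f002f003....")
group = read_contiguous(memory, first=4, length=12)
same = read_scattered(memory, [4, 8, 12])
```

Both calls return the same features, but read_contiguous needs only one address and one length regardless of how many features the group contains.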
In some possible scenarios, while storing the first feature in the first index packet to the internal memory, or after storing the first feature in the first index packet to the internal memory, the method may further store auxiliary information corresponding to the first feature to the internal memory, and the method embodiment may further include:
storing auxiliary information corresponding to the first feature into the internal memory, wherein the auxiliary information is stored in at least one second storage area of the internal memory; establishing address index information of the second storage area and an association relationship between the first storage area and the second storage area, wherein the address index information of the second storage area is used for indicating the address of the second storage area in the internal memory, and the association relationship between the first storage area and the second storage area is used by the arithmetic processor to determine the auxiliary information corresponding to a second feature, the second feature being a first feature determined by the arithmetic processor after feature calculation.
Here, the auxiliary information corresponding to the first feature may be descriptive information of the first multimedia data that is not used to distinguish index groups, and descriptive information of the first feature; the descriptive information describes objective attributes of the multimedia data or of the feature, and the first multimedia data is the multimedia data corresponding to the first feature. Taking a picture as an example of multimedia data, the descriptive information of the picture includes the identifier of the camera that shot the picture, the size of the picture, the shooting date of the picture, and the shooting time of the picture. The camera identifier and the shooting date are used for distinguishing index groups, that is, the features corresponding to one camera identifier and one shooting date are the features in one index group; the auxiliary information of the first multimedia data corresponding to the first feature can then be the size of the picture and the shooting time of the picture. The descriptive information of the first feature may be, for example, version information of the first feature.
The second storage area is a storage area in which auxiliary information corresponding to the first feature in the first index packet is stored.
In a possible embodiment, the form of the address index information of the second storage area may be the same as the form of the address index information of the first storage area, i.e. if the address index information of the first storage area is a pointer, the address index information of the second storage area is a pointer; if the address index information of the first storage area is the content in the data table, the address index information of the second storage area is the content of the data table, and the like.
Specifically, if the address index information of the first storage area is a pointer, the address index information of the second storage area may include at least one third pointer, the value of the third pointer being the first storage address of the second storage area, where the number of the third pointers is equal to the number of the second storage areas. If the address index information of the first storage area is the content in the data table, the first storage address of the at least one second storage area may be stored in the data table.
The association relationship between the first storage area and the second storage area can be established by placing the pointers pointing to the first storage area (such as the first pointer and the second pointer) and the pointers pointing to the second storage area in one structure. Alternatively, the address index information of the first storage area and the address index information of the second storage area may be stored in the same data table to establish the association relationship between the first storage area and the second storage area.
Here, the second feature is the feature that determines the result finally output by the arithmetic processor, and the finally output result is the result obtained when the purpose of the feature calculation is achieved. For example, if image retrieval is implemented by feature calculation and the goal is to retrieve the picture with the highest similarity to the picture input by the user, the result finally output by the arithmetic processor is the picture with the highest similarity to the input picture, and the second feature is the feature used to determine that picture.
By storing the auxiliary information corresponding to the first feature in the second storage area of the internal memory and establishing the address index information of the second storage area and the association relationship between the first storage area and the second storage area, when the first feature is the feature determined after feature calculation, the storage address of the auxiliary information corresponding to the first feature in the internal memory can be found according to the association relationship between the second storage area and the first storage area and the address index information of the second storage area, so that the auxiliary information corresponding to the first feature is obtained, and the multimedia data corresponding to the auxiliary information can be found according to the auxiliary information, so that the purpose to be achieved through feature calculation is achieved.
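A minimal sketch of this lookup follows. It assumes that the auxiliary records have a fixed size and are stored in the second storage area in the same order as the features in the first storage area, so that the position of the second feature also locates its auxiliary information; this ordering, the squared-distance comparison, and the names GroupEntry, best_match and aux_for are assumptions for illustration, not details given in the application.

```python
from dataclasses import dataclass

FEATURE_SIZE = 2  # assumed fixed feature size in bytes
AUX_SIZE = 5      # assumed fixed auxiliary record size, e.g. a time "HH:MM"


@dataclass
class GroupEntry:
    """One structure that associates the first and second storage areas."""
    feature_ptr: int  # first storage address of the first storage area
    feature_len: int  # storage space length of the first storage area
    aux_ptr: int      # first storage address of the second storage area


def best_match(memory, entry, query):
    """Return the position of the stored feature closest to the query."""
    best_i, best_d = None, None
    for i in range(entry.feature_len // FEATURE_SIZE):
        off = entry.feature_ptr + i * FEATURE_SIZE
        feature = memory[off:off + FEATURE_SIZE]
        d = sum((a - b) ** 2 for a, b in zip(feature, query))
        if best_d is None or d < best_d:
            best_i, best_d = i, d
    return best_i


def aux_for(memory, entry, position):
    """Follow the association to the auxiliary record at the same position."""
    off = entry.aux_ptr + position * AUX_SIZE
    return bytes(memory[off:off + AUX_SIZE])


memory = bytearray(64)
memory[0:6] = b"\x01\x02" + b"\x09\x09" + b"\x05\x06"   # three features
memory[32:47] = b"08:15" + b"09:30" + b"17:45"          # their shooting times
entry = GroupEntry(feature_ptr=0, feature_len=6, aux_ptr=32)

position = best_match(memory, entry, b"\x05\x05")
info = aux_for(memory, entry, position)
```

Here the third stored feature is closest to the query, and the structure leads directly to its shooting time without any search over the auxiliary information.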
In some possible scenarios, the first feature may also be tagged by an index packet to indicate whether the first feature is already stored in the arithmetic processor. After the first feature is read to the arithmetic processor, the grouping mark of the first index grouping can be set as a first mark, and the first mark is used for indicating that the first feature in the first index grouping is already stored in the arithmetic processor; in the event that the first feature in the first index packet has been removed from the arithmetic processor, the packet flag of the first index packet may be set to a second flag indicating that the first feature in the first index packet is not stored in the arithmetic processor.
In a specific implementation, 0 and 1 may be used as the two marks. For example, 1 may be used as the first mark and 0 as the second mark: if the group mark of the first index group is 1, it indicates that the first features in the first index group are stored in the arithmetic processor, and if the group mark of the first index group is 0, it indicates that the first features in the first index group are not stored in the arithmetic processor.
Marking the first features per index group means that the first features are marked in batches; compared with marking each feature separately, this reduces the number of marks and saves storage space to a certain extent.
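The group mark can be sketched with one flag per index group. The group sizes below (25 and 45 features) are example values, and the function names are hypothetical.

```python
LOADED, NOT_LOADED = 1, 0  # first mark and second mark

group_sizes = {"group-1": 25, "group-2": 45}        # features per index group
group_flags = {g: NOT_LOADED for g in group_sizes}  # one flag per group


def mark_loaded(group_id):
    """Set the first mark after the group is read to the arithmetic processor."""
    group_flags[group_id] = LOADED


def mark_removed(group_id):
    """Set the second mark after the group is removed from the processor."""
    group_flags[group_id] = NOT_LOADED


def is_loaded(group_id):
    return group_flags[group_id] == LOADED


mark_loaded("group-1")
flags_per_group = len(group_flags)             # marks needed with group marking
flags_per_feature = sum(group_sizes.values())  # marks needed with per-feature marking
```

With these example sizes, group marking needs 2 marks where per-feature marking would need 70.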
In some possible scenarios, in a case where the storage capacity of the arithmetic processor is relatively large, if the first feature is a common feature that is frequently used for feature calculation, the first feature may be continuously stored in the arithmetic processor at all times, so that the arithmetic processor may directly read the first feature from the arithmetic processor when feature calculation using the first feature is required. That is, after the first feature in the first index packet is read to the arithmetic processor, the first feature may be continuously stored in the third storage area of the arithmetic processor all the time, and the address index information of the third storage area and the association relationship between the first storage area and the third storage area may be established.
Here, the third storage area is a storage area in which the first feature in the first index packet is stored in the arithmetic processor continuously.
In a possible embodiment, the form of the address index information of the third storage area may be the same as the form of the address index information of the first storage area, i.e. if the address index information of the first storage area is a pointer, the address index information of the third storage area is a pointer; if the address index information of the first storage area is the content in the data table, the address index information of the third storage area is the content of the data table, and the like.
Specifically, if the address index information of the first storage area is a pointer, the address index information of the third storage area may be a fourth pointer whose value is the first storage address of the third storage area. If the address index information of the first storage area is the content in the data table, the first storage address of the third storage area may be stored in the data table.
The association relationship between the first storage area and the third storage area can be established by placing the pointers pointing to the first storage area (such as the first pointer and the second pointer) and the pointer pointing to the third storage area in one structure. Alternatively, the address index information of the first storage area and the address index information of the third storage area can be stored in the same data table to establish the association relationship between the first storage area and the third storage area.
By keeping the first features in the first index group stored in the arithmetic processor, when the arithmetic processor performs feature calculation using the first features, the storage address of the first features in the arithmetic processor can be determined according to the association relationship between the third storage area and the first storage area and the address index information of the third storage area. The first features are then obtained directly from the arithmetic processor for feature calculation without being read from the internal memory, which saves the time for reading the first features from the internal memory to the arithmetic processor.
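The following sketch illustrates the idea of a third storage area under simple assumptions: the storage of the arithmetic processor is modeled as a second bytearray, the fourth pointer is None until the group has been copied there, and a counter records how many transfers from the internal memory take place. The names GroupEntry, Processor and get_group are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class GroupEntry:
    first_ptr: int                    # start of the first storage area
    length: int                       # storage space length of the group
    fourth_ptr: Optional[int] = None  # start of the third storage area, if any


class Processor:
    """Toy model of the storage inside the arithmetic processor."""

    def __init__(self, capacity):
        self.mem = bytearray(capacity)
        self.next_free = 0
        self.transfers = 0  # number of reads from the internal memory

    def get_group(self, internal_memory, entry):
        if entry.fourth_ptr is None:
            # Not resident yet: copy the whole group in one contiguous read.
            if self.next_free + entry.length > len(self.mem):
                raise MemoryError("processor storage is full")
            start = self.next_free
            end = entry.first_ptr + entry.length
            self.mem[start:start + entry.length] = internal_memory[entry.first_ptr:end]
            self.next_free += entry.length
            entry.fourth_ptr = start
            self.transfers += 1
        s = entry.fourth_ptr
        return bytes(self.mem[s:s + entry.length])


internal_memory = bytearray(b"....f001f002....")
entry = GroupEntry(first_ptr=4, length=8)
proc = Processor(capacity=32)
first_read = proc.get_group(internal_memory, entry)
second_read = proc.get_group(internal_memory, entry)
```

The second call finds the fourth pointer already set and returns the features from the processor's own storage, so only one transfer from the internal memory occurs.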
In a possible embodiment, the first features may also be persistently stored in the external memory in a column-store manner, i.e., the first features are stored consecutively in the external memory. Persisting the first features in the external memory in a column-store manner can shorten the time needed to read the first features from the external memory to the internal memory.
To aid understanding of the scheme of the present application, the following specific example describes how features and auxiliary information are stored and indexed once the index has been established according to the above method embodiment. Still taking multimedia data as an example, assume that an index group is the set of features corresponding to one camera id and one shooting date, that the index is still a three-layer index, and that the address index information takes the form of pointers. A scheme for storing the features and establishing the index under these assumptions may be as shown in fig. 8, which is a schematic diagram of a scheme for storing and indexing features provided by an embodiment of the present application. In this scheme, the first-layer index 81a is an index of camera ids; the first-layer index 81a indexes to a camera id storage area 81b for storing the camera id, and the camera id storage area 81b points to the second-layer index 82a by means of a pointer. The second-layer index 82a is an index of shooting dates; the second-layer index 82a indexes to a shooting date storage area 82b for storing the shooting date, and the shooting date storage area 82b points to the third-layer index by means of a pointer. The third-layer index includes a feature storage area 83a for storing features, an auxiliary information storage area 83b for storing the auxiliary information corresponding to the features, a feature index storage area 83c for storing a feature index, and a model storage area 83d (not shown in fig. 8) for storing an algorithm model. The feature storage area 83a includes a feature storage area 83a1 in the internal memory, a feature storage area 83a2 (not shown in fig. 8) in the external memory, and a feature storage area 83a3 in the arithmetic processor.
In each of the feature storage areas 83a1, 83a2, and 83a3, the features indexed by one camera id and one shooting date are stored contiguously to form one feature storage area. One camera id and one shooting date are indexed to the feature storage area 83a1 by pointer 1 together with the storage space length L of that feature storage area, and to the feature storage area 83a3 by pointer 2. The same camera id and shooting date are also indexed to the auxiliary information storage area 83b by pointer 3. Associating pointer 1, pointer 2, pointer 3, and the storage space length L with one another establishes the association among the feature storage area 83a1, the feature storage area 83a3, and the auxiliary information storage area 83b.
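The layered lookup in this example can be sketched in Python with nested dictionaries standing in for the first two index layers. The names `GroupEntry`, `index`, and `lookup`, the sample camera ids and dates, and all address values are hypothetical; integers stand in for pointers.

```python
from dataclasses import dataclass


@dataclass
class GroupEntry:
    pointer1: int  # start of the group's feature area in the internal memory
    length: int    # storage space length L of that feature area
    pointer2: int  # start of the group's feature area in the arithmetic processor
    pointer3: int  # start of the group's auxiliary information area


# First layer: camera id -> second layer. Second layer: shooting date -> group entry.
index = {
    "cam-01": {
        "2018-08-27": GroupEntry(pointer1=0, length=1024, pointer2=0, pointer3=4096),
        "2018-08-28": GroupEntry(pointer1=1024, length=512, pointer2=1024, pointer3=4608),
    },
}


def lookup(camera_id: str, date: str) -> GroupEntry:
    """Walk the two upper layers to reach the associated pointers of one index group."""
    return index[camera_id][date]


entry = lookup("cam-01", "2018-08-28")
```

Holding pointer 1, pointer 2, pointer 3, and L in one entry is what associates the three storage areas of the group with one another.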
As can be seen from fig. 8, because the features of one index group are stored contiguously in one feature storage area of the internal memory, reading the features of an index group into the arithmetic processor only requires reading the data in the address range from the storage address pointed to by pointer 1 to that address plus L. There is no need to calculate the storage address of each feature or to follow a pointer for each feature, so the time for reading the features of one index group into the arithmetic processor is shortened and the utilization rate of the arithmetic processor is improved.
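The single-range read can be illustrated with a short Python sketch. Modelling the internal memory as one flat byte buffer is an assumption made for illustration, as are the names `internal_memory` and `read_group` and the half-open range convention.

```python
# Internal memory modelled as one flat byte buffer holding dummy feature data.
internal_memory = bytearray(range(256)) * 4  # 1024 bytes


def read_group(memory: bytearray, pointer1: int, length: int) -> memoryview:
    """Return the whole feature area [pointer1, pointer1 + length) as one slice.

    No per-feature address arithmetic or pointer chasing is needed, because
    the features of the group are stored back to back.
    """
    return memoryview(memory)[pointer1:pointer1 + length]


block = read_group(internal_memory, pointer1=256, length=128)
```

A `memoryview` slice is used so that the sketch selects the range without copying it; an actual transfer to an arithmetic processor would copy this one range in a single operation.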
Next, the relationship between the time taken to read the features of an index group into the arithmetic processor and the utilization rate of the arithmetic processor is described. Still taking the retrieval scenario of multimedia data as an example, assume that m input features require feature calculation, where the m input features are obtained by feature extraction from one or more pieces of multimedia data input by a user. Each index group contains k features; the average time required by the arithmetic processor, running fully loaded and in parallel, to perform the feature calculation between 1 input feature and the features in one index group is t1; the calculation between 1 input feature and 1 feature in an index group occupies a% of the computing resources of the arithmetic processor, where $1/a\% \le m \times k$; and the time for reading the features of one index group into the cache of the arithmetic processor is t2.
The condition that optimized cache management needs to satisfy is:
$$m \cdot k \cdot t_1 \cdot a\% \ge t_2 \ge (m-1) \cdot k \cdot t_1 \cdot a\%$$
That is, the time for reading the features of one index group into the arithmetic processor should be as close as possible to the time the arithmetic processor needs for the feature calculations between one input feature and the features in one index group.
Under this condition, when t2 is large, m must also be large: the arithmetic processor can run at full load only when many input features require feature calculation. When t2 is small, m can be small: the arithmetic processor can run at full load even when only a few input features require feature calculation.
Following this principle, because the scheme of the embodiments of the present application shortens t2, the arithmetic processor can run at full load even when few input features require feature calculation, which improves the utilization rate of the arithmetic processor.
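The dependence of m on t2 can be made concrete with a small calculation. The sketch below assumes the condition reads m × k × t1 × a% ≥ t2 ≥ (m − 1) × k × t1 × a%, in which case the smallest m that satisfies it is the ceiling of t2 / (k × t1 × a%). The function name `inputs_for_full_load` and all numeric values are hypothetical; exact fractions are used to avoid floating-point rounding.

```python
from fractions import Fraction
from math import ceil


def inputs_for_full_load(t2, k, t1, a_percent):
    """Smallest m with m*k*t1*a% >= t2 >= (m-1)*k*t1*a% (times in the same unit)."""
    # Time contributed by one input feature: k * t1 * a%.
    per_input = Fraction(k) * Fraction(t1) * Fraction(a_percent) / 100
    return max(1, ceil(Fraction(t2) / per_input))


# Illustrative numbers only: k = 1000 features per group, t1 = 2 ms, a = 0.5 (%).
slow_read = inputs_for_full_load(t2=80, k=1000, t1=2, a_percent=Fraction(1, 2))
fast_read = inputs_for_full_load(t2=20, k=1000, t1=2, a_percent=Fraction(1, 2))
```

With these illustrative numbers, a read time of 80 ms calls for 8 input features to keep the processor fully loaded, while a read time of 20 ms calls for only 2.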
The method of the embodiments of the present application is described above, followed by the apparatus of the embodiments of the present application.
Referring to fig. 9, fig. 9 is a schematic structural diagram of an apparatus for creating an index according to an embodiment of the present application, and as shown in the drawing, the apparatus 90 includes:
a feature determining module 901, configured to determine a plurality of first features to be stored in an internal memory, where the plurality of first features belong to a first index group, the first features are used by an arithmetic processor for feature calculation, and the first features are features obtained by feature extraction and/or processing of multimedia data;
a feature information storage module 902, configured to store the plurality of first features in the internal memory, where the plurality of first features are stored in a first storage area of the internal memory consecutively;
an address index establishing module 903, configured to establish address index information of the first storage area, where the address index information is used to indicate an address of the first storage area in the internal memory;
a feature reading module 904, configured to read the first feature to the arithmetic processor according to the address index information.
In one possible implementation, the first feature is a feature that is not currently stored in the internal memory.
In one possible implementation, the first feature is a feature that persists in external memory.
In one possible embodiment, the first feature is a feature that is stored persistently in the external memory in a column store.
In one possible implementation, the address index information of the first storage area includes a first pointer and a first storage space length, the value of the first pointer is a first storage address of the first storage area, and the first storage space length is a storage space length of the first storage area.
In a possible implementation, the address index information of the first storage area includes a first pointer and a second pointer, the value of the first pointer is the first storage address of the first storage area, and the value of the second pointer is the last storage address of the first storage area.
In a possible implementation, the address index information of the first storage area includes a first pointer and a remaining storage space length, the value of the first pointer is the first storage address of the first storage area, and the remaining storage space length is the difference between the maximum storage space length of the first storage area and the storage space length of the first storage area.
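The three forms of address index information describe the same storage area and can be converted into one another, as the following Python sketch suggests. It assumes byte addressing and treats the last storage address as inclusive; the names `PointerAndLength`, `to_first_last`, and `to_first_remaining` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class PointerAndLength:
    first: int   # first pointer: first storage address of the area
    length: int  # storage space length currently used by the area


def to_first_last(idx: PointerAndLength) -> tuple:
    """Form with a first pointer and a second pointer holding the last storage address."""
    return (idx.first, idx.first + idx.length - 1)


def to_first_remaining(idx: PointerAndLength, max_length: int) -> tuple:
    """Form with a first pointer and the remaining length (maximum length minus used length)."""
    return (idx.first, max_length - idx.length)


area = PointerAndLength(first=4096, length=1000)
first_last = to_first_last(area)
first_remaining = to_first_remaining(area, max_length=1024)
```

The remaining-length form is convenient when the area is pre-allocated at its maximum length and new features are appended to it.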
In a possible embodiment, the apparatus 90 may further comprise:
an auxiliary information storage module 905, configured to store auxiliary information corresponding to the first feature in the internal memory, where the auxiliary information is descriptive information of multimedia data corresponding to the first feature and/or descriptive information of the first feature, and the auxiliary information is stored in at least one second storage area of the internal memory;
an auxiliary index establishing module 906, configured to establish address index information of the second storage area and an association relationship between the first storage area and the second storage area, where the address index information of the second storage area is used to indicate an address of the second storage area in the internal memory;
the address index information of the second storage area and the association relationship between the first storage area and the second storage area are used by the arithmetic processor to determine auxiliary information corresponding to a second feature, wherein the second feature is a first feature determined by the arithmetic processor after feature calculation.
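How the association between the two storage areas leads from a calculated result to its auxiliary information can be sketched as follows. The sketch assumes that auxiliary information is stored in the same order as the features, so that a position in one area identifies the matching entry in the other; it also uses a dot product as a stand-in for the feature calculation. The names and sample data are hypothetical.

```python
# Features of one index group, in the order they occupy the first storage area.
features = [[1.0, 0.0], [0.0, 1.0], [0.6, 0.8]]

# Auxiliary information, stored in the same order in the second storage area
# (same-order storage is an assumption of this sketch).
auxiliary = [
    {"image": "a.jpg", "time": "08:00"},
    {"image": "b.jpg", "time": "09:30"},
    {"image": "c.jpg", "time": "11:15"},
]


def best_match(query, candidates):
    """Return the position of the candidate with the largest dot product."""
    scores = [sum(q * c for q, c in zip(query, cand)) for cand in candidates]
    return max(range(len(scores)), key=scores.__getitem__)


position = best_match([0.0, 1.0], features)  # position of the "second feature"
info = auxiliary[position]                   # its auxiliary information
```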
In a possible embodiment, the apparatus 90 may further comprise:
a flag setting module 907, configured to set, after the feature reading module reads the first feature to the arithmetic processor, a group flag of the first index group to a first flag, where the first flag is used to indicate that the first feature is already stored in the arithmetic processor.
In a possible implementation, the flag setting module 907 is further configured to: if the first feature is removed from the arithmetic processor, set the group flag of the first index group to a second flag, where the second flag is used to indicate that the first feature is not stored in the arithmetic processor.
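The flag handling can be sketched in a few lines of Python. The names `GroupFlag`, `group_flags`, `mark_read`, `mark_removed`, and `must_read_first` are hypothetical, and treating a group that has no flag yet as not stored is an assumption of this sketch.

```python
from enum import Enum


class GroupFlag(Enum):
    STORED = "first flag"       # features are stored in the arithmetic processor
    NOT_STORED = "second flag"  # features are not stored in the arithmetic processor


group_flags = {}  # hypothetical map: index group id -> GroupFlag


def mark_read(group_id):
    """Call after the group's features have been read to the processor."""
    group_flags[group_id] = GroupFlag.STORED


def mark_removed(group_id):
    """Call after the group's features have been removed from the processor."""
    group_flags[group_id] = GroupFlag.NOT_STORED


def must_read_first(group_id):
    """A group without a flag is treated as not stored (an assumption)."""
    return group_flags.get(group_id, GroupFlag.NOT_STORED) is not GroupFlag.STORED


mark_read(("cam-01", "2018-08-27"))
```

Checking the flag before a calculation lets the caller skip the read from the internal memory whenever the group is already present in the processor.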
In a possible embodiment, the apparatus 90 may further comprise:
a feature index establishing module 908, configured to keep the first feature stored in a third storage area of the arithmetic processor at all times after the feature reading module reads the first feature to the arithmetic processor, and to establish address index information of the third storage area and an association relationship between the first storage area and the third storage area;
the address index information of the third storage area and the association relationship between the first storage area and the third storage area are used by the arithmetic processor to read the first feature directly within the arithmetic processor for feature calculation.
It should be noted that, for details that are not mentioned in the embodiment corresponding to fig. 9 and the specific implementation manner of the step executed by each module, reference may be made to the description of the method embodiment, and details are not described here again.
The embodiment of the present application further provides a computer storage medium, which can be used to store computer software instructions for the above apparatus for creating an index, including a program designed for the apparatus to execute the method for creating an index in the foregoing embodiments. The storage medium includes, but is not limited to, flash memory, a hard disk, and a solid state disk.
The embodiment of the present application further provides a computer program product; when the computer program product is run by a computing device, the method for creating an index designed for the apparatus in the foregoing method embodiments can be executed.
The above embodiments may be implemented wholly or partially by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented wholly or partially in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions described in the embodiments of the present application are produced in whole or in part. The computer instructions may be stored in a computer-readable storage medium or transmitted through a computer-readable storage medium. The computer-readable storage medium can be any available medium that a computer can access, or a data storage device, such as a server or a data center, that integrates one or more available media. The available medium may be a semiconductor medium (e.g., an SSD) or the like.
Those of ordinary skill in the art will appreciate that the various illustrative modules and method steps described in connection with the embodiments disclosed herein may be implemented as electronic hardware, or combinations of computer software and electronic hardware. Whether such functionality is implemented as hardware or software depends upon the particular application and design constraints imposed on the implementation. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
It should be noted that the ordinals "first", "second", "third", and "fourth" and the other numerals used in the embodiments of the present application are merely for convenience of description and are not intended to limit the scope of the embodiments of the present application.
The above description covers only specific embodiments of the present application, but the protection scope of the present application is not limited thereto. Any change or substitution that a person skilled in the art can readily conceive of within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (24)

1. A method of creating an index, comprising:
determining a plurality of first features to be stored in an internal memory, wherein the plurality of first features belong to a first index group, the first features are used for feature calculation by an arithmetic processor, and the first features are features obtained by feature extraction and/or processing of multimedia data;
storing the plurality of first features to the internal memory, wherein the plurality of first features are stored consecutively in a first storage area of the internal memory;
establishing address index information of the first storage area, wherein the address index information is used for indicating the address of the first storage area in the internal memory;
and reading the first feature to the arithmetic processor according to the address index information.
2. The method of claim 1, wherein the first feature is a feature not currently stored in the internal memory.
3. The method of claim 1 or 2, wherein the first feature is a feature persistently stored in an external memory.
4. The method of claim 3, wherein the first feature is a feature that is persistently stored in the external memory in a column-store manner.
5. The method according to any one of claims 1-4, wherein the address index information comprises a first pointer and a first storage space length, the value of the first pointer is a first storage address of the first storage area, and the first storage space length is a storage space length of the first storage area.
6. The method according to any one of claims 1 to 4, wherein the address index information includes a first pointer and a second pointer, the value of the first pointer is a first storage address of the first storage area, and the value of the second pointer is a last storage address of the first storage area.
7. The method according to any one of claims 1-4, wherein the address index information comprises a first pointer and a remaining storage space length, the value of the first pointer is a first storage address of the first storage area, and the remaining storage space length is a difference between a maximum storage space length of the first storage area and a storage space length of the first storage area.
8. The method according to any one of claims 1-7, further comprising:
storing auxiliary information corresponding to the first feature into the internal memory, wherein the auxiliary information is descriptive information of multimedia data corresponding to the first feature and/or descriptive information of the first feature, and the auxiliary information is stored in at least one second storage area of the internal memory;
establishing address index information of the second storage area and an association relation between the first storage area and the second storage area, wherein the address index information of the second storage area is used for indicating the address of the second storage area in the internal memory;
the address index information of the second storage area and the association relationship between the first storage area and the second storage area are used by the arithmetic processor to determine auxiliary information corresponding to a second feature, wherein the second feature is a first feature determined by the arithmetic processor after feature calculation.
9. The method according to any one of claims 1-8, wherein after reading the first feature to the arithmetic processor according to the address index information, the method further comprises:
setting a group flag of the first index group to a first flag, the first flag indicating that the first feature has been stored in the arithmetic processor.
10. The method of claim 9, wherein after reading the first feature to the arithmetic processor according to the address index information, the method further comprises:
if the first feature is removed from the arithmetic processor, setting the group flag of the first index group to a second flag, the second flag indicating that the first feature is not stored in the arithmetic processor.
11. The method according to any one of claims 1-10, wherein after reading the first feature to the arithmetic processor according to the address index information, the method further comprises:
keeping the first feature stored in a third storage area of the arithmetic processor at all times, and establishing address index information of the third storage area and an association relationship between the first storage area and the third storage area;
the address index information of the third storage area and the association relationship between the first storage area and the third storage area are used by the arithmetic processor to read the first feature directly within the arithmetic processor for feature calculation.
12. An apparatus for creating an index, comprising:
a feature determining module, configured to determine a plurality of first features to be stored in an internal memory, where the plurality of first features belong to a first index group, the first features are used by an arithmetic processor for feature calculation, and the first features are features obtained by feature extraction and/or processing of multimedia data;
a feature information storage module, configured to store the plurality of first features in the internal memory, where the plurality of first features are stored consecutively in a first storage area of the internal memory;
an address index establishing module, configured to establish address index information of the first storage area, where the address index information is used to indicate an address of the first storage area in the internal memory;
and a feature reading module, configured to read the first feature to the arithmetic processor according to the address index information.
13. The apparatus of claim 12, wherein the first feature is a feature not currently stored in the internal memory.
14. The apparatus of claim 12, wherein the first feature is a feature persistently stored in external memory.
15. The apparatus of claim 12, wherein the first feature is a feature that is persistently stored in the external memory in a column-store manner.
16. The apparatus according to any of claims 12-15, wherein the address index information comprises a first pointer and a first storage space length, a value of the first pointer is a first storage address of the first storage area, and the first storage space length is a storage space length of the first storage area.
17. The apparatus according to any one of claims 12-15, wherein the address index information comprises a first pointer and a second pointer, a value of the first pointer is a first storage address of the first storage area, and a value of the second pointer is a last storage address of the first storage area.
18. The apparatus according to any one of claims 12-15, wherein the address index information comprises a first pointer and a remaining storage space length, a value of the first pointer is a first storage address of the first storage area, and the remaining storage space length is a difference between a maximum storage space length of the first storage area and a storage space length of the first storage area.
19. The apparatus of any one of claims 12-18, further comprising:
an auxiliary information storage module, configured to store auxiliary information corresponding to the first feature in the internal memory, where the auxiliary information is descriptive information of multimedia data corresponding to the first feature and/or descriptive information of the first feature, and the auxiliary information is stored in at least one second storage area of the internal memory;
an auxiliary index establishing module, configured to establish address index information of the second storage area and an association relationship between the first storage area and the second storage area, where the address index information of the second storage area is used to indicate an address of the second storage area in the internal memory;
the address index information of the second storage area and the association relationship between the first storage area and the second storage area are used by the arithmetic processor to determine auxiliary information corresponding to a second feature, wherein the second feature is a first feature determined by the arithmetic processor after feature calculation.
20. The apparatus of any one of claims 12-19, further comprising:
a flag setting module, configured to set a group flag of the first index group to a first flag after the feature reading module reads the first feature to the arithmetic processor, where the first flag is used to indicate that the first feature has been stored in the arithmetic processor.
21. The apparatus of claim 20, wherein the flag setting module is further configured to: if the first feature is removed from the arithmetic processor, set the group flag of the first index group to a second flag, the second flag indicating that the first feature is not stored in the arithmetic processor.
22. The apparatus of any one of claims 12-21, further comprising:
a feature index establishing module, configured to keep the first feature stored in a third storage area of the arithmetic processor at all times after the feature reading module reads the first feature to the arithmetic processor, and to establish address index information of the third storage area and an association relationship between the first storage area and the third storage area;
the address index information of the third storage area and the association relationship between the first storage area and the third storage area are used by the arithmetic processor to read the first feature directly within the arithmetic processor for feature calculation.
23. An apparatus for indexing, comprising a processor and a memory, the processor being connected to the memory, wherein the processor comprises an arithmetic processor, the memory comprises an internal memory, the internal memory is a memory coupled to the arithmetic processor, the memory is used for storing program codes, and the processor is used for calling the program codes and executing the following operations:
determining a plurality of first features to be stored in the internal memory, wherein the plurality of first features belong to a first index group, the first features are used for feature calculation by the arithmetic processor, and the first features are features obtained by feature extraction and/or processing of multimedia data;
storing the plurality of first features to the internal memory, wherein the plurality of first features are stored consecutively in a first storage area of the internal memory;
establishing address index information of the first storage area, wherein the address index information is used for indicating the address of the first storage area in the internal memory;
and reading the first feature to the arithmetic processor according to the address index information.
24. A computer storage medium, characterized in that the computer storage medium stores a computer program comprising program instructions which, when executed by a computer, cause the computer to perform the method according to any one of claims 1-11.
CN201810986041.0A 2018-08-27 2018-08-27 Method for establishing index and related device Pending CN110866127A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810986041.0A CN110866127A (en) 2018-08-27 2018-08-27 Method for establishing index and related device

Publications (1)

Publication Number Publication Date
CN110866127A true CN110866127A (en) 2020-03-06

Family

ID=69651693

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810986041.0A Pending CN110866127A (en) 2018-08-27 2018-08-27 Method for establishing index and related device

Country Status (1)

Country Link
CN (1) CN110866127A (en)

Similar Documents

Publication Publication Date Title
CN102129425B (en) The access method of big object set table and device in data warehouse
US20160117116A1 (en) Electronic device and a method for managing memory space thereof
CN107247722B (en) File scanning method and device and intelligent terminal
CN110061930B (en) Method and device for determining data flow limitation and flow limiting values
CN111597548B (en) Data processing method and device for realizing privacy protection
CN106649538A (en) Method and device for finding human faces
CN103294799B (en) Method and system for parallel batch import of data into a read-only query system
CN110866127A (en) Method for establishing index and related device
CN110968585B (en) Storage method, device, equipment and computer readable storage medium for alignment
CN111813517A (en) Task queue allocation method and device, computer equipment and medium
CN105488176A (en) Data processing method and device
WO2022007596A1 (en) Image retrieval system, method and apparatus
CN112860412B (en) Service data processing method and device, electronic equipment and storage medium
CN112288340B (en) Logistics order dispatching method and device, computer equipment and storage medium
US11429660B2 (en) Photo processing method, device and computer equipment
CN116166583B (en) Data precision conversion method and device, DMA controller and medium
CN110764705B (en) Data reading and writing method, device, equipment and storage medium
CN112015718A (en) HBase cluster balancing method and device, electronic equipment and storage medium
CN113568877A (en) File merging method and device, electronic equipment and storage medium
CN116227599A (en) Inference model optimization method and device, electronic equipment and storage medium
CN113010570B (en) Power grid equipment vector data query method and device, computer equipment and medium
CN113807555B (en) Address selection method and device for distribution center, electronic equipment and storage medium
CN114547384A (en) Resource object processing method and device and computer equipment
CN114691612A (en) Data writing method and device and data reading method and device
CN111127592A (en) Picture color filling method and device, electronic equipment and readable storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination