WO2023124135A1 - Feature retrieval method and apparatus, electronic device, computer storage medium and program - Google Patents

Feature retrieval method and apparatus, electronic device, computer storage medium and program

Info

Publication number
WO2023124135A1
Authority
WO
WIPO (PCT)
Prior art keywords
retrieval
retrieved
feature
results
sub
Prior art date
Application number
PCT/CN2022/113931
Other languages
English (en)
French (fr)
Inventor
贺文峰
李想
李双超
金潇
门琪滨
Original Assignee
上海商汤智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海商汤智能科技有限公司
Publication of WO2023124135A1 publication Critical patent/WO2023124135A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/54Interprogram communication

Definitions

  • The present disclosure relates to computer vision technology, and in particular, but not exclusively, to a feature retrieval method and apparatus, an electronic device, a computer storage medium and a computer program.
  • Embodiments of the present disclosure provide a feature retrieval method, device, electronic equipment, computer storage medium, and computer program.
  • An embodiment of the present disclosure provides a feature retrieval method, the method comprising:
  • obtaining at least two feature slices of the feature to be retrieved in the request to be retrieved; determining, based on at least two base-library slices obtained by slicing the base library to be retrieved and the at least two feature slices, a set of retrieval subtasks for retrieval in each base-library slice; creating at least one subprocess by using each main process of at least one main process; using each main process to further divide the retrieval subtasks assigned to it and distribute them to the subprocesses, and receiving the retrieval results of the subprocesses; and determining, according to the retrieval results received by each main process, the retrieval results of the feature to be retrieved in the base library to be retrieved.
  • Said creating at least one subprocess by using each main process of at least one main process includes:
  • in each main process, obtaining the index file of the base-library slice corresponding to the assigned retrieval subtask, and creating an output queue for the main process to output data to the at least one subprocess, an input queue for receiving data from the at least one subprocess, and a communication pipe between the main process and the at least one subprocess; and creating the at least one subprocess according to the index file, the output queue, the input queue and the communication pipe.
  • It can be seen that embodiments of the present disclosure can create the subprocesses according to the output queue used to send data to the subprocesses, the input queue used to receive data from the subprocesses, and the communication pipes between the main process and the subprocesses, so that interaction between the main process and each subprocess can be reliably realized; this helps the subprocesses to process retrieval subtasks based on the index file, and helps the main process to receive the retrieval results of the subprocesses via the input queue.
  • The retrieval result of each of the at least one subprocess is obtained by an acceleration device processing the corresponding retrieval subtask.
  • Since the pre-determined number of acceleration devices can be taken as the number of subprocesses to be created, multiple acceleration devices can be fully utilized to perform retrieval subtasks in parallel, which improves feature retrieval efficiency.
  • The retrieval result of each of the at least one subprocess is a result obtained, after the base-library slice in the corresponding retrieval subtask has been quantized into quantized features, based on the feature slice in the corresponding retrieval subtask and the quantized features;
  • determining the retrieval results of the features to be retrieved in the base library to be retrieved includes:
  • in each main process, determining the similarity between the received retrieval results and the corresponding original features in the base library to be retrieved;
  • from the retrieval results received by each main process, selecting a number of retrieval results equal to a first preset value, and taking the selected retrieval results as the output retrieval results of that main process;
  • according to the output retrieval results of the main processes, obtaining the retrieval results of the feature to be retrieved in the base library to be retrieved.
  • It can be seen that embodiments of the present disclosure can use the main process to implement feature re-ranking: the main process can determine the similarity between the low-dimensional quantized features and the original base-library slices, and select retrieval results according to that similarity. Therefore, compared with schemes in the related art that use the search engine to implement feature re-ranking, execution of the main process can be managed by the calling program, and there is no need to copy the original base-library slices when computing the similarity between the original base-library slices and the retrieval results of each subprocess, which reduces resource consumption; moreover, the main process can be used to split retrieval subtasks, which improves retrieval efficiency.
  • the retrieval results of the features to be retrieved in the base library to be retrieved are obtained according to the output retrieval results of each main process, including:
  • in the output retrieval results of the main processes, obtaining the retrieval results of each retrieval subtask in the base library to be retrieved by merging the retrieval results of that retrieval subtask in each base-library slice;
  • obtaining the retrieval results of the features to be retrieved in the base library to be retrieved, which include the retrieval results of each retrieval subtask in the base library to be retrieved.
  • It can be seen that, by merging the retrieval results of each retrieval subtask in each base-library slice, embodiments of the present disclosure can accurately and comprehensively obtain the retrieval results of each retrieval subtask in the base library to be retrieved.
  • Said obtaining the retrieval results of each retrieval subtask in the base library to be retrieved by merging the retrieval results of that retrieval subtask in each base-library slice includes: merging the retrieval results of each retrieval subtask in each base-library slice to obtain a merged result corresponding to that retrieval subtask; and, from the merged result corresponding to each retrieval subtask, selecting a number of retrieval results equal to a second preset value and taking the selected retrieval results as the retrieval results of that retrieval subtask in the base library to be retrieved.
  • It can be seen that embodiments of the present disclosure can filter the retrieval results in the merged result corresponding to each retrieval subtask according to their similarity with the corresponding original features, thereby maintaining high feature retrieval accuracy while reducing the amount of retrieved data.
  • In some embodiments, before slicing the base library to be retrieved, the method further includes: determining the attribute of each feature in the base library to be retrieved; and filtering out, according to those attributes, the features in the base library to be retrieved that do not fall within a first preset attribute range.
  • Before obtaining the at least two feature slices of the feature to be retrieved in the request to be retrieved, the method further includes: determining the attribute of each feature to be retrieved; and filtering out, according to those attributes, the features to be retrieved that do not fall within a second preset attribute range.
  • It can be seen that embodiments of the present disclosure can filter the features in the base library to be retrieved according to the first preset attribute range, and can filter the features to be retrieved according to the second preset attribute range, which helps to obtain features that meet the requirements.
  • An embodiment of the present disclosure also proposes a feature retrieval device, which includes an obtaining part, a first processing part, a second processing part, a third processing part and a fourth processing part, wherein,
  • the obtaining part is configured to obtain at least two feature slices of the feature to be retrieved in the request to be retrieved;
  • the first processing part is configured to determine, based on at least two base-library slices obtained by slicing the base library to be retrieved and the at least two feature slices, a set of retrieval subtasks for retrieval in each base-library slice;
  • the second processing part is configured to create at least one subprocess by using each main process of at least one main process;
  • the third processing part is configured to use each main process to further divide the retrieval subtasks assigned to it and distribute them to the subprocesses, and to receive the retrieval results of the subprocesses;
  • the fourth processing part is configured to determine, according to the retrieval results received by each main process, the retrieval results of the feature to be retrieved in the base library to be retrieved.
  • An embodiment of the present disclosure also provides an electronic device, including a processor and a memory for storing a computer program that can run on the processor; the processor is configured to run the computer program to perform any one of the above feature retrieval methods.
  • An embodiment of the present disclosure also provides a computer storage medium, on which a computer program is stored, and when the computer program is executed by a processor, any one of the above-mentioned feature retrieval methods is implemented.
  • An embodiment of the present disclosure also provides a computer program, including computer-readable code; when the computer-readable code runs in an electronic device, a processor in the electronic device executes any one of the above feature retrieval methods.
  • It can be seen that each retrieval subtask can be assigned to a subprocess and executed in that subprocess; therefore, in embodiments of the present disclosure, parallel execution of different retrieval subtasks can be realized by creating main processes and subprocesses, which improves retrieval efficiency.
  • FIG. 1 is a flowchart of a feature retrieval method according to an embodiment of the present disclosure
  • FIG. 2 is a flow chart of another feature retrieval method according to an embodiment of the present disclosure.
  • FIG. 3 is a flowchart of creating multiple sub-processes in an embodiment of the present disclosure
  • FIG. 4 is a flowchart of creating an output queue, an input queue, and a communication pipeline in an embodiment of the present disclosure
  • FIG. 5 is a flow chart of obtaining the retrieval results of the features to be retrieved in the base library to be retrieved in an embodiment of the present disclosure;
  • FIG. 6 is another flow chart of obtaining the retrieval results of the features to be retrieved in the base library to be retrieved in an embodiment of the present disclosure;
  • FIG. 7 is a flow chart of obtaining the retrieval results of each retrieval subtask in the base library to be retrieved in an embodiment of the present disclosure;
  • FIG. 8 is a flow chart of filtering the features of the base library to be retrieved in an embodiment of the present disclosure;
  • FIG. 9 is a flow chart of filtering out features to be retrieved in an embodiment of the present disclosure.
  • FIG. 10 is a schematic structural diagram of a feature retrieval device according to an embodiment of the present disclosure.
  • FIG. 11 is a schematic diagram of a hardware structure of an electronic device according to an embodiment of the present disclosure.
  • clustering and real-name algorithms are used in archive service scenarios, and both clustering and real-name algorithms for person images involve a large number of feature retrieval operations for person images.
  • the clustering algorithm used by the archive service is a clustering algorithm based on connected graphs, which involves a large number of edge building operations.
  • A common clustering algorithm is the k-means algorithm, which needs to repeatedly find the nearest points of a given feature point and then update the cluster centers, iterating until the final clusters are formed; it can be seen that improving the efficiency of feature retrieval for images is a technical problem that urgently needs to be solved.
  • In the embodiments of the present disclosure, the terms "comprises", "comprising" or any other variation thereof are intended to cover a non-exclusive inclusion, so that a method or apparatus comprising a series of elements not only includes the explicitly stated elements, but may also include other elements not explicitly listed, or elements inherent to implementing the method or apparatus.
  • Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of additional related elements in the method or apparatus that includes that element (for example, a step in the method or a unit in the apparatus; a unit may be, for example, part of a circuit, part of a processor, part of a program or software, and so on).
  • For example, the feature retrieval method provided by the embodiments of the present disclosure includes a series of steps, but is not limited to the described steps; similarly, the apparatus provided by the embodiments of the present disclosure is not limited to the explicitly described parts, and may also include parts that need to be provided for obtaining relevant information or performing processing based on that information.
  • Embodiments of the present disclosure may be applied to electronic devices such as terminals and servers.
  • A terminal can be a thin client, a thick client, a handheld or laptop device, a microprocessor-based system, a set-top box, programmable consumer electronics, a network personal computer, a small computer system, and so on; a server can be a server computer system, a small computer system, a mainframe computer system, a distributed cloud computing environment that includes any of the above systems, and so on.
  • Electronic devices such as terminals and servers may include program modules for executing instructions.
  • program modules may include routines, programs, objects, components, logic, data structures, etc., that perform particular tasks.
  • the computer system/server can be practiced in distributed cloud computing environments where tasks are performed by remote processing devices that are linked through a communications network.
  • program modules may be located in both local and remote computing system storage media including storage devices.
  • Embodiments of the present disclosure propose a feature retrieval method, which can be applied to intelligent video analysis, smart city or other image analysis scenarios.
  • FIG. 1 is a flow chart of a feature retrieval method in an embodiment of the present disclosure. As shown in FIG. 1, the process may include:
  • Step 101 Obtain at least two feature slices of the feature to be retrieved in the request to be retrieved.
  • In embodiments of the present disclosure, the request to be retrieved carries the feature to be retrieved, which can be a feature of a person image or another kind of feature; when the request to be retrieved is received, the feature to be retrieved can be divided into at least two feature slices; in one implementation, the Spark computing framework can be used to divide the feature to be retrieved into the at least two feature slices.
  • Here, the Spark computing framework is a fast, general-purpose computing framework designed for large-scale data processing; like Hadoop MapReduce, it is an open-source general-purpose parallel framework.
  • In some embodiments, referring to FIG. 2, each feature slice of the feature to be retrieved 202 has a corresponding identity number (identity document, id).
  • In FIG. 2, query1 to querym represent the m feature slices obtained by the division, where m is an integer greater than 1; after the m feature slices are obtained, they can be stored in the Hadoop Distributed File System (HDFS) 203.
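  • As an informal illustration of this Spark-based slicing step, the sketch below repartitions a toy set of query features into m slices; the column names, the value of m and the output path are assumptions made for the example (in the described setup the slices would be written to HDFS), not details taken from the patent.

```python
# Minimal PySpark sketch (illustrative only) of step 101: split the features to
# be retrieved into m slices, as with query1..querym in FIG. 2.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("query-slicing").getOrCreate()

m = 4  # number of feature slices (the "m" in the text); chosen arbitrarily here

# Each record is (feature id, feature vector); a few toy rows stand in for the
# features carried by the retrieval request.
query_df = spark.createDataFrame(
    [(f"q{i}", [float(i), float(i) + 0.5]) for i in range(100)],
    schema=["feature_id", "vector"],
)

query_slices = query_df.repartition(m)          # split into m roughly equal slices
print(query_slices.rdd.getNumPartitions())      # -> m

# Persist the slices so downstream retrieval subtasks can read them
# (an HDFS path such as hdfs:///retrieval/query_slices in the described setup).
query_slices.write.mode("overwrite").parquet("/tmp/query_slices")
```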
  • the embodiment of the present disclosure can start a retrieval task for the feature to be retrieved according to the request to be retrieved, so that the feature retrieval can be realized according to the actual requirement.
  • Step 102 Based on at least two base-library slices obtained by slicing the base library to be retrieved and the at least two feature slices, determine a set of retrieval subtasks for retrieval in each base-library slice.
  • the base database to be retrieved refers to a database storing original features
  • the original features may be features of person images or other features.
  • In some embodiments, the Spark computing framework can be used to split the base library to be retrieved into at least two base-library slices.
  • In some embodiments, referring to FIG. 2, each feature in the base library to be retrieved 201 has a corresponding id; in practice, the repartition interface of the Spark computing framework can be used to split the features in the base library to be retrieved 201 into n base-library slices of unit size.
  • In FIG. 2, db1 to dbn represent the n base-library slices obtained by the split, where n is an integer greater than 1; after the split, the n base-library slices can be evenly distributed to the compute nodes to facilitate subsequent processing on those nodes.
  • In some embodiments, after the at least two base-library slices and the at least two feature slices are obtained, the set of retrieval subtasks for retrieval in each base-library slice can be obtained by computing the Cartesian product of the base-library slices and the feature slices.
  • In FIG. 2, (db1, query1) to (db1, querym) represent the retrieval subtasks of retrieving the first to the m-th feature slice of the feature to be retrieved on db1, and (dbn, query1) to (dbn, querym) represent the retrieval subtasks of retrieving the first to the m-th feature slice of the feature to be retrieved on dbn.
  • It should be noted that the base library to be retrieved only needs to be sliced once to obtain the at least two base-library slices; subsequent feature retrievals do not need to perform the slicing operation again, and the retrieval subtask set can be determined from the at least two base-library slices obtained in advance.
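  • The following PySpark sketch illustrates how such a retrieval-subtask set could be formed as the Cartesian product of base-library slices and feature slices; the slice identifiers and counts are placeholders, and a real job would carry references to the stored slices rather than bare strings.

```python
# Illustrative PySpark sketch of step 102: form retrieval subtasks as the
# Cartesian product of base-library slices and feature slices, i.e. the
# (db_i, query_j) pairs shown in FIG. 2.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("build-subtasks").getOrCreate()
sc = spark.sparkContext

# Each element stands for one pre-built slice stored on HDFS.
db_slices = sc.parallelize([f"db{i}" for i in range(1, 5)])        # db1..dbn
query_slices = sc.parallelize([f"query{j}" for j in range(1, 4)])  # query1..querym

# cartesian() yields every (base-library slice, feature slice) combination;
# each pair is one retrieval subtask to be scheduled on a compute node.
subtasks = db_slices.cartesian(query_slices)

for db_id, query_id in subtasks.collect():
    print(f"retrieval subtask: ({db_id}, {query_id})")
```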
  • Understandably, in scenarios with large data volumes, both the base library to be retrieved and the features to be retrieved contain very large numbers of features and need to be split reasonably; in embodiments of the present disclosure, the Spark computing framework is used to split the retrieval task, so that the embodiments can support feature retrieval in large-data-volume scenarios.
  • Step 103 Create at least one child process by using each main process in at least one main process.
  • each main process of the at least one main process may be assigned a retrieval subtask in the retrieval subtask set.
  • Referring to FIG. 2, each main process 204 executes in a resource slot; in each main process 204, at least one subprocess can be created, and in the example of FIG. 2, subprocess 1 to subprocess q represent the q subprocesses created in the main process, where q is an integer greater than 1.
  • Step 104 Utilize each main process to divide the retrieval sub-tasks allocated to itself and distribute to each sub-process, and receive the retrieval results of each sub-process.
  • each subprocess can process a corresponding retrieval subtask through an acceleration device to obtain a corresponding retrieval result.
  • each sub-process will create a search engine (SearchEngine), and use the created search engine as an independent search unit, which can realize the processing of search subtasks based on the acceleration device.
  • Step 105 According to the retrieval results received by each main process, obtain the retrieval results of the features to be retrieved in the base library to be retrieved.
  • the retrieval results received by each main process may be combined to obtain a retrieval result 205 of the features to be retrieved in the base library to be retrieved.
  • In practice, the above step 101 to step 105 can be implemented by a processor of the electronic device, and the processor can be at least one of an Application Specific Integrated Circuit (ASIC), a Digital Signal Processor (DSP), a Digital Signal Processing Device (DSPD), a Programmable Logic Device (PLD), a Field Programmable Gate Array (FPGA), a Central Processing Unit (CPU), a controller, a microcontroller and a microprocessor.
  • It can be seen that each retrieval subtask can be assigned to a subprocess and executed in that subprocess; therefore, in embodiments of the present disclosure, parallel execution of different retrieval subtasks can be realized by creating main processes and subprocesses, which improves retrieval efficiency.
  • In some embodiments, in the case where each subprocess uses an acceleration device to process its retrieval subtask, embodiments of the present disclosure can use multi-process calls to invoke multiple acceleration devices, thereby improving retrieval efficiency.
  • In some embodiments, the Spark computing framework and acceleration devices can be combined to provide a method that uses multiple acceleration devices simultaneously to achieve efficient retrieval over large data volumes, realizing the fusion of distributed computing technology and hardware-accelerated feature retrieval technology; in practice, the Spark computing framework achieves machine-level parallel acceleration, while tasks are split and scheduled to achieve chip-level parallel processing, thereby improving retrieval efficiency and stability under large data volumes.
  • Illustratively, in retrieval scenarios where the base library contains tens of millions of entries, the queries per second (QPS) of high-dimensional feature retrieval can reach 20,000 features/s.
  • the process of creating at least one subprocess by using each main process in at least one main process may include:
  • Step 301 In each main process, obtain the index file of the base-library slice corresponding to the assigned retrieval subtask, and create an output queue for the main process to output data to the at least one subprocess, an input queue for receiving data from the at least one subprocess, and a communication pipe between the main process and the at least one subprocess.
  • In some embodiments, based on the base-library slice corresponding to the assigned retrieval subtask, the main process can use the search engine to train an index file for that base-library slice, and then serialize the index file to obtain a serialized index file.
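  • The patent's search engine and its DC quantization are not publicly documented, so the sketch below uses faiss purely as a stand-in to show what "train an IVF-plus-quantization index for one base-library slice and serialize it" can look like; the dimension, cell count and quantization parameters are assumptions.

```python
# Stand-in sketch of "train an index file for the base-library slice, then
# serialize it". faiss replaces the proprietary SearchEngine/DC quantization.
import numpy as np
import faiss  # assumption: faiss-cpu or faiss-gpu is installed

d = 256                   # feature dimension (assumed)
nlist, m_pq = 1024, 32    # IVF cell count and PQ sub-quantizers (assumed)

db_slice = np.random.rand(100_000, d).astype("float32")  # one base-library slice

quantizer = faiss.IndexFlatL2(d)
index = faiss.IndexIVFPQ(quantizer, d, nlist, m_pq, 8)   # IVF + product quantization

index.train(db_slice)   # learn coarse centroids and PQ codebooks
index.add(db_slice)     # quantize and insert the slice's features

# Serialize the trained index so the main process can hand it to subprocesses.
index_bytes = faiss.serialize_index(index)  # numpy uint8 array
np.save("db_slice_index.npy", index_bytes)
```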
  • In one implementation, the main process can establish, based on shared memory, the pipe used to communicate with the subprocesses; referring to FIG. 2, the main process can also establish, based on shared memory, the queues used to communicate with the subprocesses; here, these queues may include an output queue for outputting data to the at least one subprocess and an input queue for receiving data from the at least one subprocess. Illustratively, referring to FIG. 2, the inter-process queue is an input queue that receives the retrieval results of the at least one subprocess.
  • Step 302 Create at least one child process according to the index file, output queue, input queue and communication pipeline.
  • In embodiments of the present disclosure, after the at least one subprocess is created, each subprocess first performs initialization and then creates a thread for reading a batch of feature slices from the input queue; at the same time, the thread monitors the communication pipe with the main process so that the subprocess can be terminated when a subprocess termination condition is satisfied. After the subprocess obtains its retrieval result, it can send that retrieval result to the main process through the inter-process queue.
  • In one implementation, the subprocess termination condition can be that the subprocess has completed the corresponding retrieval subtask and has sent its retrieval result to the main process; in another implementation, the termination condition can be that the subprocess receives a termination command through the above-mentioned communication pipe.
  • the termination command may be a command issued to the main process according to a user's termination retrieval request.
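  • A minimal Python multiprocessing sketch of this main-process/subprocess structure is given below: one subprocess per acceleration device, a queue for handing batches of feature slices to the subprocesses, an inter-process queue for returning retrieval results, and a pipe per subprocess for termination commands. The actual accelerator binding and search call are stubbed out, and all names are assumptions rather than the patent's implementation.

```python
# Sketch of steps 301-302 with Python multiprocessing primitives only.
import multiprocessing as mp

def subprocess_worker(device_id, index_bytes, task_queue, result_queue, pipe):
    # Initialization would deserialize the index and bind accelerator device_id.
    while True:
        if pipe.poll() and pipe.recv() == "terminate":   # termination command
            break
        try:
            batch = task_queue.get(timeout=0.1)          # a batch of feature slices
        except Exception:
            continue
        if batch is None:                                # sentinel: no more work
            break
        results = [(q_id, []) for q_id, _vec in batch]   # stub for accelerated search
        result_queue.put((device_id, results))           # inter-process (input) queue

def main(num_devices, index_bytes, batches):
    task_queue, result_queue = mp.Queue(), mp.Queue()    # output / input queues
    pipes, workers = [], []
    for dev in range(num_devices):                       # one subprocess per device
        parent_end, child_end = mp.Pipe()
        p = mp.Process(target=subprocess_worker,
                       args=(dev, index_bytes, task_queue, result_queue, child_end))
        p.start()
        pipes.append(parent_end)
        workers.append(p)

    for batch in batches:
        task_queue.put(batch)
    for _ in workers:                                    # sentinels end the workers
        task_queue.put(None)

    collected = [result_queue.get() for _ in batches]    # results from subprocesses
    for p in workers:
        p.join()
    return collected

if __name__ == "__main__":
    print(main(2, b"", [[("q1", [0.1, 0.2])], [("q2", [0.3, 0.4])]]))
```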
  • It can be seen that embodiments of the present disclosure can create the subprocesses according to the output queue used to send data to the subprocesses, the input queue used to receive data from the subprocesses, and the communication pipes between the main process and the subprocesses, so that interaction between the main process and each subprocess can be reliably realized; this helps the subprocesses to process retrieval subtasks based on the index file, and helps the main process to receive the retrieval results of the subprocesses via the input queue.
  • the above-mentioned process of creating an output queue, an input queue, and a communication pipeline may include:
  • Step 401 Determine the predetermined number of acceleration devices as the number of sub-processes to be created.
  • Step 402 Create an output queue, an input queue, and a communication pipeline according to the number of child processes to be created.
  • Referring to FIG. 2, if the predetermined number of acceleration devices is q, the number of subprocesses to be created is q; in FIG. 2, accelerator card 1 to accelerator card q represent the q acceleration devices.
  • It can be seen that, since the pre-determined number of acceleration devices can be taken as the number of subprocesses to be created, multiple acceleration devices can be fully utilized to perform retrieval subtasks in parallel, which improves feature retrieval efficiency.
  • In some embodiments, splitting the base library to be retrieved and the features to be retrieved with the Spark computing framework enables the embodiments of the present disclosure to support feature retrieval over large data volumes, but does not by itself substantially improve retrieval performance; for the feature retrieval task, the improvement in retrieval efficiency depends on the execution of each retrieval subtask.
  • In the related art, a single acceleration device can be used together with an inverted file (IVF) index algorithm and a feature quantization algorithm to accelerate retrieval of high-dimensional features; illustratively, the feature quantization algorithm can be implemented in software based on DeepCode (DC).
  • At the software level, the search engine can quantize the base-library features into lower-dimensional quantized features by training on the base library to be retrieved, and can use the IVF algorithm to accelerate retrieval. At the hardware level, the search engine can be bound to an acceleration device, and the quantized features and the index can be stored in the memory of the acceleration device, using hardware acceleration to achieve fast retrieval.
  • It can be seen that quantizing high-dimensional base-library features into lower-dimensional quantized features causes some loss of retrieval precision; the search engine therefore provides a feature re-ranking call to determine the similarity between the low-dimensional quantized features and the original features, but this requires keeping the original base-library features, and the memory storing them is not managed by the calling program, so creating multiple search-engine objects leads to duplication of the same features in memory.
  • In addition, in the related art, although the search engine provides the ability to bind multiple acceleration devices, it does not provide the ability to distribute tasks.
  • In view of this, embodiments of the present disclosure may not use the feature re-ranking function provided by the search engine, and may instead use the main process to implement feature re-ranking.
  • In some embodiments of the present disclosure, after each subprocess quantizes the base-library slice in its corresponding retrieval subtask into quantized features, it can perform retrieval based on the feature slices in that retrieval subtask and the quantized features, and obtain the retrieval result of that subprocess.
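  • Continuing the faiss stand-in from above, the sketch below shows the subprocess side: deserialize the index of the assigned base-library slice and run the approximate (quantized) search for one batch of feature slices, producing candidate ids and approximate distances that the main process later re-ranks; nprobe, the batch size and the candidate count are assumptions.

```python
# Subprocess-side sketch: quantized search over one base-library slice.
import numpy as np
import faiss  # stand-in for the patent's SearchEngine; parameters are assumed

index_bytes = np.load("db_slice_index.npy")      # written by the main process
index = faiss.deserialize_index(index_bytes)
index.nprobe = 16                                # IVF cells visited per query (assumed)

query_batch = np.random.rand(64, index.d).astype("float32")  # one batch of feature slices
approx_dist, candidate_ids = index.search(query_batch, 100)  # top-100 candidates per query
```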
  • the process of obtaining the retrieval results of the features to be retrieved in the base library to be retrieved may include:
  • Step 501 In each main process, determine the similarity between the received retrieval results and the corresponding original features in the base database to be retrieved.
  • Step 502 From the retrieval results received by each main process, select the retrieval results whose quantity is the first preset value, and use the selected retrieval results as the output retrieval results of the main process.
  • Here, the first preset value can be set according to actual needs; for example, among the retrieval results received by each main process, the top k retrieval results with the highest similarity to the corresponding original features in the base library to be retrieved can be selected and used as the output retrieval results of that main process, where k is an integer greater than or equal to 1 and k is the above-mentioned first preset value.
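  • A small NumPy sketch of this main-process re-ranking step is shown below; cosine similarity is used only as an example metric, since the patent merely requires a similarity to the original features and a preset k.

```python
# Sketch of steps 501-502: exact re-ranking of subprocess candidates against the
# original (unquantized) features, keeping the top-k most similar.
import numpy as np

def rerank_top_k(query, candidate_ids, original_features, k):
    """original_features: dict mapping feature id -> original feature vector."""
    cands = np.stack([original_features[i] for i in candidate_ids])
    q = query / np.linalg.norm(query)
    c = cands / np.linalg.norm(cands, axis=1, keepdims=True)
    sims = c @ q                                  # exact cosine similarity
    order = np.argsort(-sims)[:k]                 # indices of the k best candidates
    return [(candidate_ids[i], float(sims[i])) for i in order]
```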
  • Step 503 According to the output retrieval results of each main process, obtain the retrieval results of the features to be retrieved in the base library to be retrieved.
  • It can be seen that embodiments of the present disclosure can use the main process to implement feature re-ranking: the main process can determine the similarity between the low-dimensional quantized features and the original base-library slices, and select retrieval results according to that similarity. Therefore, compared with schemes in the related art that use the search engine to implement feature re-ranking, execution of the main process can be managed by the calling program, and there is no need to copy the original base-library slices when computing the similarity between the original base-library slices and the retrieval results of each subprocess, which reduces resource consumption; moreover, the main process can be used to split retrieval subtasks, which improves retrieval efficiency.
  • In some embodiments, the top k retrieval results with the highest similarity to the corresponding original features in the base library to be retrieved can be selected and used as the output retrieval results of the main process, thereby maintaining high feature retrieval accuracy while reducing the amount of retrieved data.
  • the process of obtaining the retrieval results of the features to be retrieved in the base library to be retrieved may include:
  • Step 601 In the output retrieval results of the main processes, obtain the retrieval results of each retrieval subtask in the base library to be retrieved by merging the retrieval results of that retrieval subtask in each base-library slice.
  • Step 602 Obtain the retrieval results of the features to be retrieved in the base library to be retrieved, which include the retrieval results of each retrieval subtask in the base library to be retrieved.
  • It can be seen that, by merging the retrieval results of each retrieval subtask in each base-library slice, embodiments of the present disclosure can accurately and comprehensively obtain the retrieval results of each retrieval subtask in the base library to be retrieved.
  • In some embodiments, referring to FIG. 7, the process of obtaining the retrieval results of each retrieval subtask in the base library to be retrieved by merging the retrieval results of that subtask in each base-library slice may include:
  • Step 701 Merge the retrieval results of each retrieval subtask in each base-library slice to obtain the merged result corresponding to that retrieval subtask.
  • Step 702 From the merged result corresponding to each retrieval subtask, select a number of retrieval results equal to a second preset value, and take the selected retrieval results as the retrieval results of that retrieval subtask in the base library to be retrieved.
  • Here, the second preset value can be set according to actual needs; for example, among the merged results corresponding to each retrieval subtask, the top p retrieval results with the highest similarity to the corresponding original features in the base library to be retrieved can be selected and used as the retrieval results of that retrieval subtask in the base library to be retrieved, where p is an integer greater than or equal to 1 and p is the above-mentioned second preset value.
  • In some embodiments, step 702 can be performed with the map-reduce operators of the Spark computing framework.
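  • The sketch below shows one way this merge-and-select step could be expressed with Spark's map-reduce style operators: partial results are keyed by retrieval subtask, merged across base-library slices with reduceByKey, and trimmed to the top p by similarity; the record layout and p are assumptions.

```python
# Illustrative PySpark sketch of steps 701-702.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("merge-results").getOrCreate()
sc = spark.sparkContext
p = 50  # second preset value (assumed)

# Each element: (query_id, [(base_feature_id, similarity), ...]) from one db slice.
partial_results = sc.parallelize([
    ("q1", [("f3", 0.91), ("f7", 0.80)]),
    ("q1", [("f9", 0.95), ("f2", 0.60)]),
    ("q2", [("f5", 0.88)]),
])

merged = (partial_results
          .reduceByKey(lambda a, b: a + b)                                 # merge across db slices
          .mapValues(lambda hits: sorted(hits, key=lambda h: -h[1])[:p]))  # keep top-p

print(merged.collect())
```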
  • It can be seen that embodiments of the present disclosure can filter the retrieval results in the merged result corresponding to each retrieval subtask according to their similarity with the corresponding original features, thereby maintaining high feature retrieval accuracy while reducing the amount of retrieved data.
  • the feature retrieval method described above may further include:
  • Step 801 Determine the attribute of each feature in the base library to be retrieved.
  • Step 802 Filter out, according to the attribute of each feature in the base library to be retrieved, the features in the base library to be retrieved that do not fall within a first preset attribute range.
  • Illustratively, the attribute of each feature in the base library to be retrieved can be any attribute specified by the user; in one implementation, when the features in the base library to be retrieved are features of captured images, the attribute of a feature can include at least one of the following: the capture time of the image corresponding to the feature, and the capture location of the image corresponding to the feature. Correspondingly, the first preset attribute range can be at least one of a time range and a spatial range.
  • It can be seen that embodiments of the present disclosure can filter the features in the base library to be retrieved according to the first preset attribute range, which helps to obtain features that meet the requirements; for example, the features in the base library to be retrieved can be filtered according to a preset spatio-temporal range, which helps to obtain features that meet that spatio-temporal range.
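  • A brief PySpark sketch of this attribute filtering is given below, using a capture-time column and a capture-location column as the assumed attributes and a December 2021 time window plus two zones as the assumed first preset attribute range.

```python
# Sketch of steps 801-802: drop base-library features whose capture time or
# capture location falls outside the first preset attribute range before slicing.
from pyspark.sql import SparkSession, functions as F

spark = SparkSession.builder.appName("attribute-filter").getOrCreate()

base_df = spark.createDataFrame(
    [("f1", "2021-12-05", "zone_a"),
     ("f2", "2021-11-02", "zone_a"),
     ("f3", "2021-12-20", "zone_c")],
    schema=["feature_id", "capture_time", "camera_region"],
)

filtered = base_df.filter(
    F.col("capture_time").between("2021-12-01", "2021-12-31")  # time range
    & F.col("camera_region").isin("zone_a", "zone_b")          # spatial range
)

filtered.show()   # only f1 satisfies both the time range and the spatial range
```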
  • the above feature retrieval method may further include:
  • Step 901 Determine the attribute of each feature of the feature to be retrieved.
  • Step 902 According to the attributes of each feature in the features to be retrieved, filter out those features that do not belong to the second preset attribute range among the features to be retrieved.
  • the first preset attribute range and the second preset attribute range may be the same or different.
  • Illustratively, the attribute of each feature to be retrieved can be any attribute specified by the user; in one implementation, when the features to be retrieved are features of captured images, the attribute of a feature can include at least one of the following: the capture time of the image corresponding to the feature, and the capture location of the image corresponding to the feature.
  • the second preset attribute range may be at least one of a time range and a space range.
  • It can be seen that embodiments of the present disclosure can filter the features to be retrieved according to the second preset attribute range, which helps to obtain features that meet the requirements; for example, the features to be retrieved can be filtered according to a preset spatio-temporal range, which helps to obtain features that meet that spatio-temporal range.
  • the embodiments of the present disclosure also provide a feature retrieval device.
  • FIG. 10 is a schematic structural diagram of a feature retrieval device according to an embodiment of the present disclosure. As shown in FIG. 10, the device may include: an obtaining part 1000, a first processing part 1001, a second processing part 1002, a third processing part 1003 and a fourth processing part 1004, wherein
  • the obtaining part 1000 is configured to obtain at least two feature slices of the feature to be retrieved in the request to be retrieved;
  • the first processing part 1001 is configured to determine, based on at least two base-library slices obtained by slicing the base library to be retrieved and the at least two feature slices, a set of retrieval subtasks for retrieval in each base-library slice;
  • the second processing part 1002 is configured to use each main process in at least one main process to create at least one child process;
  • the third processing part 1003 is configured to use each main process to divide the retrieval sub-tasks assigned to itself and distribute them to each sub-process, and receive the retrieval results of each sub-process;
  • the fourth processing part 1004 is configured to determine the retrieval results of the features to be retrieved in the base library to be retrieved according to the retrieval results received by each main process.
  • In some embodiments, the second processing part 1002 being configured to create at least one subprocess by using each main process of at least one main process includes: in each main process, obtaining the index file of the base-library slice corresponding to the assigned retrieval subtask, and creating an output queue for the main process to output data to the at least one subprocess, an input queue for receiving data from the at least one subprocess, and a communication pipe between the main process and the at least one subprocess; and creating the at least one subprocess according to the index file, the output queue, the input queue and the communication pipe.
  • In some embodiments, the retrieval result of each of the at least one subprocess is obtained by an acceleration device processing the corresponding retrieval subtask; the second processing part 1002 being configured to create the output queue, the input queue and the communication pipes includes: determining the pre-determined number of acceleration devices as the number of subprocesses to be created; and creating the output queue, the input queue and the communication pipes according to the number of subprocesses to be created.
  • In some embodiments, the retrieval result of each of the at least one subprocess is a result obtained, after the base-library slice in the corresponding retrieval subtask has been quantized into quantized features, based on the feature slice in the corresponding retrieval subtask and the quantized features;
  • the fourth processing part 1004 is configured to determine the retrieval results of the features to be retrieved in the base library to be retrieved according to the retrieval results received by each of the main processes, including:
  • in each main process, determining the similarity between the received retrieval results and the corresponding original features in the base library to be retrieved;
  • from the retrieval results received by each main process, selecting a number of retrieval results equal to a first preset value, and taking the selected retrieval results as the output retrieval results of that main process;
  • according to the output retrieval results of the main processes, obtaining the retrieval results of the features to be retrieved in the base library to be retrieved.
  • the fourth processing part 1004 is configured to obtain the retrieval results of the features to be retrieved in the base library to be retrieved according to the retrieval results output by each of the main processes, including:
  • in the output retrieval results of the main processes, obtaining the retrieval results of each retrieval subtask in the base library to be retrieved by merging the retrieval results of that retrieval subtask in each base-library slice;
  • obtaining the retrieval results of the features to be retrieved in the base library to be retrieved, which include the retrieval results of each retrieval subtask in the base library to be retrieved.
  • In some embodiments, the fourth processing part 1004 being configured to obtain the retrieval results of each retrieval subtask in the base library to be retrieved by merging the retrieval results of that retrieval subtask in each base-library slice includes: merging the retrieval results of each retrieval subtask in each base-library slice to obtain the merged result corresponding to that retrieval subtask; and, from the merged result corresponding to each retrieval subtask, selecting a number of retrieval results equal to a second preset value and taking them as the retrieval results of that retrieval subtask in the base library to be retrieved.
  • In some embodiments, the obtaining part 1000 is further configured to determine, before the base library to be retrieved is sliced, the attribute of each feature in the base library to be retrieved, and to filter out, according to those attributes, the features in the base library to be retrieved that do not fall within the first preset attribute range.
  • In some embodiments, the obtaining part 1000 is further configured to determine, before the at least two feature slices of the feature to be retrieved in the request to be retrieved are obtained, the attribute of each feature to be retrieved, and to filter out, according to those attributes, the features to be retrieved that do not fall within the second preset attribute range.
  • the acquisition part 1000, the first processing part 1001, the second processing part 1002, the third processing part 1003 and the fourth processing part 1004 mentioned above can be realized based on the processor of the electronic device.
  • each functional part in this embodiment may be integrated into one processing unit, each unit may exist separately physically, or two or more units may be integrated into one unit.
  • the above-mentioned integrated units can be implemented in the form of hardware or in the form of software function modules.
  • If the integrated unit is implemented in the form of a software function module and is not sold or used as an independent product, it can be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this embodiment, in essence, or the part that contributes to the prior art, or all or part of the technical solution, can be embodied in the form of a software product; the computer software product is stored in a storage medium and includes several instructions for causing a computer device (which can be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method described in this embodiment.
  • The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.
  • Specifically, the computer program instructions corresponding to the feature retrieval method of this embodiment can be stored on a storage medium such as an optical disc, a hard disk or a USB flash drive; when the computer program instructions corresponding to the feature retrieval method in the storage medium are read or executed by an electronic device, any one of the feature retrieval methods of the foregoing embodiments is implemented.
  • Based on the same technical concept as the foregoing embodiments, FIG. 11 shows an electronic device 110 provided by an embodiment of the present disclosure, which can include: a memory 111, a processor 112, and a computer program stored in the memory 111 and executable on the processor 112; wherein:
  • memory 111 for storing computer programs and data
  • the processor 112 is configured to execute the computer program stored in the memory, so as to implement any feature retrieval method in the foregoing embodiments.
  • the above-mentioned memory 111 can be a volatile memory (volatile memory), such as RAM; or a non-volatile memory (non-volatile memory), such as ROM, flash memory (flash memory), hard disk (Hard Disk Drive, HDD) or solid-state drive (Solid-State Drive, SSD); or a combination of the above-mentioned types of memory, and provide instructions and data to the processor 112.
  • the aforementioned processor 112 may be at least one of ASIC, DSP, DSPD, PLD, FPGA, CPU, controller, microcontroller, and microprocessor.
  • An embodiment of the present disclosure also provides a computer program, including computer-readable code; when the computer-readable code runs in the electronic device, a processor in the electronic device executes any one of the above feature retrieval methods.
  • the functions or modules included in the apparatus provided by the embodiments of the present disclosure may be used to execute the methods described in the above method embodiments, and for specific implementation, refer to the descriptions of the above method embodiments.
  • Through the description of the above embodiments, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation.
  • Based on this understanding, the technical solution of the present disclosure, in essence, or the part that contributes to the prior art, can be embodied in the form of a software product; the computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk or an optical disc) and includes several instructions for causing a terminal (which can be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the various embodiments of the present disclosure.

Abstract

This embodiment discloses a feature retrieval method and apparatus, an electronic device, a computer storage medium and a program. The method includes: obtaining at least two feature slices of a feature to be retrieved in a retrieval request; determining, based on at least two base-library slices obtained by slicing a base library to be retrieved and the at least two feature slices, a set of retrieval subtasks for retrieval in each base-library slice; creating at least one subprocess by using each main process of at least one main process; using each main process to further divide the retrieval subtasks assigned to it and distribute them to its subprocesses, and receiving the retrieval results of the subprocesses; and determining, according to the retrieval results received by each main process, the retrieval results of the feature to be retrieved in the base library to be retrieved.

Description

特征检索方法、装置、电子设备、计算机存储介质和程序
相关申请的交叉引用
本申请基于申请号为202111640138.4、申请日为2021年12月29日,名称为“特征检索方法、装置、电子设备及计算机存储介质”的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此引入本申请作为参考。
技术领域
本公开涉及计算机视觉技术,涉及但不限于一种特征检索方法、装置、电子设备、计算机存储介质和计算机程序。
背景技术
目前,在档案服务的场景中,需要把前端传回的大量抓拍的图像数据通过聚类、实名化算法形成匿名档和实名档,而聚类与实名化算法都涉及到了大量的针对人员图像的特征检索操作,特征检索的效率直接影响了算法的性能。
发明内容
本公开实施例提供了特征检索方法、装置、电子设备、计算机存储介质和计算机程序。
本公开实施例提供了一种特征检索方法,所述方法包括:
获取待检索请求中待检索特征的至少两个特征分片;
基于待检索底库分片后的至少两个底库分片和所述至少两个特征分片,确定在各所述底库分片进行检索的检索子任务集合;
利用至少一个主进程中的每个主进程创建至少一个子进程;
利用每个主进程将已分配给自身的检索子任务再次划分后分配至各个子进程,并接收各个子进程的检索结果;
根据所述各个主进程接收到的检索结果,确定待检索特征在待检索底库中的检索结果。
在本公开的一些实施例中,所述利用至少一个主进程中的每个主进程创建至少一个子进程,包括:
在所述每个主进程中,获取针对所述已分配的检索子任务对应的底库分片的索引文件,创建所述主进程用于向所述至少一个子进程输出数据的输出队列、用于接收所述至少一个子进程的数据的输入队列、以及所述主进程与所述至少一个子进程之间的通信管道;
根据所述索引文件、所述输出队列、所述输入队列和所述通信管道,创建所述至少一个子进程。
可以看出,本公开实施例可以根据用于向多个子进程输出数据的输出队列、用于接收多个子进程的数据的输入队列、以及主进程与所述多个子进程之间的通信管道,创建子进程,从而能够可靠地实现主进程与各个子进程的交互,有利于使子进程基于索引文件实现检索子任务的处理,并有利于主进程根据输入队列接收子进程的检索结果。
在本公开的一些实施例中,所述至少一个子进程中每个子进程的检索结果是基于加速设备处理对应的检索子任务得到的;
所述创建所述主进程用于向所述多个子进程输出数据的输出队列、用于接收所述多个子进程的数据的输入队列、以及所述主进程与所述多个子进程之间的通信管道,包括:
将预先确定的加速设备的数量确定为待创建的子进程的数量;
根据所述待创建的子进程的数量,创建所述输出队列、所述输入队列和所述通信管道。
可以看出,本公开实施例中,由于可以将先确定的加速设备的数量确定为待创建的子进程的数量,因而,可以充分利用多个加速设备实现检索子任务的并行检索,提高了特征检索效率。
在本公开的一些实施例中,所述至少一个子进程中每个子进程的检索结果是将对应的检索子任务中的底库分片量化为量化特征后,基于对应的检索子任务中的特征分片和所述量化特征得到的结果;
所述根据所述各个主进程接收到的检索结果,确定待检索特征在待检索底库中的检索结果,包括:
在所述每个主进程中,确定接收到的检索结果与所述待检索底库中相应的原始特征的相似度;
在所述每个主进程接收到的检索结果中,选取数量为第一预设值的检索结果,将选取的所述数量为第一预设值的检索结果作为主进程的输出检索结果;
根据所述各个主进程的输出检索结果,得出所述待检索特征在所述待检索底库中的检索结果。
可以看出,本公开实施例中可以采用主进程实现特征重排的功能,即,主进程可以确定低维的量化特征与原始的底库分片的相似度,并根据相似度选取检索结果;因此,与相关技术中使用检索引擎实现特征重排的方案相比,主进程的执行可以通过程序管理,在计算原始的底库分片与各个子进程的检索结果的相似度时,无需复制原始的底库分片,降低了资源消耗;并且,可以利用主进程实现检索子任务的任务分片,提高了检索效率。
在本公开的一些实施例中,所述根据所述各个主进程的输出检索结果,得出待检索特征在待检索底库中的检索结果,包括:
在所述各个主进程的输出检索结果中,通过将每个检索子任务在各个底库分片的检索结果进行合并,得到所述每个检索子任务在所述待检索底库中的检索结果;
得出所述待检索特征在待检索底库中的检索结果,所述待检索特征在待检索底库中的检索结果包括:所述各个检索子任务在所述待检索底库中的检索结果。
可以看出,本公开实施例通过将每个检索子任务在各个底库分片的检索结果进行合并,可以准确全面地得到每个检索子任务在待检索底库中的检索结果。
在本公开的一些实施例中,所述通过将每个检索子任务在各个底库分片的检索结果进行合并,得到所述每个检索子任务在所述待检索底库中的检索结果,包括:
将所述每个检索子任务在各个底库分片的检索结果进行合并,得到所述每个检索子任务对应的合并结果;
在所述每个检索子任务对应的合并结果中,选取数量为第二预设值的检索结果,将选取的所述数量为第二预设值的检索结果作为所述每个检索子任务在所述待检索底库中的检索结果。
可以看出,本公开实施例可以根据与相应的原始特征的相似度,在每个检索子任务对应的合并结果进行检索结果的筛选,从而,可以在减少检索数据量的基础上保持较高的特征检索精度。
在本公开的一些实施例中,在对所述待检索底库进行分片之前,所述方法还包括:
确定所述待检索底库中每个特征的属性;
按照所述待检索底库中每个特征的属性,将所述待检索底库中不属于第一预设属性范围的特征滤除;
在获取待检索请求中待检索特征的至少两个特征分片之前,所述方法还包括:
确定所述待检索特征每个特征的属性;
按照所述待检索特征中每个特征的属性,将所述待检索特征中不属于第二预设属性范围的特征滤除。
可以看出,本公开实施例可以按照第一预设属性范围对待检索底库中的特征进行滤除,有利于得到符合要求的特征;并且可以按照第二预设属性范围对待检索特征中的特征进行滤除,有利于得到符合要求的特征。
本公开实施例还提出了一种特征检索装置,所述装置包括获取部分、第一处理部分、第二处理部分、第三处理部分和第四处理部分,其中,
获取部分,用于获取待检索请求中待检索特征的至少两个特征分片;
第一处理部分,用于基于待检索底库分片后的至少两个底库分片和所述至少两个特征分片,确定在各所述底库分片进行检索的检索子任务集合;
第二处理部分,用于利用至少一个主进程中的每个主进程创建至少一个子进程;
第三处理部分,用于利用每个主进程将已分配给自身的检索子任务再次划分后分配至各个子进程,并接收各个子进程的检索结果;
第四处理部分,用于根据所述各个主进程接收到的检索结果,确定待检索特征在待检索底库中的检索结果。
本公开实施例还提供了一种电子设备,包括处理器和用于存储能够在处理器上运行的计算机程序的存储器;其中,所述处理器用于运行所述计算机程序以执行上述任意一种特征检索方法。
本公开实施例还提供了一种计算机存储介质,其上存储有计算机程序,该计算机程序被处理器执行时实现上述任意一种特征检索方法。
本公开实施例还提供了一种计算机程序,包括计算机可读代码,当所述计算机可读代码在电子设备中运行时,所述电子设备中的处理器执行用于实现上述任意一种特征检索方法。
可以看出,本公开实施例可以将各个检索子任务分配至子进程中,在子进程中执行相应的检索子任务,从而,本公开实施例可以通过创建主进程和子进程实现不同检索子任务的并行执行,提升了检索效率。
应当理解的是,以上的一般描述和后文的细节描述仅是示例性和解释性的,而非限制本公开。
附图说明
此处的附图被并入说明书中并构成本说明书的一部分,这些附图示出了符合本公开的实施例,并与说明书一起用于说明本公开的技术方案。
图1为本公开实施例的一种特征检索方法的流程图;
图2为本公开实施例的另一种特征检索方法的流程图;
图3为本公开实施例中创建多个子进程的流程图;
图4为本公开实施例中创建输出队列、输入队列、以及通信管道的流程图;
图5为本公开实施例中得出待检索特征在待检索底库中的检索结果的一个流程图;
图6为本公开实施例中得出待检索特征在待检索底库中的检索结果的另一个流程图;
图7为本公开实施例中得出每个检索子任务在所述待检索底库中的检索结果的一个流程图;
图8为本公开实施例中对待检索底库的特征进行滤除的流程图;
图9为本公开实施例中对待检索特征进行滤除的流程图;
图10为本公开实施例的特征检索装置的结构示意图;
图11为本公开实施例的电子设备的硬件结构示意图。
具体实施方式
在相关技术中,在档案服务的场景中,会使用聚类和实名化算法,针对人员图像的聚类和实名化算法都涉及到了大量的针对人员图像的特征检索操作。档案服务使用的聚类算法是基于连通图的一种聚类算法,涉及到了大量的建边操作,常见的聚类算法如k-means算法,需要不断查找某一特征点的最近点,然后更新类中心,不断迭代形成最终的类簇;可以看出,如何提高针对图像的特征检索效率,是亟待解决的技术问题。
针对上述技术问题,提出本公开实施例的技术方案。
以下结合附图及实施例,对本公开进行进一步详细说明。应当理解,此处所提供的实施例仅仅用以解释本公开,并不用于限定本公开。另外,以下所提供的实施例是用于实施本公开的部分实施例,而非提供实施本公开的全部实施例,在不冲突的情况下,本公开实施例记载的技术方案可以任意组合的方式实施。
需要说明的是,在本公开实施例中,术语“包括”、“包含”或者其任何其他变体意在涵盖非排他性的包含,从而使得包括一系列要素的方法或者装置不仅包括所明确记载的要素,而且还包括没有明确列出的其他要素,或者是还包括为实施方法或者装置所固有的要素。在没有更多限制的情况下,由语句“包括一个......”限定的要素,并不排除在包括该要素的方法或者装置中还存在另外的相关要素(例如方法中的步骤或者装置中的单元,例如的单元可以是部分电路、部分处理器、部分程序或软件等等)。
例如,本公开实施例提供的特征检索方法包含了一系列的步骤,但是本公开实施例提供的特征检索方法不限于所记载的步骤,同样地,本公开实施例提供的装置不限于包括所明确记载的部分,还可以包括为获取相关信息、或基于信息进行处理时所需要设置的部分。
本公开实施例可以应用于终端、服务器等电子设备中。这里,终端可以是瘦客户机、厚客户机、手持或膝上设备、基于微处理器的系统、机顶盒、可编程消费电子产品、网络个人电脑、小型计算机系统,等等,服务器可以是服务器计算机系统小型计算机系统﹑大型计算机系统和包括上述任何系统的分布式云计算技术环境,等等。
终端、服务器等电子设备可以包括用于执行指令的程序模块。通常,程序模块可以包括例程、程序、目标程序、组件、逻辑、数据结构等等,它们执行特定的任务。计算机系统/服务器可以在分布式云计算环境中实施,分布式云计算环境中,任务是由通过通信网络链接的远程处理设备执行的。在分布式云计算环境中,程序模块可以位于包括存储设备的本地或远程计算系统存储介质上。
本公开实施例提出了一种特征检索方法,可以应用于智能视频分析、智慧城市或其它图像分析场景。
图1为本公开实施例的特征检索方法的一个流程图,如图1所示,该流程可以包括:
步骤101:获取待检索请求中待检索特征的至少两个特征分片。
本公开实施例中,待检索请求中携带有待检索特征,待检索特征可以是人员图像的特征或其它特征;在接收到待检索请求的情况下,可以将待检索请求中待检索特征划分为至少两个特征分片;在一种实现方式中,可以利用spark计算框架将待检索特征划分至少两个特征分片。
这里,spark计算框架是专为大规模数据处理而设计的快速通用的计算框架,是一种开源的类Hadoop MapReduce的通用并行框架。
在一些实施例中,参照图2,待检索特征202中的每个特征分片都具有相应的身份标识号(identity document,id),在实际应用中,可以spark计算框架对待检索特征202划分待检索特征202的至少两个分片,图2中,query1至querym表示划分得到的m个特征分片,m为大于1的整数;在得到m个特征分片后,可以将m个特征分片存储至Hadoop分布式文件系统(Hadoop Distributed File System,HDFS)203中。
可以看出,本公开实施例可以根据待检索请求,启动针对待检索特征的检索任务,从而,可以按照实际需求实现特征检索。
步骤102:基于待检索底库分片后的至少两个底库分片和至少两个特征分片,确定在各底库分片进行检索的检索子任务集合。
本公开实施例中,待检索底库表示存储原始特征的数据库,原始特征可以是人员图像的特征或其它特征。
在一些实施例中,可以利用spark计算框架将待检索底库拆分为至少两个底库分片。
在一些实施例中,参照图2,待检索底库201中的每个特征都具有相应的id,在实际应用中,可以利用spark计算框架中的重分区(repartition)接口,将待检索底库201中的特征拆分为单位大小的n个底库分片,图2中,db1至dbn表示拆分得到的n个底库分片,n为大于1的整数;在拆分得到n个底库分片后,可以将上述n个底库分片均匀分配到各个计算节点,以便于后续在计算节点进行处理。
在一些实施例中,在得到至少两个底库分片和待检索特征的至少两个特征分片后,可以通过计算至少两个底库分片和至少两个特征分片的笛卡尔积,得出在各个底库分片进行检索的检索子任务集合。图2中,(db1,query1)至(db1,querym)分别表示在db1上分别检索待检索特征的第1个特征分片至第m个特征分片的检索子任务,(dbn,query1)至(dbn,querym)分别表示在dbn上分别检索待检索特征的第1个特征分片至第m个特征分片的检索子任务。
需要说明的是,针对待检索底库只需要进行一次底库分片操作,便可以得到至少两个底库分片;后续每次进行特征检索时,无需再次执行底库分片操作,而是可以根据预先得到的至少两个底库分片确定检索子任务集合。
可以理解地,在大数据量的场景下,待检索底库与待检索特征中特征数量都很大,需要进行合理拆分,在本公开实施例中,可以利用spark计算框架完成了待检索任务的任务拆分,从而使得本公开实施例可以支持在大数据量场景中进行特征检索。
步骤103:利用至少一个主进程中的每个主进程创建至少一个子进程。
本公开实施例中,在创建至少一个主进程后,可以向至少一个主进程的每个主进程分配检索子任务集合中的检索子任务。
参照图2,每个主进程204在一个资源集合slot中执行;在每个主进程204中,可以创建至少一个子进程,在图2的示例中,子进程1至子进程q表示主进程中创建的q个子进程,q为大于1的整数。
步骤104:利用每个主进程将已分配给自身的检索子任务再次划分后分配至各个子进程,并接收各个子进程的检索结果。
在一些实施例中,参照图2,每个子进程可以通过加速设备处理相应的检索子任务,得到对应的检索结果。
在实际应用中,每个子进程会创建一个检索引擎(SearchEngine),将创建的检索引擎作为一个独立的检索单元,该检索单元可以基于加速设备实现检索子任务的处理。
步骤105:根据各个主进程接收到的检索结果,得出待检索特征在待检索底库中的检索结果。
在一些实施例中,参照图2,可以将各个主进程接收到的检索结果进行合并,得到待检索特征在待检索底库中的检索结果205。
在实际应用中,上述步骤101至步骤105可以基于电子设备的处理器实现,上述处理器可以为特定用途集成电路(Application Specific Integrated Circuit,ASIC)、数字信号处理器(Digital Signal Processor,DSP)、数字信号处理装置(Digital Signal Processing Device,DSPD)、可编程逻辑装置(Programmable Logic Device,PLD)、现场可编程门阵列(Field Programmable Gate Array,FPGA)、中央处理器(Central Processing Unit, CPU)、控制器、微控制器、微处理器中的至少一种。
可以看出,本公开实施例可以将各个检索子任务分配至子进程中,在子进程中执行相应的检索子任务,从而,本公开实施例可以通过创建主进程和子进程实现不同检索子任务的并行执行,提升了检索效率。
在一些实施例中,在每个子进程采用加速设备实现检索子任务的处理的情况下,本公开实施例可以采用多进程调用的方式实现对多个加速设备的调用,提升了检索效率。
在一些实施例中,可以结合spark计算框架和加速设备,提供了一种能在大数据量下同时使用多个加速设备实现高效检索的方法,实现了分布式计算技术与硬件加速的特征检索技术的融合;在实际实施时,可以利用spark计算框架实现机器级别的并行加速,同时对任务进行拆分调度,实现芯片级别的任务并行处理,从而实现了大数据量下检索效率与稳定性的提升。示例性地,在千万级底库检索场景下,高维特征的检索的每秒查询率(Query Per Second,QPS)能达到20000特征/s。
示例性地,在相关技术中,对于人脸聚类算法,由于内存限制,一次最多只能实现两千万特征的聚类,这极大限制了聚类算法的能力,可能会导致出现一人多类的情况;通过采用本公开实施例的技术方案,可以在待检索底库中存储大量的待聚类特征,可以大幅提高聚类算法输入数据量的上限,能在更大的范围内进行聚类,便于更准确高效地提取人员档案。
在一些实施例中,参照图3,利用至少一个主进程中的每个主进程创建至少一个子进程的流程,可以包括:
步骤301:在每个主进程中,获取针对已分配的检索子任务对应的底库分片的索引文件,创建主进程用于向至少一个子进程输出数据的输出队列、用于接收至少一个子进程的数据的输入队列、以及主进程与至少一个子进程之间的通信管道。
在一些实施例中,主进程可以基于已分配的检索子任务对应的底库分片,利用检索引擎训练一个针对已分配的检索子任务对应的底库分片的索引文件,然后,可以将该索引文件进行序列化处理,得到序列化处理后的索引文件。
在一种实现方式中,主进程可以基于共享内存建立用于与子进程通信的管道;参照图2,主进程还可以基于共享内存建立用于与子进程通信的队列;这里,用于与子进程通信的队列可以包括用于向至少一个子进程输出数据的输出队列、用于接收至少一个子进程的数据的输入队列。示例性地,参照图2,进程间队列属于接收至少一个子进程的检索结果的输入队列。
步骤302:根据索引文件、输出队列、输入队列和通信管道,创建至少一个子进程。
本公开实施例中,在创建至少一个子进程后,每个子进程首先进行初始化处理,然后会创建一个线程用于从输入队列中读取一个批次的特征分片,同时,该线程可以监控与主进程的通信管道,以便在满足子进程终止条件的情况下终止子进程;子进程在得到子进程的检索结果后,可以将子进程的检索结果通过进程间队列发送至主进程。
在一种实现方式中,子进程终止条件可以是子进程完成相应的检索子任务并已向主进程发送子进程的检索结果;在另一种实现方式中,子进程终止条件可以是子进程通过上述通信管道接收到终止命令,这里,终止命令可以是根据用户的终止检索请求向主进程下发的命令。
可以看出,本公开实施例可以根据用于向多个子进程输出数据的输出队列、用于接收多个子进程的数据的输入队列、以及主进程与所述多个子进程之间的通信管道,创建子进程,从而能够可靠地实现主进程与各个子进程的交互,有利于使子进程基于索引文件实现检索子任务的处理,并有利于主进程根据输入队列接收子进程的检索结果。
在一些实施例中,参照图4,上述创建输出队列、输入队列、以及通信管道的流程,可以包括:
步骤401:将预先确定的加速设备的数量确定为待创建的子进程的数量。
步骤402:根据待创建的子进程的数量,创建输出队列、输入队列和通信管道。
参照图2,预先确定的加速设备的数量为q,则待创建的子进程的数量为q,图2中,加速卡1至加速卡q表示q个加速设备。
可以看出,本公开实施例中,由于可以将先确定的加速设备的数量确定为待创建的子进程的数量,因而,可以充分利用多个加速设备实现检索子任务的并行检索,提高了特征检索效率。
在一些实施例中,利用spark计算框架对待检索底库和待检索特征进行拆分,可以本公开实施例支持大数据量的特征检索,但是特征检索的性能没有实质提升;对于特征检索任务,检索效率的提升依赖于每个检索子任务的执行。在相关技术中,可以使用单个加速设备,并采用倒排索引(Inverted File,IVF)算法和特征量化算法实现对高维特征的加速检索;示例性地,特征量化算法可以基于深码(DeepCode,DC)软件实现。在软件层面,检索引擎可以通过对待检索底库的训练,将底库特征量化为更低维度的量化特征,同时可以使用IVF算法实现检索的加速。在硬件层面,检索引擎可以绑定加速设备,量化特征以及索引可以存放至加速设备的内存中,利用硬件的计算加速,实现快速的检索。
可以看出,由于将高维的底库特征量化为更低维度的量化特征,会导致检索精度的一定损失;因此,在检索引擎中提供了特征重排的调用,用来确定低维的量化特征与原始特征的相似度,但是,为了实现特征重排的调用,需要保存原始的底库特征。而存储原始底库特征的内存是不受调用程序管理的,创建多个检索引擎对象会导致内存中相同特征的复制。
另外,相关技术中,检索引擎虽然提供了绑定多个加速设备的能力,但是没有提供任务分发的能力。
针对上述问题,本公开实施例可以不使用检索引擎提供的特征重排功能,而是可以利用主进程实现特征重排的功能。
在本公开的一些实施例中,每个子进程在将对应的检索子任务中的底库分片量化为量化特征后,可以基于对应的检索子任务中的特征分片和量化特征进行检索,得到该子进程的检索结果。
相应地,参照图5,根据各个主进程接收到的检索结果,得出待检索特征在待检索底库中的检索结果的流程,可以包括:
步骤501:在每个主进程中,确定接收到的检索结果与待检索底库中相应的原始特征的相似度。
步骤502:在每个主进程接收到的检索结果中,选取数量为第一预设值的检索结果,将选取的数量为第一预设值的检索结果作为主进程的输出检索结果。
这里,第一预设值可以根据实际需求设置;示例性地,可以在每个主进程在接收到的检索结果中,选取与待检索底库中相应的原始特征的相似度最高的前k个检索结果,将选取的前k个检索结果作为主进程的输出检索结果,k为大于或等于1的整数,且k为上述第一预设值。
步骤503:根据各个主进程的输出检索结果,得出待检索特征在待检索底库中的检索结果。
可以看出,本公开实施例中可以采用主进程实现特征重排的功能,即,主进程可以确定低维的量化特征与原始的底库分片的相似度,并根据相似度选取检索结果;因此,与相关技术中使用检索引擎实现特征重排的方案相比,主进程的执行可以通过程序管理,在计算原始的底库分片与各个子进程的检索结果的相似度时,无需复制原始的底库分片,降低了资源消耗;并且,可以利用主进程实现检索子任务的任务分片,提高了检索效率。
在一些实施例中,可以选取与待检索底库中相应的原始特征的相似度最高的前k个 检索结果,将选取的前k个检索结果作为主进程的输出检索结果,从而,可以在减少检索数据量的基础上保持较高的特征检索精度。
在一些实施例中,参照图6,根据各个主进程的输出检索结果,得出待检索特征在待检索底库中的检索结果的流程,可以包括:
步骤601:在各个主进程的输出检索结果中,通过将每个检索子任务在各个底库分片的检索结果进行合并,得到每个检索子任务在待检索底库中的检索结果。
步骤602:得出待检索特征在待检索底库中的检索结果,待检索特征在待检索底库中的检索结果包括:各个检索子任务在待检索底库中的检索结果。
可以看出,本公开实施例通过将每个检索子任务在各个底库分片的检索结果进行合并,可以准确全面地得到每个检索子任务在待检索底库中的检索结果。
在一些实施例中,参照图7,通过将每个检索子任务在各个底库分片的检索结果进行合并,得到每个检索子任务在所述待检索底库中的检索结果的流程,可以包括:
步骤701:将每个检索子任务在各个底库分片的检索结果进行合并,得到每个检索子任务对应的合并结果。
步骤702:在每个检索子任务对应的合并结果中,选取数量为第二预设值的检索结果,将选取的数量为第二预设值的检索结果作为每个检索子任务在待检索底库中的检索结果。
这里,第二预设值可以根据实际需求设置;示例性地,可以在每个检索子任务对应的合并结果中,选取与待检索底库中相应的原始特征的相似度最高的前p个检索结果,将选取的前p个检索结果作为每个检索子任务在待检索底库中的检索结果;p为大于或等于1的整数,且p为上述第二预设值。
在一些实施例中,可以利用spark计算框架中的map-reduce算子执行步骤702。
可以看出,本公开实施例可以根据与相应的原始特征的相似度,在每个检索子任务对应的合并结果进行检索结果的筛选,从而,可以在减少检索数据量的基础上保持较高的特征检索精度。
在一些实施例中,在对待检索底库进行分片之前,参照图8,上述特征检索方法还可以包括:
步骤801:确定待检索底库中每个特征的属性。
步骤802:按照待检索底库中每个特征的属性,将待检索底库中不属于第一预设属性范围的特征滤除。
示例性地,待检索底库中每个特征的属性可以是用户指定的任意一种属性;在一种实现方式中,在待检索底库中的特征为采集到的图像的特征的情况下,待检索底库中的特征的属性可以包括以下至少一项:特征对应的图像的采集时间、特征对应的图像的采集地点。相应地,第一预设属性范围可以是时间范围和空间范围中的至少一项。
可以看出,本公开实施例可以按照第一预设属性范围对待检索底库中的特征进行滤除,有利于得到符合要求的特征;例如,可以按照预设时空范围对待检索底库中的特征进行滤除,有利于得到符合时空范围要求的特征。
在一些实施例中,在获取待检索请求中待检索特征的至少两个特征分片之前,参照图9,上述特征检索方法还可以包括:
步骤901:确定待检索特征每个特征的属性。
步骤902:按照待检索特征中每个特征的属性,将待检索特征中不属于第二预设属性范围的特征滤除。
本公开实施例中,第一预设属性范围与第二预设属性范围可以相同,也可以不相同。
示例性地,待检索特征中每个特征的属性可以是用户指定的任意一种属性;在一种实现方式中,在待检索特征中的特征为采集到的图像的特征的情况下,待检索特征中的特征的属性可以包括以下至少一项:特征对应的图像的采集时间、特征对应的图像的采 集地点。相应地,第二预设属性范围可以是时间范围和空间范围中的至少一项。
可以看出,本公开实施例可以按照第二预设属性范围对待检索特征中的特征进行滤除,有利于得到符合要求的特征;例如,可以按照预设时空范围对待检索特征中的特征进行滤除,有利于得到符合时空范围要求的特征。
在前述实施例提出的特征检索方法的基础上,本公开实施例还提出了一种特征检索装置。
图10为本公开实施例的特征检索装置的结构示意图,如图10所示,该装置可以包括:获取部分1000、第一处理部分1001、第二处理部分1002、第三处理部分1003和第四处理部分1004,其中
获取部分1000,配置为获取待检索请求中待检索特征的至少两个特征分片;
第一处理部分1001,配置为基于底库分片后的至少两个底库分片和所述至少两个特征分片,确定在各所述底库分片进行检索的检索子任务集合;
第二处理部分1002,配置为利用至少一个主进程中的每个主进程创建至少一个子进程;
第三处理部分1003,配置为利用每个主进程将已分配给自身的检索子任务再次划分后分配至各个子进程,并接收各个子进程的检索结果;
第四处理部分1004,配置为根据所述各个主进程接收到的检索结果,确定待检索特征在待检索底库中的检索结果。
In some embodiments, the second processing part 1002 being configured to create at least one sub-process by using each main process in at least one main process includes:
in each main process, acquiring the index file of the base library fragment corresponding to the assigned retrieval subtask, and creating an output queue used by the main process to output data to the at least one sub-process, an input queue used to receive data from the at least one sub-process, and a communication pipeline between the main process and the at least one sub-process; and
creating the at least one sub-process according to the index file, the output queue, the input queue, and the communication pipeline.
In some embodiments, the retrieval result of each sub-process in the at least one sub-process is obtained based on an acceleration device processing the corresponding retrieval subtask;
the second processing part 1002 being configured to create the output queue used by the main process to output data to the multiple sub-processes, the input queue used to receive data from the multiple sub-processes, and the communication pipeline between the main process and the multiple sub-processes includes:
determining the predetermined number of acceleration devices as the number of sub-processes to be created; and
creating the output queue, the input queue, and the communication pipeline according to the number of sub-processes to be created.
In some embodiments, the retrieval result of each sub-process in the at least one sub-process is a result obtained, after the base library fragment in the corresponding retrieval subtask is quantized into quantized features, based on the feature fragment in the corresponding retrieval subtask and the quantized features;
the fourth processing part 1004 being configured to determine, according to the retrieval results received by the main processes, the retrieval results of the features to be retrieved in the base library to be retrieved includes:
in each main process, determining the similarity between the received retrieval results and the corresponding original features in the base library to be retrieved;
selecting, from the retrieval results received by each main process, a number of retrieval results equal to a first preset value, and taking the selected retrieval results, whose number is the first preset value, as the output retrieval results of the main process; and
obtaining the retrieval results of the features to be retrieved in the base library to be retrieved according to the output retrieval results of the main processes.
In some embodiments, the fourth processing part 1004 being configured to obtain the retrieval results of the features to be retrieved in the base library to be retrieved according to the output retrieval results of the main processes includes:
among the output retrieval results of the main processes, obtaining the retrieval result of each retrieval subtask in the base library to be retrieved by merging the retrieval results of the retrieval subtask on the respective base library fragments; and
obtaining the retrieval results of the features to be retrieved in the base library to be retrieved, where the retrieval results of the features to be retrieved in the base library to be retrieved include the retrieval results of the retrieval subtasks in the base library to be retrieved.
In some embodiments, the fourth processing part 1004 being configured to obtain the retrieval result of each retrieval subtask in the base library to be retrieved by merging the retrieval results of each retrieval subtask on the respective base library fragments includes:
merging the retrieval results of each retrieval subtask on the respective base library fragments, to obtain the merged result corresponding to each retrieval subtask; and
selecting, from the merged result corresponding to each retrieval subtask, a number of retrieval results equal to a second preset value, and taking the selected retrieval results, whose number is the second preset value, as the retrieval result of each retrieval subtask in the base library to be retrieved.
In some embodiments, the acquisition part 1000 is further configured to: before the base library to be retrieved is fragmented, determine the attribute of each feature in the base library to be retrieved; and filter out, according to the attribute of each feature in the base library to be retrieved, the features in the base library to be retrieved that do not fall within a first preset attribute range;
the acquisition part 1000 is further configured to: before acquiring the at least two feature fragments of the features to be retrieved in the retrieval request, determine the attribute of each feature in the features to be retrieved; and filter out, according to the attribute of each feature in the features to be retrieved, the features among the features to be retrieved that do not fall within a second preset attribute range.
The above acquisition part 1000, first processing part 1001, second processing part 1002, third processing part 1003, and fourth processing part 1004 may be implemented based on a processor of an electronic device.
In addition, the functional parts in this embodiment may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit. The above integrated unit may be implemented in the form of hardware or in the form of a software functional module.
If the integrated unit is implemented in the form of a software functional module and is not sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of this embodiment essentially, or the part thereof contributing to the prior art, or all or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) or a processor to execute all or part of the steps of the method described in this embodiment. The foregoing storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
Specifically, the computer program instructions corresponding to the feature retrieval method in this embodiment may be stored on a storage medium such as an optical disc, a hard disk, or a USB flash drive; when the computer program instructions corresponding to the feature retrieval method in the storage medium are read or executed by an electronic device, any one of the feature retrieval methods of the foregoing embodiments is implemented.
Based on the same technical concept as the foregoing embodiments, referring to FIG. 11, an embodiment of the present disclosure provides an electronic device 110, which may include a memory 111, a processor 112, and a computer program stored on the memory 111 and executable on the processor 112, where:
the memory 111 is configured to store the computer program and data; and
the processor 112 is configured to execute the computer program stored in the memory to implement any one of the feature retrieval methods of the foregoing embodiments.
In practical applications, the above memory 111 may be a volatile memory, such as a RAM; or a non-volatile memory, such as a ROM, a flash memory, a hard disk drive (HDD), or a solid-state drive (SSD); or a combination of the above types of memories, and it provides instructions and data to the processor 112.
The above processor 112 may be at least one of an ASIC, a DSP, a DSPD, a PLD, an FPGA, a CPU, a controller, a microcontroller, or a microprocessor.
An embodiment of the present disclosure further provides a computer program, including computer-readable code, where when the computer-readable code runs in an electronic device, a processor in the electronic device executes operations for implementing any one of the above feature retrieval methods.
In some embodiments, the functions provided by, or the modules included in, the apparatus provided in the embodiments of the present disclosure may be used to perform the methods described in the above method embodiments; for the specific implementation thereof, reference may be made to the description of the above method embodiments.
The above description of the embodiments tends to emphasize the differences between the embodiments; for the same or similar parts thereof, reference may be made to one another.
The methods disclosed in the method embodiments provided in the present disclosure may be arbitrarily combined without conflict to obtain new method embodiments.
The features disclosed in the product embodiments provided in the present disclosure may be arbitrarily combined without conflict to obtain new product embodiments.
The features disclosed in the method or device embodiments provided in the present disclosure may be arbitrarily combined without conflict to obtain new method embodiments or device embodiments.
Through the above description of the implementations, those skilled in the art can clearly understand that the methods of the above embodiments may be implemented by means of software plus a necessary general-purpose hardware platform, and certainly may also be implemented by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present disclosure essentially, or the part thereof contributing to the prior art, may be embodied in the form of a software product. The computer software product is stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and includes several instructions for causing a terminal (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to execute the methods described in the embodiments of the present disclosure.
The embodiments of the present disclosure have been described above with reference to the accompanying drawings, but the present disclosure is not limited to the above specific implementations. The above specific implementations are merely illustrative rather than restrictive. Under the inspiration of the present disclosure, those of ordinary skill in the art can make many other forms without departing from the spirit of the present disclosure and the scope protected by the claims, all of which fall within the protection of the present disclosure.

Claims (17)

  1. A feature retrieval method, applied to an electronic device, the method comprising:
    acquiring at least two feature fragments of features to be retrieved in a retrieval request;
    determining, based on at least two base library fragments obtained by fragmenting a base library to be retrieved and the at least two feature fragments, a set of retrieval subtasks for performing retrieval on each of the base library fragments;
    creating at least one sub-process by using each main process in at least one main process;
    further dividing, by using each main process, the retrieval subtasks assigned to the main process and distributing them to respective sub-processes, and receiving retrieval results of the respective sub-processes; and
    determining, according to the retrieval results received by the main processes, retrieval results of the features to be retrieved in the base library to be retrieved.
  2. The method according to claim 1, wherein the creating at least one sub-process by using each main process in at least one main process comprises:
    in each main process, acquiring an index file of the base library fragment corresponding to the assigned retrieval subtask, and creating an output queue used by the main process to output data to the at least one sub-process, an input queue used to receive data from the at least one sub-process, and a communication pipeline between the main process and the at least one sub-process; and
    creating the at least one sub-process according to the index file, the output queue, the input queue, and the communication pipeline.
  3. The method according to claim 2, wherein the retrieval result of each sub-process in the at least one sub-process is obtained based on an acceleration device processing the corresponding retrieval subtask;
    the creating the output queue used by the main process to output data to the multiple sub-processes, the input queue used to receive data from the multiple sub-processes, and the communication pipeline between the main process and the multiple sub-processes comprises:
    determining a predetermined number of acceleration devices as the number of sub-processes to be created; and
    creating the output queue, the input queue, and the communication pipeline according to the number of sub-processes to be created.
  4. The method according to any one of claims 1 to 3, wherein the retrieval result of each sub-process in the at least one sub-process is a result obtained, after the base library fragment in the corresponding retrieval subtask is quantized into quantized features, based on the feature fragment in the corresponding retrieval subtask and the quantized features;
    the determining, according to the retrieval results received by the main processes, the retrieval results of the features to be retrieved in the base library to be retrieved comprises:
    in each main process, determining a similarity between the received retrieval results and the corresponding original features in the base library to be retrieved;
    selecting, from the retrieval results received by each main process, a number of retrieval results equal to a first preset value, and taking the selected retrieval results, whose number is the first preset value, as output retrieval results of the main process; and
    obtaining the retrieval results of the features to be retrieved in the base library to be retrieved according to the output retrieval results of the main processes.
  5. The method according to claim 4, wherein the obtaining the retrieval results of the features to be retrieved in the base library to be retrieved according to the output retrieval results of the main processes comprises:
    among the output retrieval results of the main processes, obtaining a retrieval result of each retrieval subtask in the base library to be retrieved by merging retrieval results of the retrieval subtask on the respective base library fragments; and
    obtaining the retrieval results of the features to be retrieved in the base library to be retrieved, wherein the retrieval results of the features to be retrieved in the base library to be retrieved comprise: the retrieval results of the retrieval subtasks in the base library to be retrieved.
  6. The method according to claim 5, wherein the obtaining the retrieval result of each retrieval subtask in the base library to be retrieved by merging the retrieval results of the retrieval subtask on the respective base library fragments comprises:
    merging the retrieval results of each retrieval subtask on the respective base library fragments, to obtain a merged result corresponding to each retrieval subtask; and
    selecting, from the merged result corresponding to each retrieval subtask, a number of retrieval results equal to a second preset value, and taking the selected retrieval results, whose number is the second preset value, as the retrieval result of each retrieval subtask in the base library to be retrieved.
  7. The method according to any one of claims 1 to 6, wherein before the base library to be retrieved is fragmented, the method further comprises:
    determining an attribute of each feature in the base library to be retrieved; and
    filtering out, according to the attribute of each feature in the base library to be retrieved, features in the base library to be retrieved that do not fall within a first preset attribute range;
    before the acquiring the at least two feature fragments of the features to be retrieved in the retrieval request, the method further comprises:
    determining an attribute of each feature in the features to be retrieved; and
    filtering out, according to the attribute of each feature in the features to be retrieved, features among the features to be retrieved that do not fall within a second preset attribute range.
  8. A feature retrieval apparatus, comprising an acquisition part, a first processing part, a second processing part, a third processing part, and a fourth processing part, wherein:
    the acquisition part is configured to acquire at least two feature fragments of features to be retrieved in a retrieval request;
    the first processing part is configured to determine, based on at least two base library fragments obtained by fragmenting a base library to be retrieved and the at least two feature fragments, a set of retrieval subtasks for performing retrieval on each of the base library fragments;
    the second processing part is configured to create at least one sub-process by using each main process in at least one main process;
    the third processing part is configured to further divide, by using each main process, the retrieval subtasks assigned to the main process and distribute them to respective sub-processes, and to receive retrieval results of the respective sub-processes; and
    the fourth processing part is configured to determine, according to the retrieval results received by the main processes, retrieval results of the features to be retrieved in the base library to be retrieved.
  9. The apparatus according to claim 8, wherein the second processing part being configured to create at least one sub-process by using each main process in at least one main process comprises:
    in each main process, acquiring an index file of the base library fragment corresponding to the assigned retrieval subtask, and creating an output queue used by the main process to output data to the at least one sub-process, an input queue used to receive data from the at least one sub-process, and a communication pipeline between the main process and the at least one sub-process; and
    creating the at least one sub-process according to the index file, the output queue, the input queue, and the communication pipeline.
  10. The apparatus according to claim 9, wherein the retrieval result of each sub-process in the at least one sub-process is obtained based on an acceleration device processing the corresponding retrieval subtask;
    the second processing part being configured to create the output queue used by the main process to output data to the multiple sub-processes, the input queue used to receive data from the multiple sub-processes, and the communication pipeline between the main process and the multiple sub-processes comprises:
    determining a predetermined number of acceleration devices as the number of sub-processes to be created; and
    creating the output queue, the input queue, and the communication pipeline according to the number of sub-processes to be created.
  11. The apparatus according to any one of claims 8 to 10, wherein the retrieval result of each sub-process in the at least one sub-process is a result obtained, after the base library fragment in the corresponding retrieval subtask is quantized into quantized features, based on the feature fragment in the corresponding retrieval subtask and the quantized features;
    the fourth processing part being configured to determine, according to the retrieval results received by the main processes, the retrieval results of the features to be retrieved in the base library to be retrieved comprises:
    in each main process, determining a similarity between the received retrieval results and the corresponding original features in the base library to be retrieved;
    selecting, from the retrieval results received by each main process, a number of retrieval results equal to a first preset value, and taking the selected retrieval results, whose number is the first preset value, as output retrieval results of the main process; and
    obtaining the retrieval results of the features to be retrieved in the base library to be retrieved according to the output retrieval results of the main processes.
  12. The apparatus according to claim 11, wherein the fourth processing part being configured to obtain the retrieval results of the features to be retrieved in the base library to be retrieved according to the output retrieval results of the main processes comprises:
    among the output retrieval results of the main processes, obtaining a retrieval result of each retrieval subtask in the base library to be retrieved by merging retrieval results of the retrieval subtask on the respective base library fragments; and
    obtaining the retrieval results of the features to be retrieved in the base library to be retrieved, wherein the retrieval results of the features to be retrieved in the base library to be retrieved comprise: the retrieval results of the retrieval subtasks in the base library to be retrieved.
  13. The apparatus according to claim 12, wherein the fourth processing part being configured to obtain the retrieval result of each retrieval subtask in the base library to be retrieved by merging the retrieval results of each retrieval subtask on the respective base library fragments comprises:
    merging the retrieval results of each retrieval subtask on the respective base library fragments, to obtain a merged result corresponding to each retrieval subtask; and
    selecting, from the merged result corresponding to each retrieval subtask, a number of retrieval results equal to a second preset value, and taking the selected retrieval results, whose number is the second preset value, as the retrieval result of each retrieval subtask in the base library to be retrieved.
  14. The apparatus according to any one of claims 8 to 13, wherein the acquisition part is further configured to: before the base library to be retrieved is fragmented, determine an attribute of each feature in the base library to be retrieved; and filter out, according to the attribute of each feature in the base library to be retrieved, features in the base library to be retrieved that do not fall within a first preset attribute range;
    the acquisition part is further configured to: before acquiring the at least two feature fragments of the features to be retrieved in the retrieval request, determine an attribute of each feature in the features to be retrieved; and filter out, according to the attribute of each feature in the features to be retrieved, features among the features to be retrieved that do not fall within a second preset attribute range.
  15. An electronic device, comprising a processor and a memory for storing a computer program executable on the processor, wherein
    the processor is configured to run the computer program to perform the feature retrieval method according to any one of claims 1 to 7.
  16. A computer storage medium having stored thereon a computer program, wherein the computer program, when executed by a processor, implements the feature retrieval method according to any one of claims 1 to 7.
  17. A computer program, comprising computer-readable code, wherein when the computer-readable code runs in an electronic device, a processor in the electronic device executes operations for implementing the feature retrieval method according to any one of claims 1 to 7.
PCT/CN2022/113931 2021-12-29 2022-08-22 Feature retrieval method and apparatus, electronic device, computer storage medium, and program WO2023124135A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202111640138.4 2021-12-29
CN202111640138.4A CN114296965A (zh) Feature retrieval method and apparatus, electronic device, and computer storage medium

Publications (1)

Publication Number Publication Date
WO2023124135A1 true WO2023124135A1 (zh) 2023-07-06

Family

ID=80971000

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/113931 WO2023124135A1 (zh) 2021-12-29 2022-08-22 Feature retrieval method and apparatus, electronic device, computer storage medium, and program

Country Status (2)

Country Link
CN (1) CN114296965A (zh)
WO (1) WO2023124135A1 (zh)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114296965A (zh) * 2021-12-29 2022-04-08 北京市商汤科技开发有限公司 特征检索方法、装置、电子设备及计算机存储介质

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109062697A (zh) * 2018-08-07 2018-12-21 北京超图软件股份有限公司 Method and apparatus for providing spatial analysis services
CN110413386A (zh) * 2019-06-27 2019-11-05 深圳市富途网络科技有限公司 Multi-process processing method and apparatus, terminal device, and computer-readable storage medium
WO2021168815A1 (zh) * 2020-02-28 2021-09-02 华为技术有限公司 Image retrieval method and image retrieval apparatus
CN111522969A (zh) * 2020-03-31 2020-08-11 北京旷视科技有限公司 Image retrieval method and apparatus, computer device, and storage medium
CN111488492A (zh) * 2020-04-08 2020-08-04 北京百度网讯科技有限公司 Method and apparatus for retrieving a graph database
CN114296965A (zh) * 2021-12-29 2022-04-08 北京市商汤科技开发有限公司 Feature retrieval method and apparatus, electronic device, and computer storage medium

Also Published As

Publication number Publication date
CN114296965A (zh) 2022-04-08

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22913419

Country of ref document: EP

Kind code of ref document: A1