CN112307247A - Distributed face retrieval system and method

Info

Publication number: CN112307247A
Application number: CN202011094648.1A
Authority: CN
Inventors: 赵万磊; 邬松渊; 赵奕; 丁泽良; 赵捷
Original assignee: Ningbo Boden Intelligent Technology Co ltd
Current assignee: Ningbo Boden Intelligent Technology Co ltd
Priority date: 2020-10-14
Filing date: 2020-10-14
Publication date: 2021-02-02
Anticipated expiration: 2040-10-14
Also published as: CN112307247B

Abstract

The invention discloses a distributed face retrieval system, which relates to the technical field of rapid identification of face data and comprises a database, a message queue, a distributed file system, a cold start service module, a feature extraction service module, a feature slicing service module, a retrieval service module, a hot update service module, a backup and recovery service module and a log service module. A distributed face retrieval method is also disclosed, comprising the following steps: step S100, training models; step S200, establishing a face feature library; step S300, writing to a distributed file system; step S400, processing image information and retrieval requests; step S500, updating the information of the retrieval service module in real time; and step S600, backing up or restoring at regular intervals. The invention addresses the problem of high concurrency, improves retrieval speed, reduces the complexity of the face retrieval system and improves its scalability.

Description

Distributed face retrieval system and method

Technical Field

The invention relates to the technical field of rapid identification of face data, in particular to a distributed face retrieval system and a distributed face retrieval method.

Background

Rapid retrieval services based on large-scale face data are proving valuable in many application scenarios. In identity authentication, for example, a rapid face retrieval service can reduce the cost of purchasing and maintaining identity card reading terminals and also lightens the burden on the person being authenticated, who can be authenticated without any extra operation. In personnel trajectory tracking, performing face recognition and retrieval on the video from each surveillance camera can greatly improve the efficiency of locating a person. However, the sheer volume of face data involved has become one of the main bottlenecks: it keeps a retrieval service from accurately finding, in near real time, the several face records most similar to a query face in a massive face database.

A face retrieval system can be roughly divided into two stages: face recognition with feature extraction, and feature retrieval. For the first stage, thanks to the steady progress of deep-learning-based face recognition in recent years, face targets can now be recognized and their features extracted in real time. However, the data sets used to train deep learning network models differ greatly from practical application scenarios in the distribution of race, age, gender and the like. As a result, the recognition and feature extraction models may fail to recognize faces effectively or to generate discriminative feature information, which indirectly introduces retrieval noise. How to adapt the deep learning model to the application scenario is therefore also a problem that urgently needs to be solved.

In addition, a huge face database means that a large number of feature comparisons must be computed whenever a query feature is retrieved. In practice, it is infeasible to compare the query feature with every face feature in the database. To improve retrieval speed, a large-scale face database generally divides all face features into sub-databases according to certain rules, such as slicing rules based on the distance between a face feature and a center point. This is slice processing: during retrieval, the one or more feature slices that need to be queried can be located quickly by checking the slicing rules, which effectively reduces the amount of computation in the matching process. How valid slice index rules are established directly affects the accuracy and real-time performance of the service. At present, a common approach is to build the feature index by clustering: the face feature library is clustered by some clustering algorithm to obtain a number of clusters, and the center vector of each cluster is taken as the reference for feature comparison. When a query feature is input, candidate clusters are obtained by comparing the query feature with the center vector of each cluster; this operation is repeated until the finest cluster granularity is reached, and finally all members of the candidate clusters are compared one by one to obtain the final matching result, thereby achieving acceleration.
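
The cluster-based indexing described above can be illustrated with a minimal sketch. It is not taken from the patent: it assumes Euclidean distance, a single clustering level and cluster centers that are already known, and the names `assign_to_clusters`, `search` and `n_probe` are hypothetical.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def assign_to_clusters(features, centers):
    """Group feature ids by the index of their nearest cluster center."""
    clusters = {i: [] for i in range(len(centers))}
    for fid, vec in features.items():
        nearest = min(range(len(centers)), key=lambda i: dist(vec, centers[i]))
        clusters[nearest].append(fid)
    return clusters

def search(query, features, centers, clusters, n_probe=1, top_k=1):
    """Compare the query with the centers first, then only with members
    of the n_probe closest clusters (assumed parameter name)."""
    probe = sorted(range(len(centers)), key=lambda i: dist(query, centers[i]))[:n_probe]
    candidates = [fid for i in probe for fid in clusters[i]]
    return sorted(candidates, key=lambda fid: dist(query, features[fid]))[:top_k]

# Toy data: two well-separated groups of 2-D "face features".
features = {"a": [0.0, 0.1], "b": [0.2, 0.0], "c": [5.0, 5.1], "d": [5.2, 4.9]}
centers = [[0.0, 0.0], [5.0, 5.0]]
clusters = assign_to_clusters(features, centers)
result = search([4.9, 5.0], features, centers, clusters)
```

Only the members of the probed cluster are compared one by one, which is where the speed-up comes from; it is also why stale centers lead to missed matches once the library has changed.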

Although this method effectively reduces retrieval time, the face information stored in the face library is constantly being updated. As features are added to, deleted from and modified in the face feature library, the cluster center vectors used by the index drift away from the true cluster centers, so the retrieval error gradually grows and the reliability of the service falls. To mitigate this, the hierarchical index generation must be executed again after face features are added, deleted or modified, and the original cluster members must be regrouped, which directly increases the complexity of the retrieval service and greatly reduces its scalability.

The face retrieval platforms currently common on the market have the following problems. First, retrieval and matching over massive face data at the scale of millions or tens of millions of records is slow and inaccurate. With the continuing advance of intelligent construction, face retrieval and matching services can be found in application scenarios such as schools and parks. However, the face data involved in such scenarios is relatively closed and the number of faces is relatively small, so the performance requirements on the retrieval service are not very high. When the service is applied in a government affairs environment, the face data set to be matched reaches millions or even tens of millions of records, and multiple service outlets request the retrieval service simultaneously, which produces high concurrency. How to design an effective service architecture and an accurate, fast retrieval and matching algorithm therefore becomes crucial. To address these problems, most current mainstream retrieval and matching platforms partition the features and combine routing over a slice structure, so that service requests are handled by multiple feature subgraphs rather than by directly processing a retrieval graph with tens of millions of nodes. Here a node is the feature information of one piece of face data, which is used in feature comparison to determine similar face data. When the feature dimension is high, however, the inherent shortcomings of the distance calculation used in slice retrieval mean that the slice structure cannot guarantee that the matching target is found accurately.

Second, updating nodes in a face retrieval platform is complex and scales poorly. As the amount of face data keeps growing, higher demands are placed on the scalability and stability of the platform. In a typical face retrieval platform, however, the traditional hierarchical retrieval algorithm places high requirements on the input features of added or modified nodes, because the partition to which the new node belongs must be determined correctly. As node deletions and additions accumulate, the slice reference eventually drifts and retrieval results become inaccurate, so the slicing mechanism has to be updated regularly, which increases platform complexity and makes expansion difficult.

Therefore, those skilled in the art are devoted to developing a distributed face retrieval system and method.

Disclosure of Invention

In view of the above defects in the prior art, the technical problems to be solved by the present invention are that retrieval and matching over massive face data is slow and inaccurate, and that updating nodes in a face retrieval platform is complex and insufficiently scalable.

To retrieve directly from a face library with millions or tens of millions of nodes during face retrieval, and to avoid the inaccurate retrieval caused by hierarchical routing, the inventor builds a retrieval graph over the input face features using a nearest neighbor network structure. The nearest neighbor network structure copes readily with millions or tens of millions of nodes and needs no additional hierarchy, which reduces the complexity of the model; the structure also allows the retrieval service module to be scaled out horizontally with ease.

In one embodiment of the invention, a distributed face retrieval system is provided, which comprises a database, a message queue, a distributed file system, a cold start service module, a feature extraction service module, a feature slicing service module, a retrieval service module, a hot update service module, a backup and recovery service module and a log service module;

the database comprises a face image library, a face feature library and a log library, wherein the face image library stores face image information, the face feature library stores face feature information, and the log library stores logs;

the message queue is a container that holds messages, i.e. information to be transmitted between two computers, while they are in transit; messages are sent into the message queue, which acts as an intermediary relaying each message from its source to its destination; the message queue comprises an image message queue, a feature message queue, a slice message queue and a retrieval message queue, wherein the image message queue is a container for face image information awaiting feature extraction, the feature message queue is a container for feature information after face feature extraction, the slice message queue is a container for feature slice information produced once feature extraction is finished, and the retrieval message queue is a container for retrieval request information;

the distributed file system deploys storage resources in a distributed mode to meet the requirements of continuous expansion of storage capacity and node migration;

the cold start service module sends a start beacon to other modules in the system;

the feature extraction service module extracts features of the face image by using a pre-trained feature extraction model, and sends a feature extraction completion beacon and a slicing rule to the feature slicing service module after the feature extraction is completed;

in response to the slicing rule, the feature slicing service module performs grouping slicing on the extracted features, writes the extracted features into a distributed file system, and sends a slicing completion beacon to the retrieval service module after the slicing is completed;

in response to the slicing completion beacon, the retrieval service module establishes a retrieval graph in preparation for receiving retrieval requests and notifies the hot update service module to start;

the log service module monitors the feature extraction process of the face images and the recovery and backup of the retrieval graph;

in response to an update request, the hot update service module schedules the feature extraction service module and the retrieval service module to update the retrieval graph;

and in response to a persistence request sent at regular intervals by the backup and recovery service module, the retrieval service module writes the retrieval graph into the distributed file system.
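
As a rough illustration of how the four message queues decouple the modules listed above, the following sketch uses in-process queues from the Python standard library. The class `MessageBroker` and its method names are hypothetical; the patent does not name a message queue product, and a real deployment would use a networked broker.

```python
import queue

class MessageBroker:
    """In-process stand-in for the four queues named in the text (hypothetical)."""

    QUEUE_NAMES = ("image", "feature", "slice", "retrieval")

    def __init__(self):
        self._queues = {name: queue.Queue() for name in self.QUEUE_NAMES}

    def publish(self, name, message):
        """Append a message to the named queue."""
        self._queues[name].put(message)

    def consume(self, name):
        """Return the oldest message of the named queue, or None if it is empty."""
        try:
            return self._queues[name].get_nowait()
        except queue.Empty:
            return None

broker = MessageBroker()
broker.publish("image", {"face_id": 1, "image": b"raw-bytes"})
broker.publish("image", {"face_id": 2, "image": b"raw-bytes"})
first = broker.consume("image")      # oldest message comes out first
nothing = broker.consume("slice")    # no slice information published yet
```

The producer never calls the consumer directly, so either side can be replicated or restarted without the other knowing.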

Optionally, in the distributed face retrieval system in the above embodiment, there are a plurality of feature extraction service modules, so as to improve concurrency performance.

Optionally, in the distributed face retrieval system in the above embodiment, the distributed file system uses physical storage resources managed by the file system and is connected to the nodes through a computer network.

Optionally, in the distributed face retrieval system in the above embodiment, the distributed file system uses logical storage resources, in which several different logical disk partitions or volume labels are combined to form a complete hierarchical file system.

Optionally, in the distributed face retrieval system in any of the above embodiments, the retrieval graph is a feature retrieval graph generated by a KNN (k-nearest-neighbor) algorithm and is used for fast target retrieval.

Optionally, in the distributed face retrieval system in any of the above embodiments, there are a plurality of retrieval service modules, so as to improve concurrency performance.

Optionally, in the distributed face retrieval system in any of the above embodiments, the slicing rule is that a single file holds at most 100 records.
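
The slicing rule of at most 100 records per file amounts to fixed-size chunking. A minimal sketch follows; the function name `slice_records` and the parameter `max_per_file` are hypothetical, and the records are stand-in integers rather than real feature rows.

```python
def slice_records(records, max_per_file=100):
    """Split records into consecutive slices of at most max_per_file items."""
    if max_per_file < 1:
        raise ValueError("max_per_file must be positive")
    return [records[i:i + max_per_file] for i in range(0, len(records), max_per_file)]

# 250 records yield two full slices and one partial slice.
slices = slice_records(list(range(250)))
sizes = [len(s) for s in slices]
```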

Based on any one of the above embodiments, in another embodiment of the present invention, a distributed face retrieval method is provided, including the following steps:

step S100, training models: a face target recognition model and a face feature extraction model are trained;

step S200, establishing a face feature library: corresponding face features are extracted from the face images in the face image library by using a deep learning neural network model, and the extracted face features are written into the face feature library;

step S300, writing to the distributed file system: the face features in the face feature library are written into files in the distributed file system, ensuring that the face features can be read quickly;

step S400, processing image information and retrieval requests: in response to the image information, the retrieval service module establishes a retrieval graph, and in response to a retrieval request, the retrieval service module performs retrieval;

step S500, updating the information of the retrieval service module in real time: the hot update service module schedules the retrieval service module to update the retrieval graph in real time;

step S600, backing up or restoring at regular intervals: in response to a backup or recovery request issued at regular intervals by the backup and recovery service module, the retrieval service module executes the backup or recovery operation.

Optionally, in the distributed face retrieval method in the above embodiment, step S100 specifically includes:

S110, reading an open-source data set: the face image information in the open-source data set is read;

S120, training the face target recognition model: with the face target frame and target classification loss functions as the training objective, the training of the whole deep learning framework ends when the accuracy no longer improves noticeably, and the corresponding neural network parameters are saved;

S130, preprocessing the face recognition results: the size of each face image in the input face recognition results is preprocessed, and all face images are adjusted to the same width and height so that face features of the same dimensionality are generated;

S140, training the face feature extraction model: with the face classification loss function as the training objective, the training of the whole deep learning framework ends when the accuracy no longer improves noticeably, and the corresponding neural network parameters are saved;

S150, accelerating inference, i.e. model conversion: after training is completed, the face target recognition model and the face feature extraction model are converted into an onnx or trt file structure readable by the TensorRT framework, which improves the inference speed of the models.

Optionally, in the distributed face retrieval method in any of the above embodiments, the step S200 specifically includes:

S210, starting the system: the cold start service module reads the face image information in the face image library and sends a start beacon to each module in the distributed face retrieval system;

S220, pushing face image information: the face image information in the face image library is pushed to the image message queue item by item;

S230, extracting features: the feature extraction service module obtains the face image information through subscription, performs feature extraction to obtain the corresponding face feature information, and records the result of the feature extraction in a log; the feature extraction service module can be scaled out horizontally for as long as resources permit;

S240, pushing the face feature information: the feature extraction service module pushes the face feature information to the feature message queue;

S250, acquiring the face feature information: the cold start service module obtains the face feature information through subscription;

S260, establishing the face feature library: the cold start service module stores the face feature information in the face feature library.
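
Steps S220 to S260 form a producer and consumer pipeline, which can be sketched with threads and in-process queues. This is an illustration under stated assumptions only: `extract_features` is a hypothetical stand-in for the trained model, the `None` sentinel used to stop workers is a design choice of this sketch, and the face feature library is a plain dictionary.

```python
import queue
import threading

def extract_features(image):
    """Hypothetical stand-in for the trained model: returns a fixed-length vector."""
    return [float(len(image)), float(sum(image) % 256)]

def extraction_worker(image_queue, feature_queue):
    """Consume images until a None sentinel arrives (S230), pushing features (S240)."""
    while True:
        item = image_queue.get()
        if item is None:
            break
        face_id, image = item
        feature_queue.put((face_id, extract_features(image)))

def build_feature_library(image_library, n_workers=2):
    image_queue, feature_queue = queue.Queue(), queue.Queue()
    workers = [threading.Thread(target=extraction_worker,
                                args=(image_queue, feature_queue))
               for _ in range(n_workers)]
    for w in workers:
        w.start()
    for face_id, image in image_library.items():   # S220: push images one by one
        image_queue.put((face_id, image))
    for _ in workers:                               # one stop sentinel per worker
        image_queue.put(None)
    for w in workers:
        w.join()
    feature_library = {}                            # S250-S260: collect and store
    while not feature_queue.empty():
        face_id, vector = feature_queue.get()
        feature_library[face_id] = vector
    return feature_library

image_library = {1: b"\x01\x02\x03", 2: b"\x10\x20", 3: b"\xff"}
feature_library = build_feature_library(image_library)
```

Raising `n_workers` corresponds to adding feature extraction service instances; the producer side does not change.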

Optionally, in the distributed face retrieval method in any of the above embodiments, the step S300 specifically includes:

S310, monitoring image feature extraction: the log service module monitors the image feature extraction in real time and, once it confirms that all face feature information has been extracted, the face feature information is pushed to the feature slicing service module;

S320, querying the face feature data in batches: the feature slicing service module queries the face feature data in batches from the face feature library according to the slicing rule;

S330, storing the face features in the distributed file system: the feature slicing service module stores the feature files of the face feature data read in batches in the distributed file system according to the slicing rule;

S340, pushing the file storage path and slice information: the feature slicing service module pushes the file storage path and the slice information to the slice message queue;

S350, generating the retrieval graph: the retrieval service module, which retrieves according to acquired face features and returns retrieval results, obtains the slice information and the feature file address through subscription, fetches the feature file via the address and generates a nearest neighbor retrieval graph.
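
To make the nearest neighbor retrieval graph of S350 concrete, the sketch below builds a k-nearest-neighbor graph by brute force and queries it with a greedy walk. The text only states that the graph is generated by a KNN algorithm; the brute-force construction, the greedy search strategy and the names `build_knn_graph` and `greedy_search` are assumptions of this sketch.

```python
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def build_knn_graph(features, k):
    """Brute-force k-NN graph: node id -> list of its k closest node ids."""
    graph = {}
    for fid, vec in features.items():
        others = [o for o in features if o != fid]
        graph[fid] = sorted(others, key=lambda o: dist(vec, features[o]))[:k]
    return graph

def greedy_search(query, features, graph, entry):
    """Walk from entry, always moving to the neighbor closest to the query,
    and stop when no neighbor is closer than the current node."""
    current = entry
    current_dist = dist(query, features[current])
    while True:
        best = min(graph[current], key=lambda n: dist(query, features[n]), default=None)
        if best is None or dist(query, features[best]) >= current_dist:
            return current
        current, current_dist = best, dist(query, features[best])

# Six 1-D features on a line; the query 4.2 lies closest to node 4.
features = {i: [float(i)] for i in range(6)}
graph = build_knn_graph(features, k=2)
match = greedy_search([4.2], features, graph, entry=0)
```

A greedy walk can stop at a local minimum on real high-dimensional data, so practical graph indexes usually keep a candidate list rather than a single current node; the sketch omits that for brevity.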

Further, in the distributed face retrieval method in any of the above embodiments, the retrieval service module interacts with the user in step S400, and the user sends add, delete, modify and view requests through a web page.

Optionally, in the distributed face retrieval method in any of the above embodiments, the step S400 specifically includes:

S410, receiving image information and a retrieval request: the retrieval service module receives the image information and the retrieval request and stores an operation log in the log library;

S420, pushing a query request: the retrieval service module pushes the query request to different subscription topics according to the slices; a subscription topic is a mode of communication between message queues, in which one-to-many content broadcasting is realized by subscribing to the same topic;

S430, extracting features: the feature extraction service module obtains the face image information to be processed through subscription and performs feature extraction;

S440, pushing the face feature information to the feature message queue: the feature extraction service module pushes the face feature information to the feature message queue;

S450, pushing the face feature retrieval information to the retrieval message queue: the retrieval service module pushes the face feature retrieval information to the retrieval message queue to initiate a retrieval request;

S460, obtaining the retrieval request: the retrieval service module obtains the retrieval request through subscription;

S470, returning a retrieval result: the retrieval service module returns the retrieval result;

S480, acquiring the retrieval result of each slice: the retrieval service module acquires the retrieval result of each slice;

S490, returning the retrieval result: the retrieval service module returns the retrieval result.
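
Steps S460 to S490 describe a scatter and gather pattern: each slice is searched separately and the per-slice results are merged. A minimal sketch, assuming Euclidean distance, exhaustive search inside each slice and sequential rather than parallel execution; the function names are hypothetical.

```python
import heapq
import math

def dist(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def search_slice(query, slice_features, top_k):
    """Return the top_k (distance, face_id) pairs of one slice, closest first."""
    scored = [(dist(query, vec), fid) for fid, vec in slice_features.items()]
    return heapq.nsmallest(top_k, scored)

def search_all_slices(query, slices, top_k):
    """Query every slice (S460-S480), then merge the partial results (S490)."""
    partial = [search_slice(query, s, top_k) for s in slices]
    return heapq.nsmallest(top_k, (hit for hits in partial for hit in hits))

slices = [
    {"a": [0.0, 0.0], "b": [1.0, 1.0]},
    {"c": [0.9, 1.0], "d": [9.0, 9.0]},
]
hits = search_all_slices([1.0, 1.0], slices, top_k=2)
ids = [fid for _, fid in hits]
```

Because every slice returns its own top_k, the merged list is exact for the searched slices: the global top_k cannot contain an item that was not in some slice's top_k.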

Optionally, in the distributed face retrieval method in any of the above embodiments, step S500 specifically includes:

S510, the hot update service module receives an add, delete or modify request;

S520, the operation request to be applied is stored in the log library;

S530, if the request is a deletion request, S540 is executed; otherwise, the hot update service module subscribes to the image information, performs feature extraction, and pushes the face features to the feature message queue;

S540, the hot update service module obtains the feature information through subscription and determines the subgraph that needs to be updated;

S550, the hot update service module broadcasts the subgraph to be updated;

S560, the retrieval service module receives the broadcast and performs the update;

S570, the retrieval service module pushes the update result;

S580, if all retrieval service modules have completed the update, the update is marked as complete; otherwise, the method returns to step S570.
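
The broadcast and acknowledge loop of S520 to S580 can be sketched as follows. The sketch leaves out feature extraction and subgraph selection (S530, S540) and applies updates synchronously; the class `RetrievalNode`, the tuple layout of an update and the function `hot_update` are hypothetical.

```python
class RetrievalNode:
    """Hypothetical retrieval service instance holding its own copy of the features."""

    def __init__(self, name):
        self.name = name
        self.features = {}

    def apply_update(self, update):
        """S560: apply an add/modify/delete operation and report success (S570)."""
        op, face_id, vector = update
        if op == "delete":
            self.features.pop(face_id, None)
        else:  # "add" and "modify" both store the new vector
            self.features[face_id] = vector
        return True

def hot_update(nodes, update, log):
    """Log the request, broadcast it, and confirm that every node applied it."""
    log.append(update)                                                # S520
    acks = {node.name: node.apply_update(update) for node in nodes}   # S550-S570
    return all(acks.values())                                         # S580

nodes = [RetrievalNode("r1"), RetrievalNode("r2")]
log = []
done_add = hot_update(nodes, ("add", 7, [0.1, 0.2]), log)
done_del = hot_update(nodes, ("delete", 7, None), log)
```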

Optionally, in the distributed face retrieval method in any of the above embodiments, the step S600 specifically includes:

S610, the backup and recovery service module distributes backup or recovery requests to the retrieval service module at regular intervals according to the slicing rule;

S620, if the request is a backup request, the retrieval service module writes the corresponding feature slices and subgraph information into the distributed file system; otherwise, the retrieval service module pulls the latest feature slices and subgraph information from the file system and checks whether its own data is the latest; if not, it executes an update operation; if so, no action is taken.
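
The two branches of S620 can be sketched with JSON files on a local directory standing in for the distributed file system. The text does not say how "latest" is determined; the integer `version` field used here, the file naming and the function names are assumptions of this sketch.

```python
import json
import os
import tempfile

def backup(state, directory, slice_id):
    """Backup branch of S620: persist one slice's features and subgraph."""
    path = os.path.join(directory, f"slice_{slice_id}.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump(state, f)
    return path

def restore(state, directory, slice_id):
    """Restore branch of S620: adopt the stored copy only if it is newer."""
    path = os.path.join(directory, f"slice_{slice_id}.json")
    with open(path, encoding="utf-8") as f:
        stored = json.load(f)
    if stored["version"] > state["version"]:   # assumed freshness check
        return stored                          # local data is stale: update it
    return state                               # already up to date: no action

with tempfile.TemporaryDirectory() as directory:
    backup({"version": 2, "features": {"a": [0.1]}, "graph": {"a": []}}, directory, 0)
    stale = {"version": 1, "features": {}, "graph": {}}
    restored = restore(stale, directory, 0)
    current = {"version": 2, "features": {"a": [0.1]}, "graph": {"a": []}}
    unchanged = restore(current, directory, 0)
```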

By constructing a nearest neighbor retrieval network graph, the invention addresses the slow retrieval and matching speed and poor accuracy of massive face data. The retrieval module can be scaled out for as long as resources permit, which effectively addresses high concurrency, improves the response speed of the whole retrieval service and raises the utilization of system resources. The invention achieves fast, high-precision retrieval, reduces the complexity of the face retrieval system and improves its scalability.

The conception, the specific structure and the technical effects of the present invention will be further described with reference to the accompanying drawings to fully understand the objects, the features and the effects of the present invention.

Drawings

FIG. 1 is a block diagram illustrating a distributed face retrieval system in accordance with an illustrative embodiment;

FIG. 2 is a flow diagram illustrating a distributed face retrieval method according to an exemplary embodiment;

FIG. 3 is a flow diagram illustrating training an extraction model in accordance with an illustrative embodiment;

FIG. 4 is a flow diagram illustrating the creation of a face feature library in accordance with an illustrative embodiment;

FIG. 5 is a flowchart illustrating writing to the distributed file system in accordance with an illustrative embodiment;

FIG. 6 is a flowchart illustrating processing image information and retrieval requests according to an exemplary embodiment;

FIG. 7 is a flow diagram illustrating updating module information in real-time in accordance with an illustrative embodiment;

FIG. 8 is a flowchart illustrating a timed backup or restore according to an example embodiment.

Detailed Description

The technical contents of the preferred embodiments of the present invention will be more clearly and easily understood by referring to the drawings attached to the specification. The present invention may be embodied in many different forms of embodiments and the scope of the invention is not limited to the embodiments set forth herein.

In the drawings, structurally identical elements are represented by like reference numerals, and structurally or functionally similar elements are represented by like reference numerals throughout the several views. The size and thickness of each component shown in the drawings are arbitrarily illustrated, and the present invention is not limited to the size and thickness of each component. The thickness of the components is exaggerated somewhat schematically and appropriately in order to make the illustration clearer.

To retrieve directly from a face library with millions or tens of millions of nodes during face retrieval, and to avoid the inaccurate retrieval caused by hierarchical routing, the inventor builds a retrieval graph over the input face features using a nearest neighbor network structure. In addition, the nearest neighbor network structure copes readily with millions or tens of millions of nodes and needs no additional hierarchy, which reduces the complexity of the model; the structure also allows the retrieval service module to be scaled out horizontally with ease.

The inventor designs a distributed face retrieval system, as shown in fig. 1, which comprises a database, a message queue, a distributed file system, a cold start service module, a feature extraction service module, a feature slicing service module, a retrieval service module, a hot update service module, a backup and recovery service module and a log service module; wherein

the distributed file system deploys storage resources in a distributed manner; this embodiment uses physical storage resources managed by the file system, connected to the nodes through a computer network, so as to meet the requirements of continuous expansion of storage capacity and node migration;

the feature extraction service module extracts features of the face image by using a pre-trained feature extraction model, and sends a feature extraction completion beacon and a slicing rule to the feature slicing service module after the feature extraction is completed; to improve concurrency performance, the inventor provides a plurality of feature extraction service modules; the inventor sets the slicing rule to a maximum of 100 records per file;

in response to the slicing completion beacon, the retrieval service module establishes a retrieval graph in preparation for receiving retrieval requests and notifies the hot update service module to start; the retrieval graph is a feature retrieval graph generated by a KNN algorithm and is used for fast target retrieval. There are a plurality of retrieval service modules, so as to improve concurrency performance.

Based on the above embodiment, the inventor has designed a distributed face retrieval method, as shown in fig. 2, including the following steps:

Step S100, training models: a face target recognition model and a face feature extraction model are trained; as shown in FIG. 3, the method specifically includes:

Step S200, establishing a face feature library: corresponding face features are extracted from the face images in the face image library by using a deep learning neural network model, and the extracted face features are written into the face feature library; as shown in FIG. 4, the method specifically includes:

Step S300, writing to the distributed file system: the face features in the face feature library are written into files in the distributed file system, ensuring that the face features can be read quickly; as shown in FIG. 5, the method specifically includes:

Step S400, processing image information and retrieval requests: in response to the image information, the retrieval service module establishes a retrieval graph, and in response to a retrieval request, the retrieval service module performs retrieval; the retrieval service module interacts with the user, and the user sends add, delete, modify and view requests through a web page. As shown in FIG. 6, the method specifically includes:

Step S500, updating the information of the retrieval service module in real time: the hot update service module schedules the retrieval service module to update the retrieval graph in real time; as shown in FIG. 7, the method specifically includes:

S520, the operation request to be applied is stored in the log library;

S550, the hot update service module broadcasts the subgraph to be updated;

S560, the retrieval service module receives the broadcast and performs the update;

S570, the retrieval service module pushes the update result;

Step S600, backing up or restoring at regular intervals: in response to a backup or recovery request issued at regular intervals by the backup and recovery service module, the retrieval service module executes the backup or recovery operation; as shown in FIG. 8, the method specifically includes:

S620, if the request is a backup request, the retrieval service module writes the corresponding feature slices and subgraph information into the file system; otherwise, the retrieval service module pulls the latest feature slices and subgraph information from the file system and checks whether its own data is the latest; if not, it executes an update operation; if so, no action is taken.

The foregoing detailed description of the preferred embodiments of the invention has been presented. It should be understood that numerous modifications and variations could be devised by those skilled in the art in light of the present teachings without departing from the inventive concepts. Therefore, the technical solutions available to those skilled in the art through logic analysis, reasoning and limited experiments based on the prior art according to the concept of the present invention should be within the scope of protection defined by the claims.

Claims

1. A distributed face retrieval system is characterized by comprising a database, a message queue, a distributed file system, a cold start service module, a feature extraction service module, a feature slicing service module, a retrieval service module, a hot update service module, a backup and recovery service module and a log service module;

the message queue comprises an image message queue, a feature message queue, a slice message queue and a retrieval message queue, wherein the image message queue is a container for face image information awaiting feature extraction, the feature message queue is a container for feature information after face feature extraction, the slice message queue is a container for feature slice information produced once feature extraction is finished, and the retrieval message queue is a container for retrieval request information;

the distributed file system deploys storage resources in a distributed mode to meet the requirements of continuous expansion of storage capacity and node migration;

the cold start service module sends a start beacon to other modules in the distributed face retrieval system;

according to the slicing rule, the feature slicing service module groups and slices the extracted face features, writes them into the distributed file system, and sends a slice completion beacon to the retrieval service module after the slicing is completed;

in response to the slice completion beacon, the retrieval service module builds a retrieval graph in preparation for receiving retrieval requests and notifies the hot update service module to start;

the log service module monitors the face feature extraction process and the recovery and backup of the retrieval graph;

in response to an update request, the hot update service module schedules the feature extraction service module and the retrieval service module to update the retrieval graph;

and in response to a persistence request sent periodically by the backup and recovery service module, the retrieval service module writes the retrieval graph into the distributed file system.
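The start-up order recited in claim 1 (start beacon, slicing, slice completion beacon, retrieval graph construction, hot update start) can be illustrated with a small in-process sketch. Everything named here is an assumption made for illustration: `BeaconBus` stands in for the message queue, the beacon names are invented, round-robin grouping stands in for the unspecified slicing rule, and building the retrieval graph is reduced to recording slice sizes.

```python
from collections import defaultdict


class BeaconBus:
    """Tiny in-process publish/subscribe bus standing in for the message queue."""

    def __init__(self):
        self._handlers = defaultdict(list)

    def subscribe(self, beacon, handler):
        self._handlers[beacon].append(handler)

    def publish(self, beacon, payload=None):
        for handler in list(self._handlers[beacon]):
            handler(payload)


def run_cold_start(features, slice_count):
    """Replay the start-up order of claim 1 and return the recorded events."""
    bus, events, slices = BeaconBus(), [], {}

    def slicing_service(_):
        # Group the extracted features into slices (round-robin is an assumed rule).
        for i, feat in enumerate(features):
            slices.setdefault(i % slice_count, []).append(feat)
        events.append("sliced")
        bus.publish("slice_complete", slices)

    def retrieval_service(ready_slices):
        # Graph construction is reduced to recording the slice sizes here.
        sizes = sorted(len(v) for v in ready_slices.values())
        events.append(f"graph_built:{sizes}")
        bus.publish("hot_update_start")

    def hot_update_service(_):
        events.append("hot_update_started")

    bus.subscribe("start", slicing_service)
    bus.subscribe("slice_complete", retrieval_service)
    bus.subscribe("hot_update_start", hot_update_service)
    bus.publish("start")  # the cold start service module sends the start beacon
    return events
```

Calling `run_cold_start(["a", "b", "c"], 2)` returns the events in the order in which the modules were triggered, which makes the beacon dependencies between the modules easy to inspect.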

2. The distributed face retrieval system of claim 1, wherein there are a plurality of feature extraction service modules, so as to improve concurrency performance.

3. The distributed face retrieval system of claim 1, wherein there are a plurality of retrieval service modules, so as to improve concurrency performance.

4. A distributed face retrieval method based on the distributed face retrieval system according to any one of claims 1 to 3, comprising the steps of:

step S400, processing image information and retrieval requests: in response to the image information, the retrieval service module builds a retrieval graph, and in response to a retrieval request, the retrieval service module performs retrieval;

step S500, updating the information of the retrieval service module in real time: the hot update service module schedules the retrieval service module to update the retrieval graph in real time;

step S600, scheduled backup or recovery: in response to a backup or recovery request issued periodically by the backup and recovery service module, the retrieval service module performs the backup or recovery operation.

5. The distributed face retrieval method of claim 4, wherein the step S100 comprises:

S150, inference acceleration, namely model conversion: after training is completed, the face target recognition model and the face feature extraction model are converted into an onnx or trt file format that can be read by the TensorRT framework, so as to improve the inference speed of the models.
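One possible way to carry out the onnx-to-trt part of S150 is NVIDIA's `trtexec` command-line tool, which ships with TensorRT. The claim does not name a conversion tool, so the use of `trtexec`, the helper name `build_trtexec_command`, the file names, and the optional FP16 setting below are all assumptions made for illustration. The sketch only assembles the command; exporting the trained models to onnx is not shown.

```python
import shlex


def build_trtexec_command(onnx_path, engine_path, fp16=False):
    """Build the argument list for converting an ONNX model into a TensorRT engine."""
    cmd = ["trtexec", f"--onnx={onnx_path}", f"--saveEngine={engine_path}"]
    if fp16:
        cmd.append("--fp16")  # optional reduced-precision build
    return cmd


# Hypothetical file names for the two trained models.
for name in ("face_detect", "face_feature"):
    print(shlex.join(build_trtexec_command(f"{name}.onnx", f"{name}.trt", fp16=True)))
```

On a machine with TensorRT installed, each argument list could be passed to `subprocess.run(cmd, check=True)` to perform the conversion.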

6. The distributed face retrieval method of claim 4, wherein the step S200 comprises:

S230, extracting features: the feature extraction service module obtains the face image information through subscription, performs feature extraction to obtain the corresponding face feature information, and records the result of the feature extraction in a log;

S240, pushing face feature information: the feature extraction service module pushes the face feature information to the feature message queue;

7. The distributed face retrieval method of claim 4, wherein the step S300 comprises:

S320, querying face feature data in batches: the feature slicing service module queries face feature data in batches from the face feature library according to the slicing rule;

8. The distributed face retrieval method of claim 4, wherein the step S400 comprises:

S420, pushing a query request: when pushing, the retrieval service module pushes the query request to different subscription topics according to the slices;
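The slice-based routing of S420 can be sketched as a function that produces one message per feature slice, each addressed to its own subscription topic. The topic naming scheme `retrieval.slice.<id>`, the JSON message layout, and the function name `route_query` are hypothetical; the claim only states that the query request is pushed to different subscription topics according to the slices.

```python
import json


def route_query(query_id, feature, slice_ids, topic_prefix="retrieval.slice"):
    """Return one (topic, message) pair per feature slice for a single query."""
    body = json.dumps({"query_id": query_id, "feature": feature})
    # Each slice has its own topic, so the retrieval service holding that
    # slice receives the query through its own subscription.
    return [(f"{topic_prefix}.{sid}", body) for sid in slice_ids]
```

With this layout, every retrieval service searches only its own slice, and the same `query_id` allows the per-slice results to be matched to the original request afterwards.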

9. The distributed face retrieval method of claim 4, wherein the step S500 comprises:

S520, storing the pending update operation request in a log library;

S530, if the request is a deletion request, executing S540; otherwise, the hot update service module obtains the image information through subscription, extracts features, and pushes the face features to the feature message queue;

S550, the hot update service module broadcasts the sub-graph to be updated;

S560, the retrieval service module receives the broadcast and performs the update;

S570, the retrieval service module pushes the update result;
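The hot update steps listed above can be sketched as a single function. The request fields (`op`, `id`, `image`, `slice`), the use of plain lists for the log library and the feature message queue, and the modelling of retrieval services as callables are all assumptions made for illustration. Step S540 is not reproduced in this section, so the sketch assigns no behaviour to it.

```python
def hot_update(request, log_library, feature_queue, retrieval_services, extract):
    """Walk one update request through S520 to S570 (S540 is not modelled)."""
    log_library.append(request)  # S520: record the pending operation

    if request["op"] != "delete":  # S530: non-deletion requests need new features
        feature = extract(request["image"])
        feature_queue.append({"id": request["id"], "feature": feature})

    # S550: broadcast which sub-graph (slice) has to be updated.
    update = {"op": request["op"], "id": request["id"], "slice": request["slice"]}

    # S560: every retrieval service receives the broadcast and applies it.
    results = [service(update) for service in retrieval_services]

    # S570: the update results are pushed back (returned here).
    return results
```

Passing the extractor and the retrieval services in as arguments keeps the sketch independent of any particular model or messaging library.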

10. The distributed face retrieval method of claim 4, wherein the step S600 comprises: