WO2019042230A1 - Face image retrieval method and system, photographing device, and computer storage medium - Google Patents

Face image retrieval method and system, photographing device, and computer storage medium

Info

Publication number
WO2019042230A1
Authority
WO
WIPO (PCT)
Prior art keywords
image
information
retrieved
face
neural network
Prior art date
Application number
PCT/CN2018/102267
Other languages
English (en)
French (fr)
Inventor
赖海斌
毛宁元
李清正
刘文志
Original Assignee
深圳市商汤科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳市商汤科技有限公司 filed Critical 深圳市商汤科技有限公司
Priority to JP2019571526A priority Critical patent/JP7038744B2/ja
Priority to SG11202000075QA priority patent/SG11202000075QA/en
Publication of WO2019042230A1 publication Critical patent/WO2019042230A1/zh
Priority to US16/732,225 priority patent/US11182594B2/en

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06F ELECTRIC DIGITAL DATA PROCESSING
          • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
            • G06F16/50 Information retrieval of still image data
              • G06F16/53 Querying
                • G06F16/532 Query formulation, e.g. graphical querying
                • G06F16/538 Presentation of query results
              • G06F16/58 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
                • G06F16/583 Retrieval using metadata automatically derived from the content
            • G06F16/70 Information retrieval of video data
              • G06F16/78 Retrieval characterised by using metadata, e.g. metadata not derived from the content or metadata generated manually
                • G06F16/783 Retrieval using metadata automatically derived from the content
                  • G06F16/7837 Retrieval using objects detected or recognised in the video content
                    • G06F16/784 Retrieval where the detected or recognised objects are people
          • G06F18/00 Pattern recognition
            • G06F18/20 Analysing
              • G06F18/22 Matching criteria, e.g. proximity measures
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N3/00 Computing arrangements based on biological models
            • G06N3/02 Neural networks
              • G06N3/04 Architecture, e.g. interconnection topology
                • G06N3/045 Combinations of networks
                • G06N3/0463 Neocognitrons
              • G06N3/08 Learning methods
        • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
          • G06V10/00 Arrangements for image or video recognition or understanding
            • G06V10/70 Arrangements using pattern recognition or machine learning
              • G06V10/74 Image or video pattern matching; Proximity measures in feature spaces
                • G06V10/75 Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
                  • G06V10/751 Comparing pixel values or logical combinations thereof, or feature values having positional relevance, e.g. template matching
              • G06V10/82 Arrangements using neural networks
          • G06V20/00 Scenes; Scene-specific elements
            • G06V20/40 Scenes; Scene-specific elements in video content
              • G06V20/46 Extracting features or characteristics from the video content, e.g. video fingerprints, representative shots or key frames
          • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
            • G06V40/10 Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
              • G06V40/16 Human faces, e.g. facial parts, sketches or expressions
                • G06V40/161 Detection; Localisation; Normalisation
                  • G06V40/166 Detection using acquisition arrangements
                • G06V40/168 Feature extraction; Face representation
                  • G06V40/169 Holistic features and representations, i.e. based on the facial image taken as a whole
                • G06V40/172 Classification, e.g. identification

Definitions

  • the present disclosure relates to computer vision technology, and in particular to a face image retrieval method and system, a camera device, and a computer storage medium.
  • The embodiments provide a face image retrieval method and system, a photographing device, and a computer storage medium, which can reduce the computational load of convolutional-neural-network-based face recognition and thereby improve the efficiency of face image retrieval.
  • The convolutional neural network is configured by the processor with corresponding convolution calculation configuration information; the convolutional neural network includes at least one convolution layer, and the convolution calculation configuration information includes a data bit width value corresponding to each convolution layer in the convolutional neural network; the image to be retrieved includes at least one face region;
  • a photographing apparatus including:
  • a convolution calculation portion configured to obtain, by a convolutional neural network, face information to be retrieved corresponding to the image to be retrieved;
  • the convolutional neural network is configured with convolution calculation configuration information, and the convolutional neural network includes at least one convolution layer
  • the convolution calculation configuration information includes a data bit width value corresponding to each convolution layer in the convolutional neural network; and the image to be retrieved includes at least one face region;
  • a processor configured to configure the convolutional neural network with corresponding convolution calculation configuration information;
  • a search part configured to search for matching preset face image information from the database based on the face information to be retrieved, the database storing at least one piece of preset face image information, and to output the preset face image information that matches the face information to be retrieved.
  • a face image retrieval system provided with a photographing apparatus as described above.
  • a computer storage medium for storing computer readable instructions that, when executed, perform the operations of the face image retrieval method as described above.
  • The face image retrieval method and system, photographing apparatus, and computer storage medium provided by the above embodiments of the present disclosure obtain the face information to be retrieved corresponding to the image to be retrieved through a convolutional neural network, where the convolutional neural network is configured by the processor with corresponding convolution calculation configuration information. Because the convolution calculation configuration information is set, the bit width of the data input into each convolutional layer corresponds to that layer, which reduces the computational load of convolutional-neural-network-based face recognition and improves the processing efficiency of the convolutional layers, so that the face information to be retrieved can be obtained quickly and accurately from the input image. This solves the problem that low-precision fixed-point calculation affects the accuracy of the calculation result, improving the operation precision of the convolutional neural network. Matching preset face image information is then searched from the database based on the face information to be retrieved and output. Retrieving matching preset face image information from a database set on the device itself realizes real-time face retrieval and improves face image retrieval efficiency.
  • FIG. 1 is a flow chart of an embodiment of a face image retrieval method according to the present disclosure
  • FIG. 2 is a flow chart of another embodiment of a face image retrieval method according to the present disclosure.
  • FIG. 3 is a schematic structural diagram of an embodiment of a face image retrieval device according to the present disclosure.
  • FIG. 4 is a schematic structural diagram of another embodiment of a face image retrieval device according to the present disclosure.
  • FIG. 5 is a schematic structural diagram of an example of the above embodiments of the photographing apparatus of the present disclosure.
  • FIG. 6 is a schematic structural diagram of an example of the above embodiments of the photographing apparatus of the present disclosure.
  • FIG. 7 is a schematic structural diagram of an electronic device 600 suitable for implementing a terminal device or a server of the embodiment.
  • This embodiment can be applied to computer systems/servers that can operate with numerous other general purpose or special purpose computing system environments or configurations.
  • Examples of well-known computing systems, environments, and/or configurations suitable for use with computer systems/servers include, but are not limited to, personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, small computer systems, mainframe computer systems, and distributed cloud computing environments including any of the above, and the like.
  • the computer system/server can be described in the general context of computer system executable instructions (such as program modules) being executed by a computer system.
  • program modules may include routines, programs, target programs, components, logic, data structures, and the like that perform particular tasks or implement particular abstract data types.
  • the computer system/server can be implemented in a distributed cloud computing environment where tasks are performed by remote processing devices that are linked through a communication network.
  • program modules may be located on a local or remote computing system storage medium including storage devices.
  • the inventors have found through research that in the existing video surveillance system, face detection and recognition are implemented in the backend server, and the front end is only responsible for the acquisition, encoding and transmission of image data.
  • This front-end/back-end division of work requires large network bandwidth; at the same time, since most of the transmitted video stream data is useless information, the efficiency with which the backend server extracts effective data is greatly reduced. Moreover, lossy coding is performed before transmission, so the data obtained by the backend server is not the original image data, which may lead to missed or false detections to some extent.
  • The prior art provides a front-end face capture machine that improves the accuracy of face recognition, but it merely moves the identification and storage modules originally deployed on the backend server to front-end video surveillance equipment that deploys only a central processing unit. Because the volume of surveillance images and video is huge, the monitor's power consumption and cost are high and real-time face detection is difficult to achieve; the face capture machine therefore has little application value in practical scenarios.
  • FIG. 1 is a flow chart of an embodiment of a face image retrieval method according to the present disclosure. As shown in FIG. 1, it is applied to a photographing device, and the method of this embodiment includes:
  • Step 104: Obtain face information to be retrieved corresponding to the image to be retrieved by using a convolutional neural network.
  • The convolutional neural network is configured by the processor with corresponding convolution calculation configuration information; the convolutional neural network includes at least one convolution layer, and the convolution calculation configuration information includes a data bit width value corresponding to each convolution layer in the convolutional neural network.
  • The image to be retrieved includes at least one face region, and may be acquired according to a retrieval instruction.
  • The retrieval instruction may be sent by a server or the cloud, or the externally input image to be retrieved and the retrieval instruction may be received directly. Retrieving and identifying images is usually based on the corresponding face information, so in this step the face information to be retrieved is obtained through convolutional neural network processing, converting the image retrieval in subsequent steps into face information retrieval; this makes retrieval faster and avoids repeated conversion. The convolutional neural network here can be pre-trained.
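As an illustrative sketch (not the patent's actual implementation), the per-layer convolution calculation configuration information described above, i.e. a data bit width value for each convolution layer, could be modeled as below. The names and the signed fixed-point quantization scheme are assumptions:

```python
from dataclasses import dataclass

@dataclass
class ConvLayerConfig:
    """Hypothetical per-layer convolution calculation configuration."""
    input_bit_width: int   # bit width of the input feature data
    weight_bit_width: int  # bit width of the weight data
    kernel_size: int       # convolution kernel size (e.g. 3 for 3x3)

def quantize(value, bit_width):
    """Map a real value in [-1, 1] to a signed fixed-point integer of `bit_width` bits."""
    scale = (1 << (bit_width - 1)) - 1          # e.g. 127 for 8 bits
    q = int(round(value * scale))
    return max(-scale - 1, min(scale, q))       # clamp to the representable range

# Example: an 8-bit layer followed by a 4-bit layer
configs = [ConvLayerConfig(8, 8, 3), ConvLayerConfig(4, 4, 3)]
q8 = quantize(0.5, configs[0].input_bit_width)
q4 = quantize(0.5, configs[1].input_bit_width)
```

With fewer bits the same value is represented more coarsely, which is why matching each layer's bit width to its configuration matters for both cost and precision.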
  • Step 105 Search for matching preset face image information from the database based on the face information to be retrieved.
  • At least one preset face image information is stored in the database.
  • the face detection and recognition work is realized in the camera by setting the database, which greatly reduces the network bandwidth requirement and improves the data transmission efficiency.
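A minimal sketch of the matching step in Step 105: the face feature to be retrieved is compared against the stored preset features. Cosine similarity and the threshold value are assumptions here; the patent does not specify the distance measure:

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two feature vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search_database(query_feature, database, threshold=0.8):
    """Return stored entries whose feature matches the query above
    `threshold`, best match first."""
    scored = [(cosine_similarity(query_feature, feat), info)
              for feat, info in database]
    scored = [(s, info) for s, info in scored if s >= threshold]
    scored.sort(key=lambda p: p[0], reverse=True)
    return [info for _, info in scored]

# Toy database: (feature vector, preset face image information)
db = [([1.0, 0.0, 0.0], "person_a"),
      ([0.0, 1.0, 0.0], "person_b")]
matches = search_database([0.9, 0.1, 0.0], db)
```

Only entries above the threshold are returned, which corresponds to outputting the preset face image information that matches the face information to be retrieved.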
  • The process of face detection and retrieval can be completed by a Field-Programmable Gate Array System on Chip (FPGA SoC), which integrates FPGA logic blocks and a CPU on a single silicon die.
  • The FPGA can perform data-parallel and task-parallel computing simultaneously: a task is divided into multi-stage pipelined operations processed at the same time, reducing the detection time of each frame to less than 40 ms and greatly improving real-time detection.
  • Data parallelism refers to the image data input to each convolution layer: data transmitted between network layers can be processed simultaneously over different channels.
  • Task parallelism means that the convolution, pooling, and fully connected operations in the neural network can be executed in parallel.
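The task-parallel pipeline described above, with convolution, pooling and fully connected stages working on different frames at the same time, can be sketched in software with queues between stages. The stage functions below are toy numeric stand-ins, not the actual FPGA logic:

```python
import queue
import threading

def stage(func, inbox, outbox):
    """Run one pipeline stage: apply `func` to each item until a None sentinel."""
    while True:
        item = inbox.get()
        if item is None:
            outbox.put(None)    # propagate shutdown to the next stage
            break
        outbox.put(func(item))

# Toy stand-ins for the convolution, pooling and fully connected stages.
conv = lambda frame: frame * 2
pool = lambda frame: frame + 1
fc   = lambda frame: frame - 3

q0, q1, q2, q3 = (queue.Queue() for _ in range(4))
threads = [threading.Thread(target=stage, args=a)
           for a in ((conv, q0, q1), (pool, q1, q2), (fc, q2, q3))]
for t in threads:
    t.start()

for frame in (1, 2, 3, None):   # three frames, then the shutdown sentinel
    q0.put(frame)

results = []
while (r := q3.get()) is not None:
    results.append(r)
for t in threads:
    t.join()
```

While one frame is in the fully connected stage, the next can already be in pooling, which is the overlap that brings per-frame latency down in hardware.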
  • By contrast, a traditional embedded system on chip integrates only the CPU and functional modules on a single silicon die; without FPGA logic blocks, real-time face recognition is difficult to realize and can only be achieved with a backend server or a powerful processor.
  • Step 106 Output preset face image information matching the face information to be retrieved.
  • In this embodiment, the face information to be retrieved corresponding to the image to be retrieved is obtained by the convolutional neural network, which is configured by the processor with corresponding convolution calculation configuration information. Because the convolution calculation configuration information is set, the bit width of the data input into each convolutional layer corresponds to that layer, which reduces the computational load of convolutional-neural-network-based face recognition, improves the processing efficiency of the convolutional layers, and allows the face information to be retrieved to be obtained quickly and accurately from the input image. This solves the problem that low-precision fixed-point operation affects the accuracy of the calculation result, improving the operation precision of the convolutional neural network. Matching preset face image information is then searched from the database based on the face information to be retrieved and output; retrieving matching preset face image information from the database set on the camera realizes real-time face retrieval and improves the efficiency of face image retrieval.
  • operation 104 includes:
  • Convolution calculation is performed on the image to be retrieved by the convolutional neural network to obtain the face information to be retrieved.
  • Because the bit widths are configured per layer, each convolution layer can perform its calculation without additional processing of the input data, which solves the problem that low-precision fixed-point operation affects the accuracy of the calculation result and improves the operation precision of the convolutional neural network.
  • the convolution calculation configuration information further includes a convolution kernel size corresponding to each convolutional layer in the convolutional neural network, or a storage address of an image to be retrieved.
  • The storage address of the image to be retrieved allows the photographing device to read the image from the front-end memory according to that address; the input data and weight data of a convolution layer are read according to the layer's configured input data bit width and weight data bit width.
  • The complete input data can be fed into the convolution layer through multiple reads. Since the data bit width of each read corresponds to the convolution layer, this improves the computational efficiency of the convolution layer while ensuring the integrity of the input data, avoiding inaccurate results caused by input data being lost to a mismatched input data bit width.
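Feeding complete input data through multiple reads whose width matches the layer's configuration can be sketched as follows. This is a simplification assuming a fixed bus width; the function and parameter names are hypothetical:

```python
def split_reads(values, bus_width_bits, value_width_bits):
    """Group values into bus transactions: each read carries as many
    fixed-width values as fit into the bus width, and a final partial
    read preserves the remainder so no input data is lost."""
    per_read = max(1, bus_width_bits // value_width_bits)
    return [values[i:i + per_read] for i in range(0, len(values), per_read)]

# Ten 16-bit values over a 64-bit bus: four values per read, remainder kept.
reads = split_reads(list(range(10)), 64, 16)
```

The last read carries only the leftover values, which is the "integrity of the input data" point made above.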
  • The convolutional neural network performs convolution calculation on the image to be retrieved to obtain the face information to be retrieved as follows:
  • take a convolution layer in the convolutional neural network as the current convolution layer;
  • read the image to be retrieved from the front-end memory according to the convolution calculation configuration information configured for the current convolution layer, and perform convolution calculation on it with the current convolution layer to obtain a feature map;
  • take the next convolution layer as the current convolution layer and the feature map as the image to be retrieved, and iterate the above operations until no next convolution layer exists;
  • output the final feature map to obtain the face information to be retrieved.
  • When the next convolutional layer is taken as the current convolutional layer, the calculation result of the previous convolutional layer iteratively becomes the input data of the next convolutional layer; the input data and weight data are again read according to the configured input data bit width and weight bit width, where the weight data is the configured weight data corresponding to that convolution layer.
  • After the feature map is obtained, it is stored in the front-end memory for the next convolutional layer to read; when the convolution calculation of the current layer is completed and no next convolution layer exists, the currently output feature map is the face information to be retrieved obtained by the convolutional neural network.
  • The feature map corresponding to each convolution layer is stored in the front-end memory so that the next convolution layer can obtain it directly when reading its input, which facilitates data acquisition and data bit width setting.
  • the convolution calculation configuration information further includes an offset address
  • Configuring convolution calculation configuration information for the convolutional neural network also includes:
  • Writing the feature map to the front-end memory includes writing it to the storage address of the input data corresponding to the next convolution layer in the front-end memory.
  • The storage address of the input data corresponding to the next convolution layer is obtained by superimposing the offset address on the storage address of the current input data. Since, in a convolutional neural network, the output data of one convolutional layer is the input data of the next, the output of the previous layer is stored at the storage address determined for the next layer's input; when the next layer starts its convolution calculation, it only needs to read from the corresponding storage address.
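The iterative layer-by-layer computation with offset addressing might be simulated as below. The memory map, addresses, offsets and layer stand-ins are illustrative assumptions, not the device's actual layout:

```python
def run_network(memory, layers, input_addr):
    """Run each convolution layer in turn: a layer reads its input at
    `addr`, computes a feature map, and writes it at `addr + offset`,
    which is where the next layer will read its input."""
    addr = input_addr
    for layer in layers:
        data = memory[addr]                    # read image / feature map
        feature_map = layer["compute"](data)   # stand-in for convolution
        addr = addr + layer["offset"]          # superimpose the offset address
        memory[addr] = feature_map             # store for the next layer
    return memory[addr]                        # final feature map

# Front-end memory modeled as an address -> data map.
memory = {0x1000: [1, 2, 3]}
layers = [
    {"compute": lambda d: [x * 2 for x in d], "offset": 0x100},
    {"compute": lambda d: [x + 1 for x in d], "offset": 0x100},
]
result = run_network(memory, layers, 0x1000)
```

After the run, each intermediate feature map sits at the address the following layer was configured to read, so no extra copying or address lookup is needed between layers.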
  • FIG. 2 is a flow chart of still another embodiment of a face image retrieval method of the present disclosure. As shown in FIG. 2, the method of this embodiment includes:
  • Step 201 Collect a video stream, and filter at least one image in the collected video stream based on each face image appearing in the video stream.
  • the image includes an identifiable face image, and each face image corresponds to at least one image;
  • The video stream may be collected in real time by a monitoring device (e.g., a camera). Because front-end storage space is limited, saving the whole video stream in the database at the front end is not feasible. This step therefore decomposes the video stream into individual frame images. Since the images obtained from the video stream include many repeated images, meaningless images (frames without faces), and blurred frames, all of them are screened to obtain identifiable face images, and at least one image is collected for each face image so that no face appearing in the video is missed and subsequent identification is more accurate.
  • Step 202 Perform quality screening on at least one image to obtain at least one first image.
  • The at least one first image is an image whose face image quality reaches a set threshold. To facilitate recognition of the faces in the images obtained in the previous step, a dedicated image signal processing (ISP) chip is used to optimize the original images.
  • the optimization process here can include automatic exposure, automatic white balance, 3D noise reduction, etc. At the same time, it can also select local exposure and extract the region of interest according to the user's needs.
  • The purpose of the optimization processing is to obtain first images with high definition, low noise, wide dynamic range, and low distortion so that the faces in the images can be identified; the set threshold here can be adjusted according to the specific situation.
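One plausible way to score image quality for the screening threshold mentioned above is the variance of a Laplacian response, where low variance suggests blur. The patent does not specify the metric; this is a pure-Python sketch on small grayscale images given as lists of lists:

```python
def laplacian_variance(img):
    """Sharpness score: variance of the 4-neighbour Laplacian over the
    interior pixels of a 2-D grayscale image (list of lists of ints)."""
    h, w = len(img), len(img[0])
    responses = []
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            lap = (img[y - 1][x] + img[y + 1][x]
                   + img[y][x - 1] + img[y][x + 1]
                   - 4 * img[y][x])
            responses.append(lap)
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

def quality_screen(images, threshold):
    """Keep only images whose sharpness reaches the set threshold."""
    return [img for img in images if laplacian_variance(img) >= threshold]

flat  = [[10] * 4 for _ in range(4)]           # featureless (blur-like)
edged = [[0, 0, 255, 255] for _ in range(4)]   # strong vertical edge
kept = quality_screen([flat, edged], threshold=1.0)
```

In practice the threshold would be tuned on real footage, matching the remark above that it "can be adjusted according to the specific situation".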
  • Step 203 Store at least one preset face image information corresponding to the at least one first image into a database.
  • At least one piece of face image information is stored in the database; storing face image information makes retrieval easy and removes the need for face recognition during the retrieval process. In addition, due to limited front-end storage space, the images and face image information in the database are updated regularly or in real time to ensure sufficient space for newly collected information.
  • Step 104: Obtain face information to be retrieved corresponding to the image to be retrieved by using a convolutional neural network.
  • The convolutional neural network is configured by the processor with corresponding convolution calculation configuration information; it includes at least one convolution layer, and the convolution calculation configuration information includes a data bit width value corresponding to each convolution layer; the image to be retrieved includes at least one face region.
  • Step 105 Search for matching preset face image information from the database based on the face information to be retrieved.
  • At least one preset face image information is stored in the database.
  • Step 106 Output preset face image information matching the face information to be retrieved.
  • In addition to matching the received image to be retrieved against the first images already stored in the database, retrieval may also be performed in real time: the image to be retrieved is received first and its face information is obtained; if there is no corresponding entry in the database, the front-end video stream collecting device processes the newly collected video stream to obtain clear, identifiable preset face image information, following the steps described above.
  • the at least one first image further includes a background image for identifying a place where the face image in the at least one first image appears.
  • The other information in the image forms the background information, which can provide the location where the face image in the first image appeared, thereby yielding information such as the person's motion trajectory; a first image with a background image thus assists in identifying the person corresponding to the face image.
  • Before storing the at least one piece of preset face image information corresponding to the at least one first image in the database, the method further includes:
  • processing the at least one first image with the convolutional neural network to obtain the corresponding at least one piece of preset face image information.
  • the convolutional neural network in this embodiment is implemented by the logic part of the FPGA.
  • Each piece of preset face image information includes at least one piece of attribute information.
  • The attribute information in this embodiment may include gender, age, expression, ethnicity, whether glasses are worn, and whether a mask is worn. Classification based on these attributes includes: gender: male/female; age: juvenile/youth/middle-aged/elderly; expression: happy/sad/angry/calm, etc.; ethnicity: yellow/black/white/brown; wearing glasses: yes/no; wearing a mask: yes/no. Combining all of the above attributes to classify an image yields one label per attribute, so each image corresponds to multiple labels.
  • For example, for an image of a middle-aged yellow-race woman who wears glasses, does not wear a mask, and has a calm expression, the attribute labels corresponding to the face image include: female, middle-aged, calm, yellow, wearing glasses, and not wearing a mask. In the classification process, first images and face image information having the same label may be stored in the same location.
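The multi-label attribute scheme above, one label per attribute and same-label records stored together, can be sketched as follows. The attribute names, the example values, and the index structure are assumptions for illustration:

```python
from collections import defaultdict

# Hypothetical attribute vocabulary drawn from the description above.
ATTRIBUTES = ("gender", "age", "expression", "ethnicity", "glasses", "mask")

def attribute_labels(face_info):
    """Collect one classification label per attribute of a face record."""
    return [face_info[a] for a in ATTRIBUTES]

face = {"gender": "female", "age": "middle-aged", "expression": "calm",
        "ethnicity": "yellow", "glasses": "wearing glasses",
        "mask": "not wearing a mask"}
labels = attribute_labels(face)

# Store records that share a label in the same location.
index = defaultdict(list)
for label in labels:
    index[label].append(face)
```

A query on any single label (e.g. "female") then reaches all records stored under it without scanning the whole database.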
  • The at least one piece of attribute information is obtained based on the at least one piece of preset face image information, and the at least one piece of preset face image information is classified and stored based on the at least one piece of attribute information, including:
  • the essence provided by this embodiment is a registration process.
  • a first image and corresponding information for identifying the first image are input.
  • the related information herein refers to the user's name information, etc.
  • The image is recognized to obtain the corresponding face image information, and the preset face image information together with the input first image is stored in the database; later, when the first image is retrieved, the relevant information corresponding to the face image can be obtained directly.
  • the preset face image information of the first image is stored in the database based on the attribute information, including:
• All preset face image information having the same attribute information among the at least one item of preset face image information is stored in one data entry, and the data entry is indexed in the database based on the attribute information.
• All preset face image information is stored in the database by entry, with information sharing the same attribute information stored in one data entry, so that the corresponding preset face image information can be looked up by attribute;
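One way to picture an attribute-indexed store of this kind is a map from an attribute key to a data entry. This is an illustrative sketch only; the `attribute_key` helper and the in-memory dict stand in for whatever database structure an implementation actually uses.

```python
from collections import defaultdict

# Hypothetical sketch: each "data entry" groups all preset face image
# information sharing the same attribute information, indexed by the
# attribute tuple so records with identical labels land in one entry.
database = defaultdict(list)          # attribute key -> data entry (a list)

def attribute_key(tags):
    """Order-independent key derived from the attribute information."""
    return tuple(sorted(tags.items()))

def store(tags, face_info):
    database[attribute_key(tags)].append(face_info)

store({"gender": "female", "glasses": "yes"}, "feature-vector-A")
store({"glasses": "yes", "gender": "female"}, "feature-vector-B")
# Both records share the same attribute information, so they share one entry.
```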
• the acquisition time value of each image is obtained from the video stream.
• the data entries may be sorted according to the acquisition time values of the preset face image information stored in them, which makes ranking convenient:
• the search process first returns the latest qualifying image. The acquisition time values record how many times and when a person appeared in the current scene, avoiding confusion among images of the same person appearing multiple times in the same scene, and serving as an aid when the police search for criminals or gather evidence of a crime.
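Keeping each data entry sorted newest-first, as described above, can be sketched like this (record layout and field names are assumptions for illustration):

```python
# Hypothetical sketch: each stored record carries the acquisition-time value
# taken from the video stream; sorting a data entry newest-first lets the
# search return the latest qualifying image immediately.
entry = [
    {"face_info": "A", "acq_time": 100.0},
    {"face_info": "B", "acq_time": 250.0},
    {"face_info": "C", "acq_time": 175.0},
]
entry.sort(key=lambda r: r["acq_time"], reverse=True)
latest = entry[0]["face_info"]   # newest record first
```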
  • the preset face image information corresponding to the first image is stored in a database, including:
  • the preset face image information is stored in the corresponding data item
  • This embodiment provides an example of storage for a first image.
• the database is searched for a data entry according to the obtained attribute information. If a matching data entry exists, the information is stored into it; if no matching entry exists, a new data entry is created and the information is saved into it, ensuring that the attribute information of the preset face image information saved in each data entry is the same.
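The lookup-or-create step described above can be sketched as follows; the function name and dict-based store are hypothetical:

```python
def save_face_info(database, attr_key, face_info):
    """Look up the data entry for attr_key; create it if absent, then save.

    Keeps the invariant that every record in one entry shares the same
    attribute information (hypothetical sketch of the lookup-or-create step).
    """
    entry = database.get(attr_key)
    if entry is None:                  # no existing data entry: create one
        entry = database[attr_key] = []
    entry.append(face_info)
    return entry

db = {}
save_face_info(db, ("female", "middle-aged"), "vec-1")
save_face_info(db, ("female", "middle-aged"), "vec-2")   # reuses the entry
```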
  • the at least one image is filtered in the collected video stream based on each face image appearing in the video stream, including:
  • Face recognition is performed on all intermediate images based on a convolutional neural network, and at least one image having a face image is obtained based on the result of face recognition.
• the collected video stream may be decomposed in many ways, and this embodiment is not limited in this respect. The decomposed images are optimized for display to obtain intermediate images with a better display effect; face recognition is performed on the intermediate images based on the convolutional neural network to obtain images containing face images, and useless images without face images are deleted by screening, thereby providing a reliable image basis for later face recognition.
• the method further includes: performing face recognition based on the convolutional neural network to obtain preset face recognition information, and evaluating the face image quality in the at least one image by means of the preset face recognition information.
• the face recognition information is obtained by a face quality evaluation algorithm, and may include a face yaw angle, a pitch angle, and/or a face size, that is, any combination of face yaw angle, pitch angle, and face size.
• saving only first images scoring above the preset value ensures, through quality screening, that the images saved in the database contain highly recognizable face images, which reduces the proportion of useless information and improves transmission efficiency;
• the images are sorted by face image quality, further preventing multiple clear images of the same person from being uploaded repeatedly, while avoiding the omission of relatively unclear face images.
  • the at least one image is subjected to quality screening to obtain at least one first image, including:
• the face image quality in the at least one image is screened based on the preset face recognition information corresponding to the at least one image, and images whose face image quality reaches a preset threshold are saved as first images.
• The screened face images enable face matching to be achieved quickly during the retrieval process.
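A quality screen driven by yaw angle, pitch angle, and face size could look like the sketch below. The scoring formula and thresholds are purely illustrative assumptions, not values from the disclosure:

```python
# Hypothetical sketch: scoring face image quality from the recognition
# information (yaw angle, pitch angle, face size) and keeping only images
# above a preset threshold as "first images".
def quality_score(yaw_deg, pitch_deg, face_px):
    # Frontal, large faces score highest; 90-degree poses or tiny faces score ~0.
    pose = max(0.0, 1.0 - abs(yaw_deg) / 90.0) * max(0.0, 1.0 - abs(pitch_deg) / 90.0)
    size = min(1.0, face_px / 128.0)          # saturate at 128-pixel faces
    return pose * size

def screen(candidates, threshold=0.5):
    return [c for c in candidates if quality_score(*c["pose_size"]) >= threshold]

images = [
    {"id": 1, "pose_size": (5, 3, 160)},      # near-frontal, large -> kept
    {"id": 2, "pose_size": (80, 10, 160)},    # strong yaw -> dropped
]
first_images = screen(images)
```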
  • the method for searching for matching preset face image information from a database based on the face information to be retrieved includes:
  • the matched preset face image information is obtained in the matched data entry
• data entries were set up in the above embodiments. In this embodiment, the data entry whose attribute information matches is retrieved first, and the matching preset face image information is then searched for within that data entry. This retrieval method effectively improves retrieval efficiency and avoids the large amount of meaningless work generated by directly matching against all preset face image information.
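The two-stage search (attribute entry first, then feature matching inside it) can be sketched as below. Cosine similarity and the 0.8 threshold are illustrative stand-ins for whatever matching criterion an implementation uses:

```python
import math

# Hypothetical sketch of the two-stage search: first locate the data entry
# whose attribute key matches the query, then compare face features only
# inside that entry, avoiding a scan of the whole database.
def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb)

def search(database, attr_key, query_vec, threshold=0.8):
    entry = database.get(attr_key)
    if entry is None:
        return None                       # feed back "no match" information
    best = max(entry, key=lambda rec: cosine(rec["vec"], query_vec))
    return best if cosine(best["vec"], query_vec) >= threshold else None

db = {("female", "glasses"): [{"name": "A", "vec": [1.0, 0.0]},
                              {"name": "B", "vec": [0.6, 0.8]}]}
hit = search(db, ("female", "glasses"), [0.0, 1.0])
```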
  • the at least one image is subjected to quality screening to obtain at least one first image, including:
• At least one image is subjected to automatic exposure, automatic white balance, and 3D noise reduction processing to obtain at least one first image with an optimized display effect.
• the acquired image is input into an image signal processing (ISP) module, which automatically performs automatic exposure, automatic white balance, and 3D noise reduction. According to user requirements, operations such as local exposure and/or extraction of regions of interest can also be added, so as to obtain a first image with high definition, low noise, wide dynamic range, and low distortion, in order to recognize human faces in the image.
• the device can be powered by a DC supply or by Power over Ethernet (PoE), where the DC supply takes priority over PoE.
• the foregoing program may be stored in a computer readable storage medium; when executed, the program performs the steps of the foregoing method embodiments.
• the foregoing storage medium includes various media that can store program code, such as a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
• the flow of the face image retrieval method is:
• 1. The photographing device collects a video stream and selects from it at least one image on which a face image is displayed.
• 2. The photographing device determines, from the at least one image, at least one first image whose face image quality reaches a set threshold.
• 3. The photographing device processes the at least one first image through the convolutional neural network to obtain at least one item of preset face image information.
• 4. The photographing device acquires at least one item of related information of the at least one first image.
• 5. The photographing device establishes, based on the at least one item of related information, at least one item of attribute information corresponding to the at least one first image.
• 6. The photographing device stores preset face image information having the same attribute information into one data entry in the database, so as to store the at least one item of preset face image information in the database based on the at least one item of attribute information.
• 7. The photographing device reads the image to be retrieved from the front-end memory and performs convolution calculation on it through the current convolutional layer to obtain a feature map.
• 8. The photographing device stores the feature map in the front-end memory as the image to be retrieved for the next convolutional layer, which then performs convolution calculation on the feature map.
• 9. The photographing device determines the finally output feature map as the face information to be retrieved.
• 10. The photographing device matches the face information to be retrieved against the at least one preset face image in the database.
• 11. The photographing device outputs the preset face image that matches the face information to be retrieved, completing the face image retrieval process.
  • FIG. 3 is a schematic structural view of an embodiment of a photographing apparatus of the present disclosure.
  • the apparatus of this embodiment can be configured to implement the various method embodiments described above. As shown in FIG. 3, the apparatus of this embodiment includes:
  • the processor 34 is configured to configure corresponding convolution calculation configuration information for the convolutional neural network.
  • the convolutional neural network includes at least one convolution layer, and the convolution calculation configuration information includes a data bit width value corresponding to each convolutional layer in the convolutional neural network.
  • the convolution calculation portion 35 is configured to obtain the face information to be retrieved corresponding to the image to be retrieved through the convolutional neural network.
  • At least one face area is included in the image to be retrieved.
  • the searching part 36 is configured to search for matching preset face image information from the database based on the face information to be retrieved, and the database stores at least one preset face image information; and outputs a preset face that matches the face information to be retrieved. Image information.
• a face image retrieval apparatus based on the face image retrieval method provided by the above embodiments of the present disclosure obtains the face information to be retrieved corresponding to the image to be retrieved through a convolutional neural network;
• the convolutional neural network performs convolution calculation according to the configuration information configured by the processor. Since convolution calculation configuration information is set for the convolutional neural network, the bit width of the image input into each convolutional layer matches that convolutional layer, which reduces the computational complexity of face recognition based on the convolutional neural network and improves the processing efficiency of the convolutional layers, so that the face information to be retrieved can be obtained quickly and accurately from the input image to be retrieved. This solves the problem that the low calculation accuracy of fixed-point operations affects the accuracy of calculation results, and improves the operation precision of the convolutional neural network.
• matching preset face image information is searched from the database based on the face information to be retrieved, and the preset face image information that matches the face information to be retrieved is output. By retrieving matching preset face image information in a database provided in the photographing device, real-time face retrieval is realized and face image retrieval efficiency is improved.
  • the convolution calculation portion 35 includes: a configurable read controller and an image processing portion;
  • the configurable read controller is configured to read the image to be retrieved from the front end memory according to the convolution calculation configuration information, and the bit width of the image to be retrieved is equal to the data bit width value;
  • the image processing portion is configured to perform convolution calculation on the image to be retrieved by the convolutional neural network to obtain face information to be retrieved.
• the bit width of the data input into each convolution layer conforms to the bit width required by that layer, thereby realizing dynamic configuration of the data input to the convolution layers.
• Each convolution layer can perform its calculation without further processing of the input data, which solves the problem that the low calculation accuracy of fixed-point operations affects the accuracy of calculation results, and improves the operation precision of the convolutional neural network.
  • the convolution calculation configuration information further includes a convolution kernel size corresponding to each convolutional layer in the convolutional neural network, or a storage address of the image to be retrieved;
  • the storage address of the image to be retrieved is used to read the image to be retrieved in the front end memory according to the storage address.
  • the image processing portion includes: a layer calculation portion and an iteration portion;
  • the layer calculation part is configured to perform convolution calculation on the image to be retrieved by the current convolution layer to obtain a feature map;
  • the current convolution layer is a convolution layer in each convolution layer in the convolutional neural network;
• the iteration part is configured to, in response to the presence of a next convolutional layer, take the next convolutional layer as the current convolutional layer and the feature map as the image to be retrieved, and to iteratively read the image to be retrieved from the front-end memory according to the convolution calculation configuration information configured for the current convolutional layer in the convolutional neural network, and perform convolution calculation on it through the current convolutional layer to obtain a feature map, until no next convolutional layer exists;
  • the feature map is output to obtain the face information to be retrieved.
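The layer-iteration loop described above can be sketched in simplified form. This is an illustrative model, not the hardware implementation: the 1-D convolution stands in for the real conv operation, and limiting the data bit width is simulated by clamping values to the signed integer range of the configured width.

```python
# Hypothetical sketch of the layer-iteration loop: for each convolutional
# layer, the input is re-read at that layer's configured data bit width
# (simulated here by clamping to the signed range), convolved, and the
# resulting feature map becomes the next layer's input image.
def quantize(data, bit_width):
    lo, hi = -(1 << (bit_width - 1)), (1 << (bit_width - 1)) - 1
    return [max(lo, min(hi, int(v))) for v in data]

def conv1d(data, kernel):
    n = len(kernel)
    return [sum(data[i + j] * kernel[j] for j in range(n))
            for i in range(len(data) - n + 1)]

def run_network(image, layer_configs):
    feature_map = image
    for cfg in layer_configs:                        # iterate current layer
        data = quantize(feature_map, cfg["bit_width"])   # configured width
        feature_map = conv1d(data, cfg["kernel"])
    return feature_map                # final feature map = face info to retrieve

layers = [{"bit_width": 8, "kernel": [1, -1]},
          {"bit_width": 16, "kernel": [1, 1]}]
out = run_network([10, 300, 20, 5], layers)   # 300 is clamped to 127 at 8 bits
```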
  • the apparatus further includes: a configurable write back controller
  • the configurable writeback controller is configured to write the feature map to the front end memory.
  • the convolution calculation configuration information further includes an offset address
• the processor is configured to determine the storage address of the input data corresponding to the next convolutional layer according to the storage address and the offset address of the input data corresponding to the current convolutional layer;
  • the configurable writeback controller is configured to write the feature map to a storage address of the input data corresponding to the next convolutional layer in the front end memory.
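The address arithmetic behind this write-back scheme is simple to sketch; the concrete base address and per-layer offsets below are invented for illustration only:

```python
# Hypothetical sketch: the processor derives the next layer's input-data
# storage address from the current layer's address plus the configured
# offset, so the write-back controller knows where to place the feature map.
def next_layer_address(current_addr, offset):
    return current_addr + offset

addr = 0x1000                              # assumed base address of layer 0 input
for offset in (0x400, 0x400, 0x200):       # one configured offset per layer
    addr = next_layer_address(addr, offset)
# Each feature map is written at the address computed for the following layer.
```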
  • FIG. 4 is a schematic structural view of another embodiment of the photographing apparatus of the present disclosure.
  • the apparatus of this embodiment includes: an acquisition screening part, a quality screening part, and a storage part;
  • the collection screening section 41 is configured to collect a video stream, and filter at least one image in the collected video stream based on each face image appearing in the video stream.
  • the image includes a face image, and each face image corresponds to at least one image.
  • the quality screening portion 42 is configured to perform quality screening on the at least one image to obtain at least one first image, wherein the at least one first image is an image in which the face image quality reaches a set threshold, and each of the at least one first image An image includes a face image.
  • the storage portion 43 is configured to store at least one preset face image information corresponding to the at least one first image into a database.
  • the processor 34 is configured to configure corresponding convolution calculation configuration information for the convolutional neural network.
  • the convolutional neural network includes at least one convolution layer, and the convolution calculation configuration information includes a data bit width value corresponding to each convolutional layer in the convolutional neural network.
  • the convolution calculation portion 35 is configured to obtain the face information to be retrieved corresponding to the image to be retrieved through the convolutional neural network.
  • At least one face area is included in the image to be retrieved.
  • the searching unit 36 is configured to search for matching preset face image information from the database based on the face information to be retrieved, and store at least one preset face image information in the database; and output a preset face that matches the face information to be retrieved. Image information.
• in addition to matching the received image to be retrieved against the first images stored in the database, retrieval may be performed in real time: the image to be retrieved is received first and the face information to be retrieved is obtained; if no corresponding information exists in the database,
• the front-end video stream collecting device processes the newly collected video stream to obtain clear and recognizable preset face image information.
  • the image filtered based on the acquired video stream further includes a background image for identifying a location where the face image in the image appears.
• the convolution calculation portion 35 is configured to process the at least one first image through the convolutional neural network to obtain corresponding at least one item of preset face image information; corresponding at least one item of attribute information is obtained based on the at least one item of preset face image information, and the at least one item of preset face image information is classified and stored based on the at least one item of attribute information; each item of preset face image information includes at least one item of attribute information.
  • the apparatus further includes: an information receiving portion and a classification portion;
  • the information receiving portion is configured to receive a first image whose input image quality exceeds a set threshold and related information corresponding to the first image;
  • the classifying portion is configured to establish at least one attribute information corresponding to the at least one first image based on the at least one related information, and store at least one preset face image information corresponding to the at least one first image based on the at least one attribute information Into the database.
• the classification portion is configured to store all preset face image information having the same attribute information among the at least one item of preset face image information into one data entry in the database, and to index the data entries based on the attribute information;
  • the photographing apparatus of this embodiment further includes: a time sorting part
  • the time sorting part is configured to acquire an acquisition time value of each image, and store the data items in a database according to the collection time value.
  • the storage portion 43 includes: a lookup portion and an attribute storage portion;
  • the searching part is configured to search, in the database, whether a corresponding data item exists according to the attribute information corresponding to the first image;
• the attribute storage part is configured to store the preset face image information into the corresponding data entry when that data entry exists in the database, and, when no data entry corresponding to the attribute information exists in the database, to create a new data entry with the attribute information and save the preset face image information into the newly created data entry.
  • the collection portion 41 includes: a decomposition portion and a recognition screening portion;
  • the decomposing portion is configured to decompose the collected video stream into at least one disassembled image, and optimize at least one disassembled image to obtain an intermediate image that optimizes an image display effect;
  • the identification screening portion is configured to perform face recognition on all intermediate images based on a convolutional neural network, and obtain at least one image having a face image based on the result of the face recognition.
• the collected video stream may be decomposed in many ways, and this embodiment is not limited in this respect. The decomposed images are optimized for display to obtain intermediate images with a better display effect; face recognition is performed on the intermediate images based on the convolutional neural network to obtain images containing face images, and useless images without face images are deleted by screening, thereby providing a reliable image basis for later face recognition.
  • the collection screening portion 41 further includes: an evaluation sub-portion;
  • the evaluation sub-portion is configured to perform face recognition based on the convolutional neural network to obtain preset face recognition information, and the face image quality in the at least one image is evaluated by preset face recognition information.
• the quality screening portion 42 is configured to screen the face image quality in the at least one image based on the preset face recognition information corresponding to the at least one image, and to save images whose face image quality reaches a preset threshold as the at least one first image.
  • the retrieval portion 36 includes: an attribute lookup subsection and a data matching subsection;
  • the attribute search sub-portion is configured to obtain attribute information corresponding to the image to be retrieved based on the face information to be retrieved, and search for a data item in the database according to the attribute information;
  • the data matching sub-portion is configured to obtain matching preset face image information in the matched data entry when there is a data entry that matches the attribute information; when there is no data entry that matches the attribute information, the feedback is not Match result information.
• the quality screening portion 42 is configured to subject the at least one image to automatic exposure, automatic white balance, and 3D noise reduction processing to obtain at least one first image with an optimized display effect.
  • the database includes a blacklist sub-library and a whitelist sub-library, and the blacklist sub-library includes at least one preset facial image information, and the whitelist The sub-library includes at least one preset face image information;
  • the photographing apparatus of this embodiment further includes: a feedback portion
  • the feedback part is configured to: when the matched preset face image information belongs to the blacklist sub-library, the warning information is fed back; when the matched preset face image information belongs to the whitelist sub-library, the normal information is fed back.
  • the photographing device of the embodiment can act as an electronic policeman to help the police search for criminals.
• criminals' face information is deployed to the front-end capture machine (photographing device) through the network, and the photographing device monitors around the clock. Once a search matches preset face image information in the blacklist sub-library, the feedback part feeds back warning information, realizing immediate notification.
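The blacklist/whitelist feedback step can be sketched as a simple dispatch; the function name and the string return values are hypothetical placeholders for whatever messages the feedback part actually emits:

```python
# Hypothetical sketch of the feedback step: warning information for a
# blacklist hit, normal information for a whitelist hit.
def feedback(matched_id, blacklist, whitelist):
    if matched_id in blacklist:
        return "warning"        # e.g. immediately notify the police
    if matched_id in whitelist:
        return "normal"
    return "no-match"           # matched neither sub-library

alert = feedback("suspect-7", blacklist={"suspect-7"}, whitelist=set())
```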
  • FIG. 5 is a schematic structural view of an example of the above embodiments of the photographing apparatus of the present disclosure. As shown in FIG. 5, the apparatus of this embodiment includes:
  • An image acquisition module (equivalent to the acquisition screening portion 41 of the present disclosure) is configured to acquire a video stream and filter at least one image in the captured video stream based on each face image appearing in the video stream.
  • the ISP processing module (corresponding to the quality screening portion 42 of the present disclosure) is configured to perform quality screening on all images to obtain a first image in which at least one face image quality reaches a set threshold.
  • the storage module (corresponding to the database of the present disclosure) is configured to store preset face image information corresponding to the first image.
  • the FPGA SoC module includes hardware monitoring (corresponding to the convolution calculation portion 35 of the present disclosure) and central processing (corresponding to the processor 34 of the present disclosure), and the hardware monitoring realizes obtaining the face to be retrieved corresponding to the image to be retrieved through the convolutional neural network.
  • Information; central processing is used to configure the corresponding convolution calculation configuration information for the convolutional neural network.
• hardware monitoring and central processing are integrated on a single piece of silicon by the FPGA SoC module, so that communication between the two is not limited by bandwidth; with configuration and convolution operations realized in one module, real-time face recognition is achieved.
  • the communication module (corresponding to the feedback part of the present disclosure) can send the obtained matching preset face image information through the communication module, and can also issue corresponding information according to the preset face image information belonging to the white list or the blacklist. Information to the default client.
  • the embodiment may further include: a power supply system module, in order to realize independent operation of the camera, a power supply system module is provided, and the power supply system module supplies power to all the modules.
• the flow of the face image retrieval method is:
• 1. The face capture system uses the image acquisition module to transmit the acquired image data to the ISP processing unit at the back end.
• 2. The ISP processing unit performs automatic exposure, automatic white balance, 3D denoising, partial exposure, region-of-interest processing, and the like on the collected image data.
• 3. The face image retrieval system transmits the image data processed by the ISP processing unit to the FPGA SoC system module.
• 4. The hardware detection of the FPGA SoC system module performs the convolutional neural network calculation and completes face detection.
• 5. The FPGA SoC system module performs quality screening and sorting on the detected faces, and the central processing manages and retrieves the detection results.
• 6. The central processing of the FPGA SoC system module searches for matching preset face image information from the storage module based on the detected face information; the storage module saves the system startup file and the local face database file, so that offline registration, identification, and saving of personnel are implemented even when the network is down.
• 7. The communication module transmits the preset face image information to the back end and simultaneously receives commands sent by the back end, with the central processing completing the response work.
  • an electronic device provided with any of the above embodiments of the present photographing apparatus.
  • a computer storage medium for storing computer readable instructions that, when executed, perform the operations of any of the above embodiments of the disclosed face image retrieval method.
  • the embodiment further provides an electronic device, which may be, for example, a mobile terminal, a personal computer (PC), a tablet computer, a server, or the like.
• computer system 600 includes one or more processors, a communication portion, and the like. The one or more processors include, for example, one or more central processing units (CPUs) 601 and/or one or more graphics processors (GPUs) 613; the processors may perform various appropriate actions and processes according to executable instructions stored in a read-only memory (ROM) 602 or loaded from the storage portion 608 into a random access memory (RAM) 603.
• the communication portion 612 may include, but is not limited to, a network card, which may include, but is not limited to, an IB (InfiniBand) network card. The processor may communicate with the read-only memory 602 and/or the random access memory 603 to execute executable instructions, connect to the communication portion 612 through the bus 604, and communicate with other target devices via the communication portion 612, thereby completing operations corresponding to any method provided by the embodiments of the present application, for example, obtaining the face information to be retrieved corresponding to the image to be retrieved through a convolutional neural network.
  • the convolutional neural network calculates configuration information corresponding to the convolution corresponding to the processor configuration, the convolutional neural network includes at least one convolution layer, and the convolution calculation configuration information includes data bit widths corresponding to the convolutional layers in the convolutional neural network a value; the image to be retrieved includes at least one face region; the matched face image information is searched from the database based on the face information to be retrieved; at least one preset face image information is stored in the database; and the face to be retrieved is output Preset face image information that matches the information.
• In the RAM 603, various programs and data required for the operation of the apparatus can be stored.
  • the CPU 601, the ROM 602, and the RAM 603 are connected to each other through a bus 604.
  • ROM 602 is an optional module.
• the RAM 603 stores executable instructions, or executable instructions are written into the ROM 602 at runtime; the executable instructions cause the processor 601 to perform operations corresponding to the above-described communication methods.
  • An input/output (I/O, Input/Output) interface 605 is also coupled to bus 604.
• the communication portion 612 may be integrated, or may be provided with a plurality of sub-modules (e.g., a plurality of IB network cards) respectively connected to the bus link.
  • the following components are connected to the I/O interface 605: an input portion 606 including a keyboard, a mouse, etc.; an output portion 607 including, for example, a cathode ray tube (CRT), a liquid crystal display (LCD), and the like, and a storage portion 608 including a hard disk or the like. And a communication portion 609 including a network interface card such as a LAN card, a modem, or the like. The communication section 609 performs communication processing via a network such as the Internet.
  • Driver 610 is also coupled to I/O interface 605 as needed.
  • a removable medium 611 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory or the like, is mounted on the drive 610 as needed so that a computer program read therefrom is installed into the storage portion 608 as needed.
  • FIG. 7 is only an optional implementation manner.
  • the number and types of components in FIG. 7 may be selected, deleted, added, or replaced according to actual needs;
  • Functional components can also be implemented in separate settings or integrated settings, such as GPU and CPU detachable settings or GPU can be integrated on the CPU, the communication can be separated, or integrated on the CPU or GPU, etc. Wait.
• an embodiment of the present disclosure includes a computer program product comprising a computer program tangibly embodied on a machine-readable medium; the computer program contains program code for executing the method illustrated in the flowchart, and the program code may include instructions corresponding to the method steps provided by the embodiments of the present application, for example: obtaining the face information to be retrieved corresponding to the image to be retrieved through a convolutional neural network, where the convolutional neural network performs convolution calculation according to configuration information configured by the processor, the convolutional neural network includes at least one convolutional layer, and the convolution calculation configuration information includes a data bit width value corresponding to each convolutional layer in the convolutional neural network; the image to be retrieved includes at least one face region; searching for matching preset face image information from the database based on the face information to be retrieved, the database storing at least one item of preset face image information; and outputting the preset face image information that matches the face information to be retrieved.
  • In such an embodiment, the computer program may be downloaded and installed from a network via the communication portion 609, and/or installed from the removable medium 611.
  • When the computer program is executed by the central processing unit (CPU) 601, the above-described functions defined in the method of the present application are performed.
  • The methods, apparatuses, and devices of the present disclosure may be implemented in many ways, for example in software, hardware, firmware, or any combination of software, hardware, and firmware.
  • The above-described order of the steps of the method is for illustration only; the steps of the method of the present disclosure are not limited to the order specifically described above unless otherwise specifically stated.
  • In addition, in some embodiments, the present embodiment may also be implemented as programs recorded in a recording medium, the programs including machine-readable instructions for implementing the method according to the present disclosure; thus, the present disclosure also covers a recording medium storing a program for executing the method according to the present disclosure.
  • Industrial applicability: to-be-retrieved face information corresponding to an image to be retrieved is obtained by a convolutional neural network, which is configured by a processor with corresponding convolution calculation configuration information. Because the convolution calculation configuration information is set for the convolutional neural network, the bit width of the image data input into each convolutional layer corresponds to that convolutional layer, which reduces the amount of computation for face recognition based on the convolutional neural network, improves the processing efficiency of the convolutional layers, allows the to-be-retrieved face information to be obtained quickly and accurately from the input image, solves the problem that the low precision of fixed-point operations affects the accuracy of the calculation results, and improves the operation precision of the convolutional neural network. Matching preset face image information is then searched for in the database based on the to-be-retrieved face information, and the preset face image information matching the to-be-retrieved face information is output; by retrieving the matching preset face image information from a database provided in the photographing apparatus, real-time face retrieval is achieved and the efficiency of face image retrieval is improved.


Abstract

一种人脸图像检索方法和系统、拍摄装置、计算机存储介质,其中,方法包括:通过卷积神经网络获得待检索图像对应的待检索人脸信息(104);卷积神经网络经过处理器配置对应的卷积计算配置信息,卷积神经网络包括至少一个卷积层,卷积计算配置信息包括卷积神经网络中的各卷积层对应的数据位宽值;待检索图像中包括至少一个人脸区域;基于待检索人脸信息从数据库中查找匹配的预设人脸图像信息(105);数据库中保存有至少一个预设人脸图像信息;输出待检索人脸信息匹配的预设人脸图像信息(106)。所述方法通过在拍摄装置中检索获得匹配的预设人脸图像信息,实现实时人脸检索的效果,提高了检索效率。

Description

人脸图像检索方法和系统、拍摄装置、计算机存储介质
相关申请的交叉引用
本申请基于申请号为201710774389.9、申请日为2017年08月31日,申请名称为“人脸图像检索方法和系统、拍摄装置、计算机存储介质”的中国专利申请提出,并要求该中国专利申请的优先权,该中国专利申请的全部内容在此以引入方式结合在本申请中。
技术领域
本公开涉及计算机视觉技术,尤其是一种人脸图像检索方法和系统、拍摄装置、计算机存储介质。
背景技术
近年来,随着安防体系的日渐壮大、监控点的不断扩张和高清监控设备的普及,监控获得的图像视频信息呈现爆炸式的增长。传统视频监控系统的存储和检索等面临着巨大的挑战,如何快速、高效地从海量图像视频中提取出有用的信息至关重要。为此,人们把人脸识别技术引入了视频监控系统中,人脸识别技术很大程度上依赖于卷积神经网络的支持,而卷积神经网络需要庞大的计算量,从而导致人脸图像检索的效率低。
发明内容
本实施例提供了一种人脸图像检索方法和系统、拍摄装置、计算机存储介质,能够减少基于卷积神经网络进行人脸识别的计算量,从而提高人脸图像检索的效率。
本实施例提供的一种人脸图像检索方法,包括:
通过卷积神经网络获得待检索图像对应的待检索人脸信息;所述卷积神经网络经过处理器配置对应的卷积计算配置信息,所述卷积神经网络包括至少一个卷积层,所述卷积计算配置信息包括所述卷积神经网络中的各卷积层对应的数据位宽值;所述待检索图像中包括至少一个人脸区域;
基于所述待检索人脸信息从数据库中查找匹配的预设人脸图像信息;所述数据库中保存有至少一个预设人脸图像信息;
输出所述待检索人脸信息匹配的预设人脸图像信息。
根据本实施例的另一个方面,提供的一种拍摄装置,包括:
卷积计算部分,配置为通过卷积神经网络获得待检索图像对应的待检索人脸信息;所述卷积神经网络配置有卷积计算配置信息,所述卷积神经网络包括至少一个卷积层,所述卷积计算配置信息包括所述卷积神经网络中的各卷积层对应的数据位宽值;所述待检索图像中包括至少一个人脸区域;
处理器,配置为所述卷积神经网络配置对应的卷积计算配置信息;
检索部分,配置为基于所述待检索人脸信息从数据库中查找匹配的预设人脸图像信息;所述数据库中保存有至少一个预设人脸图像信息;输出所述待检索人脸信息匹配的预设人脸图像信息。
根据本实施例的另一个方面,提供的一种人脸图像检索系统,设置有如上所述的拍摄装置。
根据本实施例的另一个方面,提供的一种计算机存储介质,用于存储计算机可读取的指令,所述指令被执行时执行如上所述人脸图像检索方法的操作。
基于本公开上述实施例提供的一种人脸图像检索方法和系统、拍摄装置、计算机存储介质,通过卷积神经网络获得待检索图像对应的待检索人脸信息;卷积神经网络经过处理器配置对应的卷积计算配置信息;由于卷积神经网络设置了卷积计算配置信息,输入到卷积神经网络中各卷积层中的图像的位宽都与卷积层相对应,减少了基于卷积神经网络进行人脸识别的计算量,提高了卷积层的处理效率,并输入的待检索图像可以快速准确的得到待检索人脸信息,解决了定点运算计算精度低及影响计算结果准确度的问题,提高了卷积神经网络的运算精度;基于待检索人脸信息从数据库中查找匹配的预设人脸图像信息;输出待检索人脸信息匹配的预设人脸图像信息;通过在拍摄装置设置的数据库中检索匹配的预设人脸图像信息,实现了实时人脸检索的效果,提高了人脸图像检索的效率。
下面通过附图和实施例,对本公开的技术方案做进一步的详细描述。
附图说明
图1为本公开人脸图像检索方法一个实施例的流程图;
图2为本公开人脸图像检索方法另一个实施例的流程图;
图3为本公开人脸图像检索装置一个实施例的结构示意图;
图4为本公开人脸图像检索装置另一个实施例的结构示意图;
图5为本公开拍摄装置上述各实施例的一个示例的结构示意图;
图6为本公开拍摄装置上述各实施例的一个示例的结构示意图;
图7为适于用来实现本实施例的一个终端设备或服务器的电子设备600的结构示意图。
具体实施方式
现在将参照附图来详细描述本公开的各种示例性实施例。应注意到:除非另外具体说明,否则在这些实施例中阐述的部件和步骤的相对布置、数字表达式和数值不限制本公开的范围。
同时,应当明白,为了便于描述,附图中所示出的各个部分的尺寸并不是按照实际的比例关系绘制的。
以下对至少一个示例性实施例的描述实际上仅仅是说明性的,决不作为对本公开及其应用或使用的任何限制。
对于相关领域普通技术人员已知的技术、方法和设备可能不作详细讨论,但在适当情况下,所述技术、方法和设备应当被视为说明书的一部分。
应注意到:相似的标号和字母在下面的附图中表示类似项,因此,一旦某一项在一个附图中被定义,则在随后的附图中不需要对其进行进一步讨论。
本实施例可以应用于计算机系统/服务器,其可与众多其它通用或专用计算系统环境或配置一起操作。适于与计算机系统/服务器一起使用的众所周知的计算系统、环境和/或配置的例子包括但不限于:个人计算机系统、服务器计算机系统、瘦客户机、厚客户机、手持或膝上设备、基于微处理器的系统、机顶盒、可编程消费电子产品、网络个人电脑、小型计算机系统、大型计算机系统和包括上述任何系统的分布式云计算技术环境,等等。
计算机系统/服务器可以在由计算机系统执行的计算机系统可执行指令(诸如程序模块)的一般语境下描述。通常,程序模块可以包括例程、程序、目标程序、组件、逻辑、数据结构等等,它们执行特定的任务或者实现特定的抽象数据类型。计算机系统/服务器可以在分布式云计算环境中实施,分布式云计算环境中,任务是由通过通信网络链接的远程处理设备执行的。在分布式云计算环境中,程序模块可以位于包括存储设备的本地或远程计算系统存储介质上。
在实现本公开的过程中,本发明人通过研究发现,现有的视频监控系统中,人脸检测识别都是在后端服务器实现的,前端只负责图像数据的采集编码及传输。这种前后端结合操作的模式,需要较大网络带宽;同时,由于传输的视频流数据中大部分都是无用的信息,大大降低了后端服务器对有效数据提取的效率;此外,由于图像在传输之前经过了有损编码,后端服务器拿到的数据并不是原始的图像数据,一定程度上会导致漏检或错检。
现有技术提供了一种前端化的人脸抓拍机,该人脸抓拍机虽提高了人脸识别的准确率,但只是将原本部署于后端服务器上的中央处理单元、识别及存储模块部署在前端的视频监控设备上,由于监控图像、视频的数量巨大,意味着该监控机的功耗及成本较高,并且很难达到实时检测人脸的效果,因此,人脸抓拍机在实际应用场景中并不具有应用价值。
图1为本公开人脸图像检索方法一个实施例的流程图。如图1所示,其应用于拍摄装置,该实施例方法包括:
步骤104,通过卷积神经网络获得待检索图像对应的待检索人脸信息。
其中,卷积神经网络经过处理器配置对应的卷积计算配置信息,卷积神经网络包括至少一个卷积层,卷积计算配置信息包括卷积神经网络中的各卷积层对应的数据位宽值;待检索图像中包括至少一个人脸区域,待检索图像可以是根据检索指令获取的,检索指令可以是由服务器端或云端下发的,或直接接收外部输入的待检索图像和检索指令。要对图像进行检索识别,通常是基于对应的人脸信息进行识别,在此步骤中,通过卷积神经网络处理获得对应的待检索人脸信息,将后续检索中的图像检索转换为人脸信息检索,使检索更快捷,无需后续转换,其中卷积神经网络可以是预先训练好的。
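上述为各卷积层配置对应数据位宽值的思路,可以用如下示意代码粗略说明(其中的层名、位宽取值与定点量化方式均为示例性假设,并非本公开的实际实现):

```python
def quantize(values, bit_width, frac_bits):
    # 将浮点数值量化为指定位宽的有符号定点数(带饱和截断)
    scale = 1 << frac_bits
    lo = -(1 << (bit_width - 1))
    hi = (1 << (bit_width - 1)) - 1
    return [max(lo, min(hi, round(v * scale))) for v in values]

# 假设的卷积计算配置信息:每个卷积层对应各自的数据位宽值
conv_config = {
    "conv1": {"bit_width": 8,  "frac_bits": 4},
    "conv2": {"bit_width": 16, "frac_bits": 8},
}

activations = [0.5, -1.25, 3.999]
q_conv1 = quantize(activations, **conv_config["conv1"])
```

按各层配置的位宽准备好数据后,各卷积层无需再做位宽转换即可直接参与计算。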
步骤105,基于待检索人脸信息从数据库中查找匹配的预设人脸图像信息。
其中,数据库中保存有至少一个预设人脸图像信息。通过设置数据库在拍摄装置中实现了人脸的检测与识别工作,大大降低了网络带宽的要求,提高了数据传输效率。人脸检测和检索的过程可以通过现场可编程门阵列系统级芯片(FPGA SoC,Field-Programmable Gate Array System on Chip)完成的,FPGA SoC将FPGA逻辑块和CPU集成到了一片单晶硅上,FPGA逻辑块和CPU之间的通信通过先进可扩展接口总线(AXI,Advanced eXtensible Interface)进行,具有很大的物理带宽,克服了现有方案中FPGA逻辑块与CPU分离设置需要很大的带宽实现通信的弊端,并且FPGA SoC特有的性能功耗比优势,使整机功耗不超过4W,更适合应用于各种不同的恶劣环境中;FPGA可同时进行数据并行和任务并行计算,将一个任务拆分成多级流水作业(同时处理),将每一帧的检测时长缩减到了40ms以内,大大提升了实时检测效果;数据并行:是指各卷积层输入的图像数据、网络层之间传递的数据,可根据需要建立不同的通道同时处理;任务并行:是指神经网络中的卷积、池化及全连接可并行执行。而传统的嵌入式片上系统只将CPU和各功能模块集成到一片单晶硅上,而不设置FPGA逻辑块的情况下,是很难实现人脸的实时识别的,需要借助后端服务器或者更强大的处理器才能实现。
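文中"将一个任务拆分成多级流水作业、把每帧检测时长缩减到40ms以内"的效果,可以用一个简化的吞吐模型示意(各级耗时为假设数值,仅用于说明流水线的收益):

```python
# 简化的流水线吞吐模型:流水填满后,每帧耗时取决于最慢的一级,
# 而非各级耗时之和(数值仅为示例)
stage_ms = [12, 15, 10]          # 假设的各级流水耗时
serial_ms = sum(stage_ms)        # 串行执行:各级耗时之和
pipelined_ms = max(stage_ms)     # 流水执行:取最慢一级,满足 40 ms 以内
```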
步骤106,输出待检索人脸信息匹配的预设人脸图像信息。
基于本公开上述实施例提供的一种人脸图像检索方法,通过卷积神经网络获得待检索图像对应的待检索人脸信息;卷积神经网络经过处理器配置对应的卷积计算配置信息;由于卷积神经网络设置了卷积计算配置信息,输入到卷积神经网络中各卷积层中的图像的位宽都与卷积层相对应,减少了基于卷积神经网络进行人脸识别的计算量,提高了卷积层的处理效率,并输入的待检索图像可以快速准确的得到待检索人脸信息,解决了定点运算计算精度低及影响计算结果准确度的问题,提高了卷积神经网络的运算精度;基于待检索人脸信息从数据库中查找匹配的预设人脸图像信息;输出待检索人脸信息匹配的预设人脸图像信息;通过在拍摄装置设置的数据库中检索匹配的预设人脸图像信息,实现了实时人脸检索的效果,提高了人脸图像检索的效率。
本公开人脸图像检索方法的另一个实施例,在上述实施例的基础上,操作104包括:
按照卷积计算配置信息从前端存储器读取待检索图像,待检索图像的位宽等于数据位宽值;
通过卷积神经网络对待检索图像进行卷积计算,得到待检索人脸信息。
本实施例中,通过读取设定位宽的待检索图像,使输入到卷积层中的数据的位宽符合该卷积层要求的位宽,实现了对输入卷积层的数据的动态配置,各卷积层无需对输入的数据进行处理即可执行计算,解决了定点运算计算精度低及影响计算结果准确度的问题,提高了卷积神经网络的运算精度。
在本公开人脸图像检索方法上述各实施例的一个示例中,卷积计算配置信息还包括卷积神经网络中的各卷积层对应的卷积核大小、或待检索图像的存储地址。
其中,待检索图像的存储地址用于为拍摄装置按照存储地址在前端存储器中读取待检索图像;通过对卷积层的输入数据位宽和权值数据位宽进行配置,在读取输入数据(待检索图像数据)和权值数据时,分别按照设置的输入数据位宽和权值数据位宽进行读取,可通过多次读取将完整的输入数据输入到卷积层中,由于每次读取的数据位宽都与该卷积层相对应,因此提高了卷积层的计算效率,同时保证了输入数据的完整,不会因为输入数据位宽的设置使输入数据缺失而导致结果不准确。
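上述"按设置的输入数据位宽分多次读取、保证输入数据完整"的过程,可以粗略示意如下(缓冲区内容与位宽取值均为假设):

```python
def read_words(buf, bit_width):
    # 按配置的位宽把字节缓冲区切分为若干次读取(支持 8/16/32 位)
    nbytes = bit_width // 8
    assert len(buf) % nbytes == 0, "输入数据长度应与位宽对齐,保证数据完整"
    return [int.from_bytes(buf[i:i + nbytes], "little")
            for i in range(0, len(buf), nbytes)]

data = bytes([1, 0, 2, 0, 3, 0, 4, 0])
reads_8bit = read_words(data, 8)    # 八次 8 位读取
reads_16bit = read_words(data, 16)  # 四次 16 位读取,内容仍然完整
```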
在本公开人脸图像检索方法上述各实施例的一个示例中,通过卷积神经网络对待检索图像进行卷积计算,得到待检索人脸信息,包括:
通过当前卷积层对待检索图像进行卷积计算,得到特征图;当前卷积层为卷积神经网络中各卷积层中的一个卷积层;
响应于存在下一个卷积层,以下一个卷积层作为当前卷积层,以特征图作为待检索图像,迭代执行根据为卷积神经网络中当前卷积层配置的卷积计算配置信息从前端存储器读取待检索图像;通过当前卷积层对待检索图像进行卷积计算,得到特征图,直到不存在下一个卷积层;
输出特征图获得待检索人脸信息。
本实施例为了实现对卷积神经网络中每一个卷积层都进行加速,通过迭代的方法,将下一层卷积层作为当前卷积层,将上一个卷积层计算得到的计算结果数据作为下一个卷积层的输入数据,同样通过按照设置的输入数据位宽和权值位宽读取输入数据和权值数据,此时的权值数据是配置的对应该卷积层的权值数据,得到特征图后,将特征图存入前端存储器,以备下一个卷积层读取,直到完成当前卷积层的卷积计算之后,没有下一个卷积层,此时输出当前得到的特征图为卷积神经网络的获得的待检索人脸信息。
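上述"以特征图作为下一层输入、迭代执行直到不存在下一个卷积层"的过程,可用一维卷积做一个极简示意(卷积核与输入数据均为假设,仅说明迭代结构):

```python
def conv1d(signal, kernel):
    # 一维 valid 卷积(此处按相关运算实现,仅作示意)
    n = len(signal) - len(kernel) + 1
    return [sum(signal[i + j] * kernel[j] for j in range(len(kernel)))
            for i in range(n)]

layers = [[1, 1], [1, -1]]   # 假设的两个卷积层各自的卷积核
feature = [1, 2, 3, 4]       # 代表读入的待检索图像数据
for kernel in layers:        # 迭代:上一层输出的特征图作为下一层输入
    feature = conv1d(feature, kernel)
face_info = feature          # 不存在下一个卷积层时,输出即待检索人脸信息
```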
在本公开人脸图像检索方法上述各实施例的一个示例中,在迭代读取配置信息并执行卷积计算的过程中,在得到特征图之后,将特征图写入前端存储器。
将对应每层卷积层的特征图存入前端存储器,使下一个卷积层在读取待检索图像时,可以直接从前端存储器中获取,方便数据的获取和数据位宽的设置。
在本公开人脸图像检索方法上述各实施例的一个示例中,卷积计算配置信息还包括偏移地址;
为卷积神经网络配置卷积计算配置信息,还包括:
根据输入数据的存储地址和偏移地址配置为下一个卷积层对应的输入数据的存储地址,输入数据为当前卷积层接收的待检索图像数据;
将特征图写入前端存储器,包括:将特征图写入前端存储器中的下一个卷积层对应的输入数据的存储地址。
通过输入数据的存储地址叠加偏移地址即可得到下一个卷积层对应的输入数据的存储地址,由于卷积神经网络中,上一个卷积层的输出数据就是下一个卷积层的输入数据,因此,将上一个卷积层的输出数据处理为下一个卷积层的输入数据并存储到确定的下一个卷积层对应的输入数据的存储地址中,在下一个卷积层开始卷积计算时,只需到对应的存储地址进行读取即可。
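上述"输入数据的存储地址叠加偏移地址,得到下一个卷积层对应的输入数据存储地址"的计算,可示意如下(基地址与各层偏移量均为假设值):

```python
# 假设的地址规划:每层配置一个偏移地址,当前层输入地址加偏移,
# 即为该层特征图(下一个卷积层的输入数据)在前端存储器中的写回地址
base_addr = 0x1000
offsets = [0x400, 0x200]     # 假设的各层偏移地址

addr = base_addr
write_addrs = []
for off in offsets:
    addr += off              # 下一个卷积层对应的输入数据存储地址
    write_addrs.append(addr)
```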
图2为本公开人脸图像检索方法又一个实施例的流程图。如图2所示,该实施例方法包括:
步骤201,采集视频流,基于每个出现在视频流中的人脸图像在采集的视频流中筛选得到至少一个图像。
其中,图像中包括可识别的人脸图像,每个人脸图像对应至少一个图像;采集视频流可以是通过设置的监控设备(如:摄像头等)进行实时采集,由于前端存储空间有限,在前端的数据库中保存视频流是不可行的,因此,本步骤将视频流分解为一帧一帧的图像,而针对基于视频流获得的图像存在大量重复图像、无意义图像(无人脸图像)和模糊的现象,对视频流获得的所有图像进行筛选,得到其中包括可识别的人脸图像,并且,针对每个人脸图像采集至少一个图像,以保证不会漏掉在视频中出现的人脸,在后续识别中可以更准确的找到。
步骤202,对至少一个图像进行质量筛选,得到至少一个第一图像。
其中,至少一个第一图像为人脸图像质量达到设定阈值的图像;为了使上一步获得的图像中的人脸便于识别,采用了专用的图像信号处理(ISP,Image Signal Processing)处理芯片对原始图像进行了优化处理,这里的优化处理可以包括自动曝光、自动白平衡、3D降噪等操作,同时,还可以根据用户的需求选择局部曝光、提取感兴趣区域等操作,执行优化处理的目的是为了得到高清晰、低噪声、宽动态范围、低畸变和低失真的第一图像,以便识别图像中的人脸,这里的设定阈值可以根据具体情况进行调整。
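优化处理中的3D降噪,可以用最简单的时域递归平均做一个示意(帧数据与混合系数均为假设,实际ISP实现远比此复杂):

```python
def temporal_denoise(frames, alpha=0.5):
    # 简化的 3D(时域)降噪:当前帧与历史平滑结果按系数混合,抑制随机噪声
    out, avg = [], None
    for frame in frames:
        if avg is None:
            avg = list(frame)
        else:
            avg = [alpha * c + (1 - alpha) * p for c, p in zip(frame, avg)]
        out.append(avg)
    return out

denoised = temporal_denoise([[4, 4], [0, 0]])
```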
步骤203,将对应至少一个第一图像的至少一个预设人脸图像信息存入数据库。
基于通过质量筛选得到的至少一个第一图像获得至少一个人脸图像信息,为了在前端实现实时人脸检索,需要在前端建立数据库并将质量优化后的至少一个第一图像和对应至少一个第一图像的至少一个人脸图像信息存入该数据库,人脸图像信息是为了便于检索,在检索过程中无需再进行人脸识别;并且,由于前端存储空间有限,数据库中的图像及人脸图像信息定期更新或实时更新,以保证数据库中有充足的空间存储新采集的信息。
步骤104,通过卷积神经网络获得待检索图像对应的待检索人脸信息。
其中,卷积神经网络经过处理器配置对应的卷积计算配置信息,卷积神经网络包括至少一个卷积层,卷积计算配置信息包括卷积神经网络中的各卷积层对应的数据位宽值;待检索图像中包括至少一个人脸区域。
步骤105,基于待检索人脸信息从数据库中查找匹配的预设人脸图像信息。
其中,数据库中保存有至少一个预设人脸图像信息。
步骤106,输出待检索人脸信息匹配的预设人脸图像信息。
在本实施例中,除了将接收到的待检索图像与数据库中存储的第一图像进行匹配, 还可以实时检索,先接收待检索图像并获得待检索人脸信息,在数据库中不存在对应的人脸图像信息时,利用前端视频流采集装置,对新采集的视频流进行处理得到清晰的可识别的预设人脸图像信息,即按照下列步骤顺序执行:
步骤104、步骤201、步骤202、步骤203、步骤105和步骤106;按照此顺序执行时,可以仅将检索到的图像及预设人脸图像信息存入数据库,也可以将全部采集筛选得到的图像和预设人脸图像信息存入数据库;并且,为了后期结合服务器和云端的检索,需要将数据库中存储的图像和信息上传到服务器中,服务器接收多个前端提供的图像和信息,可以在检索时得到更多信息。
在本公开人脸图像检索方法上述各实施例的一个示例中,至少一个第一图像还包括背景图像,背景图像用于标识至少一个第一图像中的人脸图像出现的地点。
基于至少一个第一图像获得至少一个人脸图像信息的同时,还可以基于至少一个第一图像获得除了至少一个人脸图像信息的其他信息,这些其他信息构成背景信息,这里的背景信息就能提供该第一图像中的人脸图像出现的地点,进而获得这个人的行动轨迹等信息;具有背景图像的第一图像对识别人脸图像对应人员出现的场合有辅助作用。
在本公开人脸图像检索方法上述各实施例的一个示例中,在将对应至少一个第一图像的至少一个预设人脸图像信息存入数据库之前,还包括:
对至少一个第一图像通过卷积神经网络处理获得对应的至少一个预设人脸图像信息。
获取至少一个预设人脸图像信息与基于待检索图像获取待检索人脸信息,是通过相同的卷积神经网络计算获得的;本实施例中的卷积神经网络通过FPGA逻辑部分实现。
基于至少一个预设人脸图像信息得到对应的至少一种属性信息,基于至少一种属性信息将至少一个预设人脸图像信息分类存储;每个预设人脸图像信息包括至少一种属性信息。
本实施例中涉及的属性信息可以包括:性别、年龄、表情、种族、是否戴眼镜和是否戴口罩等等,基于这些属性进行分类的情况包括:基于性别:男/女;年龄:少年/青年/中年/老年;表情:高兴/悲伤/愤怒/平静等;种族:黄种人/黑种人/白种人/棕种人;戴眼镜:是/否;戴口罩:是/否;如果结合上述所有属性对图像进行分类,可以基于每个属性获得一个标签,此时每个图像对应多个标签,如一个图像中包括一个中年黄种人女性,戴眼镜、未戴口罩,且表情平静,此时该人脸图像对应的属性标签包括:女性、中年、平静、黄种人、戴眼镜和未戴口罩;在分类过程中,可以将具有相同标签的第一图像和人脸图像信息存储在一个位置。
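上述按性别、年龄、表情等属性标签,把所有属性都相同的预设人脸图像信息存入同一位置的做法,可示意如下(属性字段与取值仅为示例):

```python
def attribute_key(info):
    # 由各属性标签构成数据条目的键:所有属性都相同的信息落入同一条目
    return (info["gender"], info["age"], info["expression"],
            info["glasses"], info["mask"])

database = {}
faces = [
    {"id": 1, "gender": "女", "age": "中年", "expression": "平静",
     "glasses": True, "mask": False},
    {"id": 2, "gender": "女", "age": "中年", "expression": "平静",
     "glasses": True, "mask": False},
]
for face in faces:
    database.setdefault(attribute_key(face), []).append(face["id"])
```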
在本公开人脸图像检索方法上述各实施例的一个示例中,基于至少一个预设人脸图像信息得到对应的至少一种属性信息,基于至少一种属性信息将至少一个预设人脸图像信息分类存储,包括:
接收输入的图像质量超出设定阈值的至少一个第一图像和对应至少一个第一图像的至少一个相关信息;
基于至少一个相关信息为至少一个第一图像建立对应的至少一种属性信息,基于至少一种属性信息将对应至少一个第一图像的至少一个预设人脸图像信息存入数据库。
本实施例提供的实质是一个注册的过程,在注册过程中,输入一张第一图像和对应的、用于识别该第一图像的相关信息,通常这里的相关信息指用户的姓名信息等,并对该图像进行识别获得对应的人脸图像信息,将预设人脸图像信息和输入的第一图像存入数据库后,在后期检索图像时,如检索得到该第一图像就可直接获得该人脸图像对应的相关信息。
在本公开人脸图像检索方法上述各实施例的一个示例中,基于属性信息将对第一图像的预设人脸图像信息存入数据库,包括:
将至少一个预设人脸图像信息中具有相同属性信息的所有预设人脸图像信息存入一个数据条目中,在数据库中基于属性信息为数据条目建立索引。
该实施例方法还包括:
获取采集每个图像的采集时间值,将数据条目按照采集时间值,顺序存入数据库。
在数据库中将所有预设人脸图像信息进行区分存储,将所有属性信息都相同的预设人脸图像信息存入一个数据条目中,以便于检索获得相应的预设人脸图像信息;在筛选图像时,获取在视频流中获得该图像的采集时间值,获得时间值后,对于数据条目的存储可以按照其中更新保存的预设人脸图像信息对应的采集时间值进行排序,排序后便于在检索过程中先获得最新的符合条件的图像;根据采集时间值给出某人出现在当前场合的次数以及时间,避免将多次出现在同一场景的同一个人的图像混淆,在帮助警察搜寻犯罪分子的犯罪证据时,具有辅助作用。
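上述按采集时间值对数据条目排序、便于在检索过程中先获得最新符合条件图像的做法,可示意如下(时间戳为假设值):

```python
entries = [
    {"attrs": ("女", "平静"), "captured_at": 1700000200},
    {"attrs": ("男", "高兴"), "captured_at": 1700000100},
]
# 按采集时间值降序存放,检索时先取得最新的符合条件的条目
entries.sort(key=lambda e: e["captured_at"], reverse=True)
latest = entries[0]
```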
在本公开人脸图像检索方法上述各实施例的一个示例中,将对应第一图像的预设人脸图像信息存入数据库,包括:
基于第一图像对应的属性信息在数据库中查找是否存在对应的数据条目;
当数据库中存在与属性信息对应的数据条目时,将预设人脸图像信息存入对应的数据条目中;
当数据库中不存在属性信息对应的数据条目,为属性信息新建一个数据条目,将预设人脸图像信息存入该新建的数据条目中。
本实施例提供了针对一个第一图像的存储的示例,在存储前根据获得的属性信息在数据库中查找数据条目,如果存在已有数据条目,存入对应的数据条目;如果不存在已有数据条目,建立新的数据条目并存入,以保证每个数据条目中保存的预设人脸图像信息的属性信息都相同。
本公开人脸图像检索方法的还一个实施例中,在上述各实施例的基础上,基于每个出现在视频流中的人脸图像在采集的视频流中筛选得到至少一个图像,包括:
将采集的视频流分解为至少一张分解图像,对至少一张分解图像进行优化,得到优化图像显示效果的中间图像;
对所有中间图像基于卷积神经网络进行人脸识别,基于人脸识别的结果筛选获得具有人脸图像的至少一个图像。
对于采集的视频流的分解可以通过很多方式,本实施例不做限制,对得到的分解图像优化其显示效果,得到显示效果较好的中间图像,再基于卷积神经网络对中间图像进行人脸识别,获得具有人脸图像的图像,通过筛选将其他没有人脸图像的无用图像删除,为后期人脸识别提供可靠的图像基础。
在本公开人脸图像检索方法上述各实施例的一个示例中,还包括:基于卷积神经网络进行人脸识别得到预设人脸识别信息,通过预设人脸识别信息对至少一个图像中的人脸图像质量进行评价。
本实施例中通过人脸质量评估算法获得人脸识别信息,人脸识别信息可以包括人脸偏航角度、俯仰角度和/或人脸大小,即结合人脸偏航角度、俯仰及人脸大小来综合评价人脸图像质量,对抓拍出来的人脸进行打分,将得到的分数与预先设定的分数值进行比较,将分数低于预先设定的分数值的第一图像删除,仅保留分数高于预先设定的分数值的第一图像,通过质量筛选保证了数据库中保存的图像都具有识别度较高的人脸图像,降低了无用信息的占用率,提高了传输效率;还可以对图像基于人脸图像质量进行排序,进一步避免同一个人的多个清晰图像上传多次,同时避免了漏传相对不清晰的人脸图像。
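上述结合人脸偏航角度、俯仰角度与人脸大小综合打分、再与预先设定的分数值比较的思路,可示意如下(打分公式、权重与阈值均为示例性假设,并非本公开采用的评估算法):

```python
def face_quality(yaw_deg, pitch_deg, face_px, max_angle=90.0, ref_px=200.0):
    # 角度越接近正脸、人脸越大,得分越高;权重 0.6/0.4 为假设值
    angle_score = max(0.0, 1 - (abs(yaw_deg) + abs(pitch_deg)) / (2 * max_angle))
    size_score = min(1.0, face_px / ref_px)
    return 100 * (0.6 * angle_score + 0.4 * size_score)

THRESHOLD = 60
keep = face_quality(10, 5, 180) >= THRESHOLD   # 分数达到阈值才保留该图像
```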
在本公开人脸图像检索方法上述各实施例的一个示例中,对至少一个图像进行质量筛选,得到至少一个第一图像,包括:
基于至少一个图像对应的预设人脸识别信息对至少一个图像中的人脸图像质量进行筛选,保存人脸图像质量达到预设阈值的图像作为第一图像。
基于人脸图像质量对至少一个图像进行筛选,仅保留其中人脸图像质量达到预设阈值的图像,丢弃其他人脸图像质量未达到预设阈值的图像,保证得到的所有第一图像中的人脸图像在检索过程中可以快速实现人脸匹配。
在本公开人脸图像检索方法上述各实施例的一个示例中,基于待检索人脸信息从数据库中查找匹配的预设人脸图像信息,包括:
基于待检索人脸信息获得对应待检索图像的属性信息,基于属性信息在数据库中查找是否具有符合的数据条目;
如果存在与属性信息符合的数据条目,在符合的数据条目中获得所述匹配的预设人脸图像信息;
如果不存在与属性信息符合的数据条目,反馈无匹配结果信息。
由于上述实施例中设置了数据条目,在本实施例中通过首先检索属性信息符合的数据条目,进一步在符合的数据条目中查找匹配的预设人脸图像信息,通过这种检索方式,可有效提高检索效率,避免直接匹配预设人脸图像信息产生的大量无意义工作。
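上述"先按属性信息命中数据条目、再在条目内比对,否则反馈无匹配结果"的检索流程可示意如下(数据结构与匹配函数均为示例性假设):

```python
def retrieve(database, query_attrs, query_feature, is_match):
    # 第一步:按属性信息查找数据条目;无符合条目则直接反馈无匹配结果
    entry = database.get(query_attrs)
    if entry is None:
        return None
    # 第二步:仅在命中的条目内逐条比对预设人脸图像信息
    for preset in entry:
        if is_match(preset, query_feature):
            return preset
    return None

db = {("女", "平静"): [{"feature": [1, 2]}, {"feature": [3, 4]}]}
hit = retrieve(db, ("女", "平静"), [3, 4],
               lambda p, q: p["feature"] == q)
miss = retrieve(db, ("男", "高兴"), [3, 4],
                lambda p, q: p["feature"] == q)
```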
在本公开人脸图像检索方法上述各实施例的一个示例中,对至少一个图像进行质量筛选,得到至少一个第一图像,包括:
对至少一个图像经过自动曝光、自动白平衡和3D降噪处理,得到显示效果经过优化的至少一个第一图像。
本实施例中,将采集的图像输入图像信号处理(ISP,Image Signal Processing)模块中,ISP模块自动实现自动曝光、自动白平衡和3D降噪处理,还可以根据用户的需求选择增加局部曝光和/或提取感兴趣区域等操作,执行优化处理的目的是为了得到高清晰、低噪声、宽动态范围、低畸变和低失真的第一图像,以便识别图像中的人脸。
本实施例中提出了可以通过直流供电或有源以太网(POE,Power Over Ethernet)802.3af以太网供电,其中直流供电优先于以太网供电。
本领域普通技术人员可以理解:实现上述方法实施例的全部或部分步骤可以通过程序指令相关的硬件来完成,前述的程序可以存储于一计算机可读取存储介质中,该程序在执行时,执行包括上述方法实施例的步骤;而前述的存储介质包括:只读存储器(ROM,Read-Only Memory)、随机存储器(RAM,Random Access Memory)、磁碟或者光盘等各种可以存储程序代码的介质。
示例性的,人脸图像检索方法的流程为:
1、拍摄装置采集视频流,从视频流中筛选出其上显示有人脸图像的至少一个图像;2、拍摄装置从至少一个图像中确定出人脸图像质量达到设定阈值的至少一个第一图像;3、拍摄装置对至少一个第一图像通过卷积神经网络处理获得对应的至少一个预设人脸图像信息;4、拍摄装置获取至少一个第一图像的至少一个相关信息;5、拍摄装置基于至少一个相关信息为至少一个第一图像建立对应的至少一种属性信息;6、拍摄装置将相同属性信息的预设人脸图像信息存储至数据库中的一个数据条目中,以基于至少一种属性信息将至少一个预设人脸图像信息存储至数据库中;7、当拍摄装置从前端存储器中读取到待检索图像时,拍摄装置通过当前卷积层对待检索图像进行卷积计算,得到特征图;8、拍摄装置将特征图存储至前端存储器中,作为下一个卷积层的待检索图像,以供下一个卷积层对特征图进行卷积计算;9、如此迭代执行,直至不存在下一个卷积层时,拍摄装置将最终输出特征图确定为待检索人脸信息;10、拍摄装置将待检索人脸信息和数据库中的至少一个预设人脸图像进行匹配;11、拍摄装置输出与待检索人脸信息匹配的预设人脸图像,完成人脸图像检索的过程。
图3为本公开拍摄装置一个实施例的结构示意图。该实施例的装置可配置为实现本公开上述各方法实施例。如图3所示,该实施例的装置包括:
处理器34,配置为为卷积神经网络配置对应的卷积计算配置信息。
其中,卷积神经网络包括至少一个卷积层,卷积计算配置信息包括卷积神经网络中的各卷积层对应的数据位宽值。
卷积计算部分35,配置为通过卷积神经网络获得待检索图像对应的待检索人脸信息。
其中,待检索图像中包括至少一个人脸区域。
检索部分36,配置为基于待检索人脸信息从数据库中查找匹配的预设人脸图像信息,数据库中保存有至少一个预设人脸图像信息;输出待检索人脸信息匹配的预设人脸图像信息。
基于本公开上述实施例提供的一种人脸图像检索装置,通过卷积神经网络获得待检索图像对应的待检索人脸信息;卷积神经网络经过处理器配置对应的卷积计算配置信息;由于卷积神经网络设置了卷积计算配置信息,输入到卷积神经网络中各卷积层中的图像的位宽都与卷积层相对应,减少了基于卷积神经网络进行人脸识别的计算量,提高了卷积层的处理效率,并输入的待检索图像可以快速准确的得到待检索人脸信息,解决了定点运算计算精度低及影响计算结果准确度的问题,提高了卷积神经网络的运算精度;基于待检索人脸信息从数据库中查找匹配的预设人脸图像信息;输出待检索人脸信息匹配的预设人脸图像信息;通过在拍摄装置设置的数据库中检索匹配的预设人脸图像信息,实现了实时人脸检索的效果,提高了人脸图像检索的效率。
本公开拍摄装置的另一个实施例,在上述实施例的基础上,卷积计算部分35,包括:可配置读取控制器和图像处理部分;
可配置读取控制器,配置为按照卷积计算配置信息从前端存储器读取待检索图像,待检索图像的位宽等于所述数据位宽值;
图像处理部分,配置为通过卷积神经网络对待检索图像进行卷积计算,得到待检索人脸信息。
本实施例中,通过读取设定位宽的待检索图像使输入到卷积层中的数据的位宽符合该卷积层要求的位宽,实现了对输入卷积层的数据的动态配置,各卷积层无需对输入的数据进行处理即可执行计算,解决了定点运算计算精度低及影响计算结果准确度的问题,提高了卷积神经网络的运算精度。
在本公开拍摄装置上述各实施例的一个示例中,卷积计算配置信息还包括卷积神经网络中的各卷积层对应的卷积核大小、或所述待检索图像的存储地址;其中,待检索图像的存储地址用于按照存储地址在前端存储器中读取待检索图像。
在本公开拍摄装置上述各实施例的一个示例中,图像处理部分,包括:层计算部分和迭代部分;
层计算部分,配置为通过当前卷积层对待检索图像进行卷积计算,得到特征图;当前卷积层为卷积神经网络中各卷积层中的一个卷积层;
迭代部分,配置为响应于存在下一个卷积层,以下一个卷积层作为当前卷积层,以特征图作为待检索图像,迭代执行根据为卷积神经网络中当前卷积层配置的卷积计算配置信息,从前端存储器读取待检索图像;通过当前卷积层对待检索图像进行卷积计算,得到特征图,直到不存在下一个卷积层;
输出特征图获得待检索人脸信息。
在本公开拍摄装置上述各实施例的一个示例中,所述装置还包括:可配置写回控制器;
所述可配置写回控制器,配置为将特征图写入前端存储器。
在本公开拍摄装置上述各实施例的一个示例中,卷积计算配置信息还包括偏移地址;
所述处理器,配置为根据当前卷积层对应的输入数据的存储地址和偏移地址,配置为下一个卷积层对应的输入数据的存储地址;
所述可配置写回控制器,配置为将特征图写入前端存储器中的下一个卷积层对应的输入数据的存储地址。
图4为本公开拍摄装置另一个实施例的结构示意图。如图4所示,本实施例装置包括:采集筛选部分、质量筛选部分和存储部分;
所述采集筛选部分41,配置为采集视频流,基于每个出现在视频流中的人脸图像在采集的视频流中筛选得到至少一个图像。
其中,图像中包括人脸图像,每个人脸图像对应至少一个图像。
所述质量筛选部分42,配置为对至少一个图像进行质量筛选,得到至少一个第一图像,至少一个第一图像为人脸图像质量达到设定阈值的图像,至少一个第一图像中的每一个第一图像包括一个人脸图像。
所述存储部分43,配置为将对应至少一个第一图像的至少一个预设人脸图像信息存入数据库。
所述处理器34,配置为为卷积神经网络配置对应的卷积计算配置信息。
其中,卷积神经网络包括至少一个卷积层,卷积计算配置信息包括卷积神经网络中的各卷积层对应的数据位宽值。
所述卷积计算部分35,配置为通过卷积神经网络获得待检索图像对应的待检索人脸信息。
其中,待检索图像中包括至少一个人脸区域。
检索部分36,配置为基于待检索人脸信息从数据库中查找匹配的预设人脸图像信息,数据库中保存有至少一个预设人脸图像信息;输出待检索人脸信息匹配的预设人脸图像信息。
在本实施例中,除了将接收到的待检索图像与数据库中存储的第一图像进行匹配,还可以实时检索,先接收待检索图像并获得待检索人脸信息,在数据库中不存在对应的人脸图像信息时,利用前端视频流采集装置,对新采集的视频流进行处理得到清晰的可识别的预设人脸图像信息。
在本公开拍摄装置上述各实施例的一个示例中,基于采集的视频流筛选得到的图像还包括背景图像,背景图像用于标识图像中的人脸图像出现的地点。
在本公开拍摄装置上述各实施例的一个示例中,卷积计算部分35,配置为对至少一个第一图像通过卷积神经网络处理获得对应的至少一个预设人脸图像信息;基于至少一个预设人脸图像信息得到对应的至少一种属性信息,基于至少一种属性信息将至少一个预设人脸图像信息分类存储;每个预设人脸图像信息包括至少一种属性信息。
在本公开拍摄装置上述各实施例的一个示例中,还包括:信息接收部分和分类部分;
所述信息接收部分,配置为接收输入的图像质量超出设定阈值的第一图像和对应第一图像的相关信息;
所述分类部分,配置为基于至少一个相关信息为至少一个第一图像建立对应的至少一种属性信息,基于至少一种属性信息将对应至少一个第一图像的至少一个预设人脸图像信息存入数据库。
在本公开拍摄装置上述各实施例的一个示例中,分类部分,配置为将至少一个预设人脸图像信息中具有相同属性信息的所有预设人脸图像信息存入一个数据条目中,在数据库中基于属性信息为数据条目建立索引;
本实施例拍摄装置还包括:时间排序部分;
所述时间排序部分,配置为获取采集每个图像的采集时间值,将数据条目按照采集时间值,顺序存入数据库。
在本公开拍摄装置上述各实施例的一个示例中,存储部分43,包括:查找部分和属性存储部分;
所述查找部分,配置为基于第一图像对应的属性信息在数据库中查找是否存在对应的数据条目;
所述属性存储部分,配置为当数据库中存在与属性信息对应的数据条目时,将预设人脸图像信息存入对应的数据条目中;当数据库中不存在属性信息对应的数据条目时,为属性信息新建一个数据条目,将预设人脸图像信息存入新建的数据条目中。
本公开拍摄装置的还一个实施例中,在上述各实施例的基础上,采集筛选部分41,包括:分解部分和识别筛选部分;
所述分解部分,配置为将采集的视频流分解为至少一张分解图像,对至少一张分解图像进行优化,得到优化图像显示效果的中间图像;
所述识别筛选部分,配置为对所有中间图像基于卷积神经网络进行人脸识别,基于人脸识别的结果筛选获得具有人脸图像的至少一个图像。
对于采集的视频流的分解可以通过很多方式,本实施例不做限制,对得到的分解图像优化其显示效果,得到显示效果较好的中间图像,再基于卷积神经网络对中间图像进行人脸识别,获得具有人脸图像的图像,通过筛选将其他没有人脸图像的无用图像删除,为后期人脸识别提供可靠的图像基础。
在本公开拍摄装置上述各实施例的一个示例中,采集筛选部分41,还包括:评价子部分;
所述评价子部分,配置为基于卷积神经网络进行人脸识别得到预设人脸识别信息,通过预设人脸识别信息对至少一个图像中的人脸图像质量进行评价。
在本公开拍摄装置上述各实施例的一个示例中,质量筛选部分42,配置为基于至少一个图像对应的预设人脸识别信息对至少一个图像中的人脸图像质量进行筛选,保存人脸图像质量达到预设阈值的图像作为至少一个第一图像。
在本公开拍摄装置上述各实施例的一个示例中,检索部分36,包括:属性查找子部分和数据匹配子部分;
所述属性查找子部分,配置为基于待检索人脸信息获得对应待检索图像的属性信息,基于属性信息在数据库中查找是否具有符合的数据条目;
所述数据匹配子部分,配置为当存在与属性信息符合的数据条目时,在符合的数据条目中获得匹配的预设人脸图像信息;当不存在与属性信息符合的数据条目时,反馈无匹配结果信息。
在本公开拍摄装置上述各实施例的一个示例中,质量筛选部分42,配置为对至少一个图像经过自动曝光、自动白平衡和3D降噪处理,得到显示效果经过优化的至少一个第一图像。
本公开拍摄装置的再一个实施例中,在上述各实施例的基础上,数据库中包括黑名单子库和白名单子库,黑名单子库中包括至少一个预设人脸图像信息,白名单子库中包括至少一个预设人脸图像信息;
本实施例拍摄装置还包括:反馈部分;
所述反馈部分,配置为当匹配的预设人脸图像信息属于黑名单子库时,反馈警示信息;当匹配的预设人脸图像信息属于白名单子库时,反馈正常信息。
在具体实施时,本实施例拍摄装置可以充当电子警察的角色,帮助警察搜寻犯罪份子。将罪犯人脸信息通过网络部署到前端抓拍机(拍摄装置)中,拍摄装置24小时监控,一旦检索匹配到黑名单数据库中的预设人脸图像信息,反馈单元将反馈警示信息,实现立刻通知警方,突破了人工监控的弊端,实时监测并能够及时通知。
图5为本公开拍摄装置上述各实施例的一个示例的结构示意图。如图5所示,本实施例装置包括:
图像采集模块(相当于本公开的采集筛选部分41),配置为采集视频流,并基于每个出现在视频流中的人脸图像在采集的视频流中筛选得到至少一个图像。
ISP处理模块(相当于本公开的质量筛选部分42),配置为对所有图像进行质量筛选,得到至少一个人脸图像质量达到设定阈值的第一图像。
存储模块(相当于本公开的数据库),配置为存储对应第一图像的预设人脸图像信息。
FPGA SoC模块包括硬件监测(相当于本公开的卷积计算部分35)和中央处理(相当于本公开的处理器34),硬件监测实现通过卷积神经网络获得待检索图像对应的待检索人脸信息;中央处理用于为卷积神经网络配置对应的卷积计算配置信息。本实施例中通过FPGA SoC模块将硬件监测和中央处理集成在一片单晶硅上,使二者通信不受带宽限制,并通过一个模块实现了配置和卷积运算,实现了实时人脸识别。
通信模块(相当于本公开的反馈部分),通过通信模块可将得到的匹配的预设人脸图像信息发送出去,同时还可以根据该预设人脸图像信息属于白名单或黑名单发出相应的信息到预设的客户端中。
本实施例还可以包括:供电系统模块,为了实现拍摄装置独立运行,提供了供电系统模块,该供电系统模块为上述所有模块供电。
示例性的,如图6所示,人脸图像检索方法的流程为:
1、人脸抓拍系统利用图像采集模块将采集到的图像数据传输至后端的ISP处理单元;2、ISP处理单元对采集到的图像数据进行自动曝光、自动白平衡、3D降噪、局部曝光、提取感兴趣区域等处理;3、人脸图像检索系统将ISP处理单元处理后的图像数据传输至FPGA SoC系统模块;4、FPGA SoC系统模块的硬件检测进行卷积神经网络的计算,完成人脸检测的工作;5、FPGA SoC系统模块的中央处理对检测出的人脸进行质量筛选和排序的工作,同时中央处理对检测之后的结果进行管理、检索等工作。6、FPGA SoC系统模块的中央处理基于检测出的人脸信息从存储模块中查找匹配的预设人脸图像信息,其中,存储模块用来保存系统启动文件和本地人脸库文件,从而在断网的情况下实现人员的离线注册、识别与保存功能。7、通讯模块将预设人脸图像信息传输至后端,同时接收后端发下来的命令,由中央处理完成响应工作。
根据本实施例的另一个方面,提供的一种电子设备,设置有本公开拍摄装置上述任一个实施例。
根据本实施例的另一个方面,提供的一种计算机存储介质,用于存储计算机可读取的指令,所述指令被执行时执行本公开人脸图像检索方法上述任一个实施例的操作。
本实施例还提供了一种电子设备,例如可以是移动终端、个人计算机(PC)、平板电脑、服务器等。下面参考图7,其示出了适于用来实现本申请实施例的一个终端设备或服务器的电子设备600的结构示意图:如图7所示,计算机系统600包括一个或多个处理器、通信部等,所述一个或多个处理器例如:一个或多个中央处理单元(CPU)601,和/或一个或多个图像处理器(GPU)613等,处理器可以根据存储在只读存储器(ROM)602中的可执行指令或者从存储部分608加载到随机访问存储器(RAM)603中的可执行指令而执行各种适当的动作和处理。通信部612可包括但不限于网卡,所述网卡可包括但不限于IB(Infiniband)网卡,处理器可与只读存储器602和/或随机访问存储器603通信以执行可执行指令,通过总线604与通信部612相连、并经通信部612与其他目标设备通信,从而完成本申请实施例提供的任一项方法对应的操作,例如,通过卷积神经网络获得待检索图像对应的待检索人脸信息;卷积神经网络经过处理器配置对应的卷积计算配置信息,卷积神经网络包括至少一个卷积层,卷积计算配置信息包括卷积神经网络中的各卷积层对应的数据位宽值;待检索图像中包括至少一个人脸区域;基于待检索人脸信息从数据库中查找匹配的预设人脸图像信息;数据库中保存有至少一个预设人脸图像信息;输出待检索人脸信息匹配的预设人脸图像信息。
此外,在RAM 603中,还可存储有装置操作所需的各种程序和数据。CPU601、ROM602以及RAM603通过总线604彼此相连。在有RAM603的情况下,ROM602为可选模块。RAM603存储可执行指令,或在运行时向ROM602中写入可执行指令,可执行指令使处理器601执行上述通信方法对应的操作。输入/输出(I/O,Input/Output)接口605也连接至总线604。通信部612可以集成设置,也可以设置为具有多个子模块(例如多个IB网卡),并在总线链接上。
以下部件连接至I/O接口605:包括键盘、鼠标等的输入部分606;包括诸如阴极射线管(CRT)、液晶显示器(LCD)等以及扬声器等的输出部分607;包括硬盘等的存储部分608;以及包括诸如LAN卡、调制解调器等的网络接口卡的通信部分609。通信部分609经由诸如因特网的网络执行通信处理。驱动器610也根据需要连接至I/O接口605。可拆卸介质611,诸如磁盘、光盘、磁光盘、半导体存储器等等,根据需要安装在驱动器610上,以便于从其上读出的计算机程序根据需要被安装入存储部分608。
需要说明的,如图7所示的架构仅为一种可选实现方式,在实践过程中,可根据实际需要对上述图7的部件数量和类型进行选择、删减、增加或替换;在不同功能部件设置上,也可采用分离设置或集成设置等实现方式,例如GPU和CPU可分离设置或者可将GPU集成在CPU上,通信部可分离设置,也可集成设置在CPU或GPU上,等等。这些可替换的实施方式均落入本公开公开的保护范围。
特别地,根据本公开的实施例,上文参考流程图描述的过程可以被实现为计算机软件程序。例如,本公开的实施例包括一种计算机程序产品,其包括有形地包含在机器可读介质上的计算机程序,计算机程序包含用于执行流程图所示的方法的程序代码,程序代码可包括对应执行本申请实施例提供的方法步骤对应的指令,例如,通过卷积神经网络获得待检索图像对应的待检索人脸信息;卷积神经网络经过处理器配置对应的卷积计算配置信息,卷积神经网络包括至少一个卷积层,卷积计算配置信息包括卷积神经网络中的各卷积层对应的数据位宽值;待检索图像中包括至少一个人脸区域;基于待检索人脸信息从数据库中查找匹配的预设人脸图像信息;数据库中保存有至少一个预设人脸图像信息;输出待检索人脸信息匹配的预设人脸图像信息。在这样的实施例中,该计算机程序可以通过通信部分609从网络上被下载和安装,和/或从可拆卸介质611被安装。在该计算机程序被中央处理单元(CPU)601执行时,执行本申请的方法中限定的上述功能。
可能以许多方式来实现本公开的方法和装置、设备。例如,可通过软件、硬件、固件或者软件、硬件、固件的任何组合来实现本公开的方法和装置、设备。用于方法的步骤的上述顺序仅是为了进行说明,本公开的方法的步骤不限于以上具体描述的顺序,除非以其它方式特别说明。此外,在一些实施例中,还可将本实施例实现为记录在记录介质中的程序,这些程序包括用于实现根据本公开的方法的机器可读指令。因而,本公开还覆盖存储用于执行根据本公开的方法的程序的记录介质。
本公开的描述是为了示例和描述起见而给出的,而并不是无遗漏的或者将本公开限于所公开的形式。很多修改和变化对于本领域的普通技术人员而言是显然的。选择和描述实施例是为了更好说明本公开的原理和实际应用,并且使本领域的普通技术人员能够理解本公开从而设计适于特定用途的带有各种修改的各种实施例。
工业实用性
通过卷积神经网络获得待检索图像对应的待检索人脸信息;卷积神经网络经过处理器配置对应的卷积计算配置信息;由于卷积神经网络设置了卷积计算配置信息,输入到卷积神经网络中各卷积层中的图像的位宽都与卷积层相对应,减少了基于卷积神经网络进行人脸识别的计算量,提高了卷积层的处理效率,并输入的待检索图像可以快速准确的得到待检索人脸信息,解决了定点运算计算精度低及影响计算结果准确度的问题,提高了卷积神经网络的运算精度;基于待检索人脸信息从数据库中查找匹配的预设人脸图像信息;输出待检索人脸信息匹配的预设人脸图像信息;通过在拍摄装置设置的数据库中检索匹配的预设人脸图像信息,实现了实时人脸检索的效果,提高人脸图像检索的效率。

Claims (37)

  1. 一种人脸图像检索方法,应用于拍摄装置,包括:
    通过卷积神经网络获得待检索图像对应的待检索人脸信息;所述卷积神经网络经过处理器配置对应的卷积计算配置信息,所述卷积神经网络包括至少一个卷积层,所述卷积计算配置信息包括所述卷积神经网络中的各卷积层对应的数据位宽值;所述待检索图像中包括至少一个人脸区域;
    基于所述待检索人脸信息从数据库中查找匹配的预设人脸图像信息;所述数据库中保存有至少一个预设人脸图像信息;
    输出所述待检索人脸信息匹配的预设人脸图像信息。
  2. 根据权利要求1所述的方法,其中,所述通过卷积神经网络获得待检索图像对应的待检索人脸信息,包括:
    按照所述卷积计算配置信息从前端存储器读取所述待检索图像,所述待检索图像的位宽等于所述数据位宽值;
    通过所述卷积神经网络对所述待检索图像进行卷积计算,得到待检索人脸信息。
  3. 根据权利要求1或2所述的方法,其中,所述卷积计算配置信息还包括:所述卷积神经网络中的各卷积层对应的卷积核大小、或所述待检索图像的存储地址;其中,所述待检索图像的存储地址用于按照所述存储地址在所述前端存储器中读取所述待检索图像。
  4. 根据权利要求3所述的方法,其中,所述通过所述卷积神经网络对所述待检索图像进行卷积计算,得到待检索人脸信息,包括:
    通过当前卷积层对所述待检索图像进行卷积计算,得到特征图;所述当前卷积层为所述卷积神经网络中各卷积层中的一个卷积层;
    响应于存在下一个卷积层,以所述下一个卷积层作为当前卷积层,以所述特征图作为待检索图像,迭代执行根据为所述卷积神经网络中当前卷积层配置的所述卷积计算配置信息,从所述前端存储器读取待检索图像;通过当前卷积层对所述待检索图像进行卷积计算,得到特征图,直到不存在下一个卷积层;
    输出所述特征图获得所述待检索人脸信息。
  5. 根据权利要求4所述的方法,其中,所述得到特征图之后,所述方法还包括:
    将所述特征图写入所述前端存储器。
  6. 根据权利要求5所述的方法,其中,所述卷积计算配置信息还包括偏移地址;所述通过当前卷积层对所述待检索图像进行卷积计算,得到特征图之后,所述方法还包括:
    根据输入数据的存储地址和所述偏移地址配置为所述下一个卷积层对应的输入数据的存储地址,所述输入数据为所述当前卷积层接收的待检索图像数据;
    相应的,所述将所述特征图写入所述前端存储器,包括:
    将所述特征图写入所述前端存储器中的所述下一个卷积层对应的输入数据的存储地址。
  7. 根据权利要求1所述的方法,其中,所述基于所述待检索人脸信息从数据库中查找匹配的预设人脸图像信息之前,所述方法还包括:
    采集视频流,基于每个出现在所述视频流中的人脸图像在采集的所述视频流中筛选得到至少一个图像,其中,所述图像中包括人脸图像,每个所述人脸图像对应至少一个图像;
    对所述至少一个图像进行质量筛选,得到至少一个第一图像,所述至少一个第一图像为人脸图像质量达到设定阈值的图像,所述至少一个第一图像中的每一个第一图像包括一个人脸图像;
    将对应所述至少一个第一图像的所述至少一个预设人脸图像信息存入所述数据库。
  8. 根据权利要求7所述的方法,其中,所述至少一个第一图像还包括背景图像,所述背景图像用于标识所述至少一个第一图像中的人脸图像出现的地点。
  9. 根据权利要求7或8所述的方法,其中,所述将对应所述至少一个第一图像的所述至少一个预设人脸图像信息存入所述数据库之前,所述方法还包括:
    对所述至少一个第一图像通过所述卷积神经网络处理获得对应的所述至少一个预设人脸图像信息;
    基于所述至少一个预设人脸图像信息得到对应的至少一种属性信息,基于所述至少一种属性信息将所述至少一个预设人脸图像信息分类存储;每个所述预设人脸图像信息包括至少一种属性信息。
  10. 根据权利要求9所述的方法,其中,所述基于所述至少一个预设人脸图像信息得到对应的至少一种属性信息,基于所述至少一种属性信息将所述至少一个预设人脸图像信息分类存储,包括:
    接收输入的图像质量超出设定阈值的至少一个第一图像和对应所述至少一个第一图像的至少一个相关信息;
    基于所述至少一个相关信息为所述至少一个第一图像建立对应的所述至少一种属性信息,基于所述至少一种属性信息将对应所述至少一个第一图像的所述至少一个预设人脸图像信息存入所述数据库。
  11. 根据权利要求10所述的方法,其中,基于所述至少一种属性信息将对应所述至少一个第一图像的所述至少一个预设人脸图像信息存入所述数据库,包括:
    将所述至少一个预设人脸图像信息中具有相同属性信息的所有预设人脸图像信息存入一个数据条目中,在所述数据库中基于属性信息为所述数据条目建立索引;
    所述方法还包括:
    获取采集每个图像的采集时间值,将所述数据条目按照所述采集时间值,顺序存入所述数据库。
  12. 根据权利要求11所述的方法,其中,将对应所述第一图像的预设人脸图像信息存入数据库,包括:
    基于所述第一图像对应的属性信息在所述数据库中查找是否存在对应的数据条目;
    当所述数据库中存在与所述属性信息对应的数据条目时,将所述预设人脸图像信息存入所述对应的数据条目中;当所述数据库中不存在所述属性信息对应的数据条目时,为所述属性信息新建一个数据条目,将所述预设人脸图像信息存入所述新建的数据条目中。
  13. 根据权利要求7所述的方法,其中,所述基于每个出现在所述视频流中的人脸图像在采集的所述视频流中筛选得到至少一个图像,包括:
    将采集的视频流分解为至少一张分解图像,对所述至少一张分解图像进行优化,得到优化图像显示效果的中间图像;
    对所有所述中间图像基于所述卷积神经网络进行人脸识别,基于所述人脸识别的结果筛选获得具有人脸图像的所述至少一个图像。
  14. 根据权利要求13所述的方法,其中,所述对所有所述中间图像基于所述卷积神经网络进行人脸识别,包括:
    基于所述卷积神经网络进行人脸识别得到预设人脸识别信息,通过预设人脸识别信息对所述至少一个图像中的人脸图像质量进行评价。
  15. 根据权利要求14所述的方法,其中,所述对所述至少一个图像进行质量筛选, 得到至少一个第一图像,包括:
    基于所述至少一个图像对应的预设人脸识别信息对所述至少一个图像中的人脸图像质量进行筛选,保存所述人脸图像质量达到预设阈值的图像作为所述至少一个第一图像。
  16. 根据权利要求11所述的方法,其中,所述基于所述待检索人脸信息从数据库中查找匹配的预设人脸图像信息,包括:
    基于所述待检索人脸信息获得对应所述待检索图像的属性信息,基于所述属性信息在所述数据库中查找是否具有符合的数据条目;
    当存在与所述属性信息符合的数据条目时,在所述符合的数据条目中获得所述匹配的预设人脸图像信息;
    当不存在与所述属性信息符合的数据条目时,反馈无匹配结果信息。
  17. 根据权利要求7所述的方法,其中,所述对所述至少一个图像进行质量筛选,得到至少一个第一图像,包括:
    对所述至少一个图像经过自动曝光、自动白平衡和3D降噪处理,得到显示效果经过优化的所述至少一个第一图像。
  18. 一种拍摄装置,包括:
    处理器,配置为为卷积神经网络配置对应的卷积计算配置信息;所述卷积神经网络包括至少一个卷积层,所述卷积计算配置信息包括所述卷积神经网络中的各卷积层对应的数据位宽值;
    卷积计算部分,配置为通过所述卷积神经网络获得待检索图像对应的待检索人脸信息;所述待检索图像中包括至少一个人脸区域;
    检索部分,配置为基于所述待检索人脸信息从数据库中查找匹配的预设人脸图像信息;所述数据库中保存有至少一个预设人脸图像信息;输出所述待检索人脸信息匹配的预设人脸图像信息。
  19. 根据权利要求18所述的装置,其中,卷积计算部分包括:可配置读取控制器和图像处理部分;
    所述可配置读取控制器,配置为按照所述卷积计算配置信息从前端存储器读取所述待检索图像,所述待检索图像的位宽等于所述数据位宽值;
    所述图像处理部分,配置为通过所述卷积神经网络对所述待检索图像进行卷积计算,得到待检索人脸信息。
  20. 根据权利要求18或19所述的装置,其中,所述卷积计算配置信息还包括所述卷积神经网络中的各卷积层对应的卷积核大小、或所述待检索图像的存储地址;其中,所述待检索图像的存储地址用于按照所述存储地址在所述前端存储器中读取所述待检索图像。
  21. 根据权利要求20所述的装置,其中,图像处理部分包括:层计算部分和迭代部分;
    所述层计算部分,配置为通过当前卷积层对所述待检索图像进行卷积计算,得到特征图;所述当前卷积层为所述卷积神经网络中各卷积层中的一个卷积层;
    所述迭代部分,配置为响应于存在下一个卷积层,以所述下一个卷积层作为当前卷积层,以所述特征图作为待检索图像,迭代执行根据为所述卷积神经网络中当前卷积层配置的所述卷积计算配置信息,从所述前端存储器读取待检索图像;通过当前卷积层对所述待检索图像进行卷积计算,得到特征图,直到不存在下一个卷积层;
    输出所述特征图获得所述待检索人脸信息。
  22. 根据权利要求21所述的装置,其中,所述装置还包括:可配置写回控制器;
    所述可配置写回控制器,配置为将所述特征图写入所述前端存储器。
  23. 根据权利要求22所述的装置,其中,所述卷积计算配置信息还包括偏移地址;
    所述处理器,配置为根据所述当前卷积层对应的输入数据的存储地址和所述偏移地址配置为所述下一个卷积层对应的输入数据的存储地址;
    所述可配置写回控制器,配置为将所述特征图写入所述前端存储器中的所述下一个卷积层对应的输入数据的存储地址。
  24. 根据权利要求23所述的装置,其中,所述装置还包括:采集筛选部分、质量筛选部分和存储部分;
    所述采集筛选部分,配置为采集视频流,基于每个出现在所述视频流中的人脸图像在采集的所述视频流中筛选得到至少一个图像,其中,所述图像中包括人脸图像,每个所述人脸图像对应至少一个图像;
    所述质量筛选部分,配置为对所述至少一个图像进行质量筛选,得到至少一个第一图像,所述至少一个第一图像为人脸图像质量达到设定阈值的所述图像,所述至少一个第一图像中的每一个第一图像包括一个人脸图像;
    所述存储部分,配置为将对应所述至少一个第一图像的所述至少一个预设人脸图像信息存入所述数据库。
  25. 根据权利要求24所述的装置,其中,所述至少一个第一图像还包括背景图像,所述背景图像配置为标识所述至少一个第一图像中的人脸图像出现的地点。
  26. 根据权利要求24或25所述的装置,其中,
    所述卷积计算部分,配置为对所述至少一个第一图像通过所述卷积神经网络处理获得对应的所述至少一个预设人脸图像信息;基于所述至少一个预设人脸图像信息得到对应的至少一种属性信息,基于所述至少一种属性信息将所述至少一个预设人脸图像信息分类存储;每个所述预设人脸图像信息包括至少一种属性信息。
  27. 根据权利要求26所述的装置,其中,所述装置还包括:信息接收部分和分类部分;
    所述信息接收部分,配置为接收输入的图像质量超出设定阈值的至少一个第一图像和对应所述至少一个第一图像的至少一个相关信息;
    所述分类部分,配置为基于所述至少一个相关信息为所述至少一个第一图像建立对应的所述至少一种属性信息,基于所述至少一种属性信息将对应所述至少一个第一图像的所述至少一个预设人脸图像信息存入所述数据库。
  28. 根据权利要求27所述的装置,其中,所述分类部分,配置为将所述至少一个预设人脸图像信息中具有相同属性信息的所有预设人脸图像信息存入一个数据条目中,在所述数据库中基于属性信息为所述数据条目建立索引;
    所述装置还包括:时间排序部分;
    所述时间排序部分,配置为获取采集每个图像的采集时间值,将所述数据条目按照所述采集时间值,顺序存入所述数据库。
  29. 根据权利要求28所述的装置,其中,所述存储部分包括:查找部分和属性存储部分;
    所述查找部分,配置为基于所述第一图像对应的属性信息在所述数据库中查找是否存在对应的数据条目;
    所述属性存储部分,配置为当所述数据库中存在与所述属性信息对应的数据条目时,将所述预设人脸图像信息存入所述对应的数据条目中;当所述数据库中不存在所述属性信息对应的数据条目时,为所述属性信息新建一个数据条目,将所述预设人脸图像信息存入所述新建的数据条目中。
  30. 根据权利要求24所述的装置,其中,所述采集筛选部分,包括:分解部分和识别筛选部分;
    所述分解部分,配置为将采集的视频流分解为至少一张分解图像,对所述至少一张分解图像进行优化,得到优化图像显示效果的中间图像;
    所述识别筛选部分,配置为对所有所述中间图像基于所述卷积神经网络进行人脸识别,基于所述人脸识别的结果筛选获得具有人脸图像的所述至少一个图像。
  31. 根据权利要求30所述的装置,其中,所述采集筛选部分还包括:评价子部分;
    所述评价子部分,配置为基于所述卷积神经网络进行人脸识别得到预设人脸识别信息,通过预设人脸识别信息对所述至少一个图像中的人脸图像质量进行评价。
  32. 根据权利要求31所述的装置,其中,所述质量筛选部分,配置为基于所述至少一个图像对应的预设人脸识别信息对所述至少一个图像中的人脸图像质量进行筛选,保存所述人脸图像质量达到预设阈值的图像作为所述至少一个第一图像。
  33. 根据权利要求28所述的装置,其中,所述检索部分,包括:属性查找子部分和数据匹配子部分;
    所述属性查找子部分,配置为基于所述待检索人脸信息获得对应所述待检索图像的属性信息,基于所述属性信息在所述数据库中查找是否具有符合的数据条目;
    所述数据匹配子部分,配置为当存在与所述属性信息符合的数据条目时,在所述符合的数据条目中获得所述匹配的预设人脸图像信息;当不存在与所述属性信息符合的数据条目时,反馈无匹配结果信息。
  34. 根据权利要求24所述的装置,其中,所述质量筛选部分,配置为对所述至少一个图像经过自动曝光、自动白平衡和3D降噪处理,得到显示效果经过优化的所述至少一个第一图像。
  35. 根据权利要求18、24、27、28、29或33任一所述的装置,其中,所述数据库中包括黑名单子库和白名单子库,所述黑名单子库中包括至少一个预设人脸图像信息,所述白名单子库中包括至少一个预设人脸图像信息;
    所述装置还包括:反馈部分;
    所述反馈部分,配置为当所述匹配的预设人脸图像信息属于所述黑名单子库时,反馈警示信息;当所述匹配的预设人脸图像信息属于所述白名单子库时,反馈正常信息。
  36. 一种人脸图像检索系统,设置有如权利要求18至35任意一项所述的拍摄装置。
  37. 一种计算机存储介质,用于存储计算机可读取的指令,所述指令被执行时执行权利要求1至17任意一项所述人脸图像检索方法的操作。
PCT/CN2018/102267 2017-08-31 2018-08-24 人脸图像检索方法和系统、拍摄装置、计算机存储介质 WO2019042230A1 (zh)

Priority Applications (3)

Application Number Priority Date Filing Date Title
JP2019571526A JP7038744B2 (ja) 2017-08-31 2018-08-24 顔画像検索方法およびシステム、撮影装置、ならびにコンピュータ記憶媒体
SG11202000075QA SG11202000075QA (en) 2017-08-31 2018-08-24 Face image retrieval methods and systems, photographing apparatuses, and computer storage media
US16/732,225 US11182594B2 (en) 2017-08-31 2019-12-31 Face image retrieval methods and systems, photographing apparatuses, and computer storage media

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710774389.9A CN108228696B (zh) 2017-08-31 2017-08-31 人脸图像检索方法和系统、拍摄装置、计算机存储介质
CN201710774389.9 2017-08-31

Related Child Applications (1)

Application Number Title Priority Date Filing Date
US16/732,225 Continuation US11182594B2 (en) 2017-08-31 2019-12-31 Face image retrieval methods and systems, photographing apparatuses, and computer storage media

Publications (1)

Publication Number Publication Date
WO2019042230A1 true WO2019042230A1 (zh) 2019-03-07

Family

ID=62655298

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/102267 WO2019042230A1 (zh) 2017-08-31 2018-08-24 人脸图像检索方法和系统、拍摄装置、计算机存储介质

Country Status (5)

Country Link
US (1) US11182594B2 (zh)
JP (1) JP7038744B2 (zh)
CN (1) CN108228696B (zh)
SG (1) SG11202000075QA (zh)
WO (1) WO2019042230A1 (zh)

Families Citing this family (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108228696B (zh) * 2017-08-31 2021-03-23 深圳市商汤科技有限公司 人脸图像检索方法和系统、拍摄装置、计算机存储介质
US11501522B2 (en) * 2017-12-06 2022-11-15 Nec Corporation Image recognition model generating device, image recognition model generating method, and image recognition model generating program storing medium
CN109002789B (zh) * 2018-07-10 2021-06-18 银河水滴科技(北京)有限公司 一种应用于摄像头的人脸识别方法
CN110874817B (zh) * 2018-08-29 2022-02-01 上海商汤智能科技有限公司 图像拼接方法和装置、车载图像处理装置、设备、介质
CN110874632B (zh) * 2018-08-31 2024-05-03 嘉楠明芯(北京)科技有限公司 图像识别处理方法和装置
CN109614910B (zh) * 2018-12-04 2020-11-20 青岛小鸟看看科技有限公司 一种人脸识别方法和装置
KR20200081044A (ko) * 2018-12-27 2020-07-07 삼성전자주식회사 뉴럴 네트워크의 컨볼루션 연산을 처리하는 방법 및 장치
CN110363106A (zh) * 2019-06-25 2019-10-22 中国船舶重工集团公司第七一九研究所 一种人脸检测与匹配系统
CN110442742A (zh) * 2019-07-31 2019-11-12 深圳市商汤科技有限公司 检索图像的方法及装置、处理器、电子设备及存储介质
CN110941730B (zh) * 2019-11-29 2020-12-08 南京甄视智能科技有限公司 基于人脸特征数据偏移的检索方法与装置
CN111881813B (zh) * 2020-07-24 2021-02-19 深圳市卡联科技股份有限公司 人脸识别终端的数据存储方法及系统
CN112241684A (zh) * 2020-09-16 2021-01-19 四川天翼网络服务有限公司 一种人脸检索分布式计算方法及系统
JP7348150B2 (ja) * 2020-09-17 2023-09-20 ヤフー株式会社 学習装置、学習方法、及び学習プログラム
CN112632300A (zh) * 2020-09-29 2021-04-09 深圳市商汤科技有限公司 图像检索方法及装置、电子设备及存储介质
CN112989082B (zh) * 2021-05-20 2021-07-23 南京甄视智能科技有限公司 Cpu和gpu混合的自适应人脸搜索方法及系统
CN115456860B (zh) * 2022-11-09 2023-03-24 深圳市唯特视科技有限公司 基于fpga的图像增强方法、装置、头盔、设备及介质

Citations (7)

Publication number Priority date Publication date Assignee Title
US20160275341A1 (en) * 2015-03-18 2016-09-22 Adobe Systems Incorporated Facial Expression Capture for Character Animation
CN106204948A (zh) * 2016-07-11 2016-12-07 商汤集团有限公司 储物柜管理方法及储物柜管理装置
CN106529517A (zh) * 2016-12-30 2017-03-22 北京旷视科技有限公司 图像处理方法和图像处理设备
CN106650691A (zh) * 2016-12-30 2017-05-10 北京旷视科技有限公司 图像处理方法和图像处理设备
CN106682650A (zh) * 2017-01-26 2017-05-17 北京中科神探科技有限公司 基于嵌入式深度学习技术的移动终端人脸识别方法和系统
CN106897695A (zh) * 2017-02-24 2017-06-27 上海斐讯数据通信技术有限公司 一种图像识别处理装置、系统及方法
CN108228696A (zh) * 2017-08-31 2018-06-29 深圳市商汤科技有限公司 人脸图像检索方法和系统、拍摄装置、计算机存储介质

Family Cites Families (22)

Publication number Priority date Publication date Assignee Title
US6219640B1 (en) * 1999-08-06 2001-04-17 International Business Machines Corporation Methods and apparatus for audio-visual speaker recognition and utterance verification
US7302653B2 (en) * 2005-02-24 2007-11-27 International Business Machines Corporation Probability of fault function determination using critical defect size map
US7310788B2 (en) * 2005-02-24 2007-12-18 International Business Machines Corporation Sample probability of fault function determination using critical defect size map
JP6345520B2 (ja) * 2014-07-10 2018-06-20 National Institute of Advanced Industrial Science and Technology (AIST) Image retrieval apparatus, image retrieval program, and image retrieval method
CN107004138A (zh) * 2014-12-17 2017-08-01 Nokia Technologies Oy Object detection using a neural network
CN104765768B (zh) * 2015-03-09 2018-11-02 Shenzhen Intellifusion Technologies Co., Ltd. Fast and accurate retrieval method for massive face databases
CN105760933A (zh) * 2016-02-18 2016-07-13 Tsinghua University Layer-wise variable-precision fixed-point quantization method and apparatus for convolutional neural networks
CN205792930U (zh) 2016-06-03 2016-12-07 Guangdong Wanfeng Information Technology Co., Ltd. Face capture camera and surveillance system using the same
TWI601424B (zh) * 2016-06-13 2017-10-01 MStar Semiconductor, Inc. Time de-interleaving circuit and method
KR20180060149A (ko) * 2016-11-28 2018-06-07 Samsung Electronics Co., Ltd. Convolution processing apparatus and method
US10402527B2 (en) * 2017-01-04 2019-09-03 Stmicroelectronics S.R.L. Reconfigurable interconnect
KR102642853B1 (ko) * 2017-01-05 2024-03-05 Electronics and Telecommunications Research Institute (ETRI) Convolution circuit, application processor including the same, and operating method thereof
WO2018192500A1 (zh) * 2017-04-19 2018-10-25 Shanghai Cambricon Information Technology Co., Ltd. Processing apparatus and processing method
US10474458B2 (en) * 2017-04-28 2019-11-12 Intel Corporation Instructions and logic to perform floating-point and integer operations for machine learning
US9916531B1 (en) * 2017-06-22 2018-03-13 Intel Corporation Accumulator constrained quantization of convolutional neural networks
CN111095294A (zh) * 2017-07-05 2020-05-01 Deep Vision Inc. Deep vision processor
WO2019167007A1 (en) * 2018-03-01 2019-09-06 Infotoo International Limited Methods and apparatus for determining authenticity of an information bearing device
US20190303757A1 (en) * 2018-03-29 2019-10-03 Mediatek Inc. Weight skipping deep learning accelerator
US12014273B2 (en) * 2018-12-12 2024-06-18 Kneron (Taiwan) Co., Ltd. Low precision and coarse-to-fine dynamic fixed-point quantization design in convolution neural network
US20200293865A1 (en) * 2019-03-14 2020-09-17 Gyrfalcon Technology Inc. Using identity layer in a cellular neural network architecture
US11475298B2 (en) * 2019-03-20 2022-10-18 Gyrfalcon Technology Inc. Using quantization in training an artificial intelligence model in a semiconductor solution
CN109961102B (zh) * 2019-03-30 2021-06-22 Beijing SenseTime Technology Development Co., Ltd. Image processing method and apparatus, electronic device, and storage medium


Also Published As

Publication number Publication date
JP7038744B2 (ja) 2022-03-18
CN108228696B (zh) 2021-03-23
JP2020524348A (ja) 2020-08-13
CN108228696A (zh) 2018-06-29
US20200151434A1 (en) 2020-05-14
US11182594B2 (en) 2021-11-23
SG11202000075QA (en) 2020-02-27

Similar Documents

Publication Publication Date Title
WO2019042230A1 (zh) Face image retrieval method and system, photographing apparatus, and computer storage medium
US11409983B2 (en) Methods and apparatuses for dynamically adding facial images into database, electronic devices and media
JP7181437B2 (ja) 制御されていない照明条件の画像中の肌色を識別する技術
US9665927B2 (en) Method and apparatus of multi-frame super resolution robust to local and global motion
US20200167314A1 (en) System and method for concepts caching using a deep-content-classification (dcc) system
US8938092B2 (en) Image processing system, image capture apparatus, image processing apparatus, control method therefor, and program
US20190188524A1 (en) Method and system for classifying an object-of-interest using an artificial neural network
US20190325197A1 (en) Methods and apparatuses for searching for target person, devices, and media
WO2013086492A1 (en) Faceprint generation for image recognition
JP7419080B2 (ja) Computer system and program
US20210201501A1 (en) Motion-based object detection method, object detection apparatus and electronic device
CN108229289B (zh) Target retrieval method, apparatus, and electronic device
KR20220064870A (ko) 물체 검출을 위한 관심 영역 선택
KR20130098769A (ko) 확장성을 고려한 특징 기술자 생성 및 특징 기술자를 이용한 정합 장치 및 방법
Mekhalfi et al. Fast indoor scene description for blind people with multiresolution random projections
WO2019232723A1 (en) Systems and methods for cleaning data
KR20180119013A (ko) Image retrieval method using a convolutional neural network and apparatus therefor
Sismananda et al. Performance comparison of yolo-lite and yolov3 using raspberry pi and motioneyeos
CN115115855A (zh) Training method, apparatus, device, and medium for an image encoder
Venkatesvara Rao et al. Real-time video object detection and classification using hybrid texture feature extraction
Wang et al. Design and implementation of remote facial expression recognition surveillance system based on PCA and KNN algorithms
US20190188514A1 (en) Information processing apparatus, information processing system, control method, and program
CN113874877A (zh) Neural network and classifier selection system and method
US11605220B2 (en) Systems and methods for video surveillance
CN107615746A (zh) Integrated scheme for intelligent imaging

Legal Events

Date Code Title Description
ENP Entry into the national phase

Ref document number: 2019571526

Country of ref document: JP

Kind code of ref document: A

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 14/09/2020).

122 Ep: pct application non-entry in european phase

Ref document number: 18851177

Country of ref document: EP

Kind code of ref document: A1