WO2021169641A1 - Face recognition method and system - Google Patents

Face recognition method and system

Info

Publication number
WO2021169641A1
WO2021169641A1 (PCT/CN2021/071260; CN2021071260W)
Authority
WO
WIPO (PCT)
Prior art keywords
target block
convolution
layer
target
convolution kernels
Prior art date
Application number
PCT/CN2021/071260
Other languages
English (en)
French (fr)
Inventor
朱锦祥
单以磊
臧磊
Original Assignee
深圳壹账通智能科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳壹账通智能科技有限公司 filed Critical 深圳壹账通智能科技有限公司
Publication of WO2021169641A1 publication Critical patent/WO2021169641A1/zh

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/168Feature extraction; Face representation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods

Definitions

  • the embodiments of the present application relate to the field of artificial intelligence technology, and in particular to a face recognition method, system, computer equipment, and computer-readable storage medium.
  • Face recognition technology is an important method for verifying the identity of users.
  • The inventor found that the general approach is as follows: a face image or video stream is acquired on the client; whether faces are present is detected from the image or stream, and the position and size of each face and the position information of the main facial organs are extracted; from this information, the identity features contained in each face are extracted and compared against faces in a face database to determine whether the subject is the target object.
  • Compared with traditional approaches, face recognition technology based on deep neural networks not only achieves higher recognition accuracy but can also extract facial features automatically.
  • There are many neural network architectures for extracting facial features, such as VGG-based neural network architectures, the ResNet architecture based on residual structures, the MobileNet architecture, and so on.
  • However, architectures such as VGG and ResNet consume substantial hardware resources and cannot be applied to computer devices with limited hardware resources, such as mobile phones.
  • The MobileNet architecture, while undemanding in hardware resources, has poor facial feature extraction ability, resulting in low face recognition accuracy.
  • The purpose of the embodiments of the present application is to provide a face recognition method, system, computer device, and computer-readable storage medium that can be used to solve the technical problem that hardware resource consumption and recognition accuracy cannot both be satisfied.
  • An aspect of the embodiments of the present application provides a face recognition method, the method includes:
  • An image to be recognized that includes face information is acquired;
  • The facial features in the image to be recognized are extracted by a facial feature extraction model, where the facial feature extraction model includes a target block sequence, the target block sequence includes one or more target blocks, and each target block includes a depthwise separable convolution structure and an attention structure; and
  • A face recognition operation is performed on the image to be recognized according to the facial features.
  • Another aspect of the embodiments of the present application also provides a face recognition system, which includes:
  • An image acquisition module for acquiring an image to be recognized that includes face information;
  • A feature extraction module for extracting the facial features in the image to be recognized through a facial feature extraction model, where the facial feature extraction model includes a target block sequence, the target block sequence includes one or more target blocks, and each target block includes a depthwise separable convolution structure and an attention structure; and
  • the image recognition module is configured to perform a face recognition operation on the image to be recognized according to the facial features.
  • Still another aspect of the embodiments of the present application provides a computer device including a memory, a processor, and a computer program stored in the memory and executable on the processor; when executing the computer program, the processor implements the face recognition method, the face recognition method including the following steps:
  • An image to be recognized that includes face information is acquired;
  • The facial features in the image to be recognized are extracted by a facial feature extraction model, where the facial feature extraction model includes a target block sequence, the target block sequence includes one or more target blocks, and each target block includes a depthwise separable convolution structure and an attention structure; and
  • A face recognition operation is performed on the image to be recognized according to the facial features.
  • Yet another aspect of the embodiments of the present application provides a computer-readable storage medium on which a computer program is stored; when executed by a processor, the computer program implements the face recognition method described above, the face recognition method including the following steps:
  • An image to be recognized that includes face information is acquired;
  • The facial features in the image to be recognized are extracted by a facial feature extraction model, where the facial feature extraction model includes a target block sequence, the target block sequence includes one or more target blocks, and each target block includes a depthwise separable convolution structure and an attention structure; and
  • A face recognition operation is performed on the image to be recognized according to the facial features.
  • Testing of the facial feature extraction model of the embodiments of the present application shows that the model has fewer parameters with essentially no loss of accuracy, greatly reducing the model's GPU memory usage and floating-point operations, achieving storage and computation optimization, and delivering the technical effect of low hardware resource consumption and high recognition accuracy.
  • Fig. 1 schematically shows a flowchart of a face recognition method according to Embodiment 1 of the present application
  • Figure 2 schematically shows an exemplary structure of a target block sequence
  • Figure 3 schematically shows an exemplary structure of the target block
  • Fig. 4 schematically shows a block diagram of a face recognition system according to Embodiment 2 of the present application.
  • Fig. 5 schematically shows the hardware architecture of a computer device suitable for implementing the face recognition method according to Embodiment 3 of the present application.
  • the technical solution of this application can be applied to the fields of artificial intelligence, smart city, blockchain and/or big data technology to realize face recognition.
  • The data involved in this application, such as the image to be recognized and/or facial features, can be stored in a database or in a blockchain, for example through distributed blockchain storage, which is not limited in this application.
  • Fig. 1 schematically shows a flowchart of a face recognition method according to Embodiment 1 of the present application. It can be understood that the flowchart in this method embodiment is not used to limit the order of execution of the steps. The following exemplarily describes the computer device 2 as the execution subject.
  • the face recognition method may include steps S100 to S104, where:
  • Step S100: Obtain an image to be recognized that includes face information.
  • the computer device 2 may be a smart phone, a tablet computer, a laptop computer, or the like. Take the smart phone as an example:
  • The smart phone can monitor the user's image acquisition instruction, such as one input through the touch screen. After the instruction is detected, the camera is invoked according to the preset usage scenario, for example the front camera or the rear camera. For instance, in a face unlocking scenario, if the smart phone is in the locked-screen state and detects that the user triggers an unlocking operation through a physical button or the touch screen, an image acquisition instruction is generated in response to the unlocking operation to invoke the front camera for image acquisition.
  • the unlocking operation of the smart phone can also be triggered in other ways.
  • For example, various sensors (e.g., a gravity sensor or gyroscope) detect whether the motion trajectory of the smartphone falls within a predetermined set of motion trajectories; if it does, an unlocking operation is triggered.
  • It is not difficult to understand that the camera can capture a series of consecutive pictures, from which some pictures containing face information are selected, and the face recognition operation is performed based on the selected pictures.
  • This embodiment can also be used in other application scenarios, such as face payment scenarios and other various identity verification scenarios.
  • Step S102: Extract the facial features in the image to be recognized through the facial feature extraction model.
  • The facial feature extraction model includes a target block sequence; the target block sequence includes one or more target blocks, and each target block includes a depthwise separable convolution structure and an attention structure.
  • the depthwise separable convolution structure includes depthwise convolution and pointwise convolution.
  • For the depthwise convolution, multiple convolution kernels can be set; the number of kernels equals the number of channels, with a one-to-one correspondence between kernels and channels.
  • For example, if the image to be recognized is a three-channel color image (shape H*W*3), the depthwise convolution can consist of 3 convolution kernels: one kernel convolves the R-channel image and generates a corresponding feature map, one kernel convolves the G-channel image and generates a corresponding feature map, and one kernel convolves the B-channel image and generates a corresponding feature map. It should be noted that if the input and output are set to the same size (that is, padding is 'same'), each of these feature maps is also H*W.
  • For the pointwise convolution, multiple convolution kernels of size 1*1*M can be set, and the number of these kernels can be C, where M is the number of channels of the previous layer (the depthwise convolution). For example, since the depthwise convolution above has 3 channels and outputs 3 feature maps, the pointwise convolution can consist of C 1*1*M convolution kernels. Each 1*1*M kernel performs a convolution over the previous layer's output (the 3 feature maps) to generate one new feature map; C such kernels therefore output C new feature maps.
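  • To make the depthwise/pointwise split concrete, below is a minimal PyTorch sketch of this structure (an illustration under the assumptions above, not the patent's implementation; the output channel count C=8 and the spatial size are arbitrary placeholders):

```python
import torch
import torch.nn as nn

class DepthwiseSeparableConv(nn.Module):
    """Depthwise convolution (one kernel per input channel, groups=in_channels)
    followed by a pointwise 1x1 convolution that mixes channels."""
    def __init__(self, in_channels: int, out_channels: int):
        super().__init__()
        # Depthwise: groups=in_channels gives one 3x3 kernel per channel.
        self.depthwise = nn.Conv2d(in_channels, in_channels, kernel_size=3,
                                   stride=1, padding=1, groups=in_channels,
                                   bias=False)
        # Pointwise: C kernels of size 1x1xM, producing C new feature maps.
        self.pointwise = nn.Conv2d(in_channels, out_channels, kernel_size=1,
                                   stride=1, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.pointwise(self.depthwise(x))

# A 3-channel (RGB) input keeps its H*W spatial size ("same" padding) while
# the channel count grows from M=3 to C=8 (chosen arbitrarily here).
x = torch.randn(1, 3, 112, 112)
y = DepthwiseSeparableConv(3, 8)(x)
print(y.shape)  # torch.Size([1, 8, 112, 112])
```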
  • the attention structure can be implemented using a variety of model structures, such as an encoder-decoder framework.
  • In an exemplary embodiment, the attention structure may be a squeeze-and-excitation (Squeeze-Excitation) structure, where:
  • The squeeze structure in the squeeze-and-excitation structure processes the feature map output by the depthwise separable convolution structure to obtain the global receptive field information of the feature map. Specifically, features are compressed along the spatial dimensions, turning each two-dimensional feature channel into a real number; this real number has, to some extent, a global receptive field, and the output dimension matches the number of input feature channels.
  • The excitation structure in the squeeze-and-excitation structure evaluates the weight of each channel in the feature map according to the global receptive field information provided by the squeeze structure, generating a weight for each channel, and calibrates each feature in the feature map according to those channel weights. Specifically, after the channel weights are generated, they are applied to the corresponding original feature channels through a sigmoid gate, completing the recalibration of the original features along the channel dimension.
  • For ease of understanding, let U (U ∈ R^(H*W*C)) denote the C feature maps of size H*W output by the depthwise separable convolution structure. Taking U as input, the Squeeze-Excitation structure is implemented in three steps:
  • First step: through the formula z_c = F_sq(u_c) = (1/(H*W)) Σ_{i=1}^{H} Σ_{j=1}^{W} u_c(i, j), convert the H*W*C input into a 1*1*C output to obtain the global receptive field information of each feature map, where u_c denotes the c-th feature map, i the i-th column, and j the j-th row of the feature map.
  • Second step: through the formula s = F_ex(z, W) = σ(g(z, W)) = σ(W_2 σ(W_1 z)), obtain the respective weights of the C feature maps, where the dimension of W_1 is C/r*C, the dimension of W_2 is C*C/r, both W_1 and W_2 are used in fully connected operations, and r is a scaling parameter used to reduce the number of channels.
  • Third step: through the formula x̃_c = F_scale(u_c, s_c) = s_c · u_c, weight each feature map by its channel weight, where u_c denotes the c-th feature map and s_c the weight of the c-th feature map.
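  • As a concrete reference, the three steps above can be sketched in PyTorch as follows. This is a minimal sketch that follows the formulas as written here, applying a sigmoid-style gate σ after both fully connected operations; note that the original Squeeze-and-Excitation paper uses ReLU for the inner nonlinearity. The channel count and the reduction ratio r are placeholders.

```python
import torch
import torch.nn as nn

class SqueezeExcitation(nn.Module):
    """Squeeze-and-Excitation: global-average-pool each feature map (squeeze),
    compute per-channel weights with two FC layers (excitation), then rescale."""
    def __init__(self, channels: int, r: int = 16):
        super().__init__()
        self.fc1 = nn.Linear(channels, channels // r)  # W_1: (C/r) x C
        self.fc2 = nn.Linear(channels // r, channels)  # W_2: C x (C/r)

    def forward(self, u: torch.Tensor) -> torch.Tensor:
        b, c, h, w = u.shape
        # Step 1 (squeeze): z_c = (1/(H*W)) * sum over i, j of u_c(i, j).
        z = u.mean(dim=(2, 3))                              # shape (B, C)
        # Step 2 (excitation): s = sigma(W_2 sigma(W_1 z)), as written above.
        s = torch.sigmoid(self.fc2(torch.sigmoid(self.fc1(z))))
        # Step 3 (scale): weight each feature map by its channel weight s_c.
        return u * s.view(b, c, 1, 1)

u = torch.randn(1, 64, 28, 28)
print(SqueezeExcitation(64)(u).shape)  # torch.Size([1, 64, 28, 28])
```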
  • In an exemplary embodiment, the target block sequence includes a plurality of target blocks stacked serially in sequence, and the number of target blocks is determined according to the computing resources and the required accuracy of the face recognition operation. Specifically, different facial feature extraction models can be configured according to the hardware resources and recognition accuracy of the computer equipment.
  • The facial feature extraction model may include one target block or multiple target blocks stacked in order.
  • When the target block sequence includes multiple target blocks stacked in order, it is configured so that the number of output channels of the target blocks is monotonically increasing: the number of output channels of each target block is always greater than that of the target block before it, to increase extraction accuracy.
  • Further, the target block sequence includes a first target block, a second target block, a third target block, a fourth target block, a fifth target block, and a sixth target block stacked serially in sequence, where the first target block has 16 output channels, the second 24, the third 64, the fourth 96, the fifth 160, and the sixth 320. With this configuration, the number of model parameters can be kept under 6 million while recognition accuracy remains above 93%.
  • Each target block includes, coupled in sequence: a first convolutional layer (Conv_1), a first batch-normalization layer (BN), a first activation layer (Swish), a depthwise separable convolution structure (DWConv), a second batch-normalization layer (BN), a second activation layer (Swish), a third batch-normalization layer (BN), a squeeze-and-excitation module (SE Module), a second convolutional layer (Conv_2), and a fourth batch-normalization layer (BN).
  • Testing shows that a target block of the above structure can greatly reduce the GPU memory usage and floating-point operations of the model, achieving storage and computation optimization.
  • The number of convolution kernels in the first convolutional layer is even, and equals N times the number of output channels of the preceding target block, where N is a natural number greater than 1.
  • The number of convolution kernels in the second convolutional layer is even; it is greater than or equal to 2 times and less than or equal to 4 times the number of kernels in the first convolutional layer of the same target block, and it is greater than the number of kernels in the second convolutional layer of the preceding target block.
  • Conv_1 can be a two-dimensional convolution serving as the first operation unit of the target block; its kernel size is 1*1 and its stride is 1*1, and the number of its convolution kernels has the following characteristics: 1. when the target block is the first target block of the whole deep neural network, the number of filters is a multiple of 2, with a recommended minimum of no less than 16; 2. when the target block is the second or a later target block of the deep neural network, the number of convolution kernels equals 6 times the number of output channels of the preceding target block.
  • BN is the abbreviation of Batch Normalization.
  • BN makes the input distribution of each layer similar, prevents neuron outputs from drifting into the saturation region, and thereby avoids the vanishing-gradient problem. BN seeks a zero-centered, unit-variance distribution as the input of each layer's activation function. During training, the batch mean μ is subtracted from the activation input x to obtain a zero-centered distribution; the result is then divided by the batch standard deviation, with ε added to prevent division by zero, so that the inputs of all activation functions have unit variance: x̂ = (x − μ) / √(σ² + ε). Finally, x̂ is passed through a linear transformation (a scale and a shift) to obtain the BN output, ensuring that the effect of this normalization is preserved.
  • Actual parameters: the eps parameter preventing division by zero is set to 0.001, and the momentum parameter is set to 0.01.
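  • As a quick check of these defaults, here is a minimal PyTorch sketch (the tensor shape and values are arbitrary):

```python
import torch
import torch.nn as nn

# BN with the parameters given above: eps=0.001 guards the division,
# momentum=0.01 controls the running-statistics update.
bn = nn.BatchNorm2d(num_features=8, eps=0.001, momentum=0.01)
x = torch.randn(4, 8, 16, 16) * 3 + 5   # deliberately off-center input
y = bn(x)
print(y.mean().item(), y.var().item())  # ~0 mean, ~unit variance
```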
  • Swish is the activation function f(x) = x * sigmoid(βx), where β is a scaling parameter for x; in this embodiment, β can be set to 1. Sigmoid is the non-linear activation function σ(x) = 1 / (1 + e^(−x)). Testing shows that, with all model parameters kept unchanged, using Swish improves accuracy.
  • DWConv is the abbreviation of depthwise separable convolution. In this embodiment, the kernel size of the depthwise convolution is 3*3, the stride is 1*1, the number of convolution kernels equals the number of kernels in Conv_1, and there is no bias term.
  • SE Module is a Squeeze-Excitation structure containing two convolutional layers, reduce_conv and expand_conv. Both layers have kernel size 1*1 and stride 1*1, but they differ in kernel count: reduce_conv has 1/24 the number of Conv_1's convolution kernels, while expand_conv has the same number of kernels as Conv_1.
  • Conv_2 is the last convolutional operation of the target block, and its number of convolution kernels determines the dimension of the target block's output channels. The kernel size of Conv_2 is 1*1, the stride is 1*1, and the number of kernels can be determined by a grid search algorithm.
  • The Conv_2 kernels of different target blocks have the following characteristics (see the sketch after this list):
  • 1. the number of Conv_2 kernels of a later target block is greater than that of the preceding block's Conv_2;
  • 2. the number of Conv_2 kernels is greater than or equal to 2 times the number of Conv_1 kernels;
  • 3. the number of Conv_2 kernels is less than 4 times the number of Conv_1 kernels;
  • 4. the number of Conv_2 kernels is even.
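  • Putting the pieces together, a target block with the structure described above might look like the following sketch. This is one illustrative reading of this description, not the patent's reference code: the ×6 Conv_1 expansion, the 1/24 reduce_conv ratio, and the layer order come from the text, while the input/output channel counts are example values and the sigmoid gate inside the SE module follows the formula given earlier.

```python
import torch
import torch.nn as nn

class TargetBlock(nn.Module):
    """One target block: Conv_1 -> BN -> Swish -> DWConv -> BN -> Swish -> BN
    -> SE Module -> Conv_2 -> BN, in the order given in this description."""
    def __init__(self, in_channels: int, out_channels: int, expand: int = 6):
        super().__init__()
        mid = in_channels * expand  # Conv_1 kernels = 6x previous output channels
        bn = lambda c: nn.BatchNorm2d(c, eps=0.001, momentum=0.01)
        self.conv1 = nn.Conv2d(in_channels, mid, 1, 1, bias=False)
        self.bn1, self.bn2, self.bn3 = bn(mid), bn(mid), bn(mid)
        self.swish = nn.SiLU()  # Swish with beta = 1
        # DWConv: 3x3, stride 1, one kernel per channel, no bias term.
        self.dwconv = nn.Conv2d(mid, mid, 3, 1, 1, groups=mid, bias=False)
        # SE Module: reduce_conv has 1/24 of Conv_1's kernels; expand_conv matches Conv_1.
        self.reduce_conv = nn.Conv2d(mid, max(1, mid // 24), 1, 1)
        self.expand_conv = nn.Conv2d(max(1, mid // 24), mid, 1, 1)
        self.conv2 = nn.Conv2d(mid, out_channels, 1, 1, bias=False)
        self.bn4 = bn(out_channels)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.swish(self.bn1(self.conv1(x)))
        x = self.bn3(self.swish(self.bn2(self.dwconv(x))))
        # SE gate as in the formula above (a sigma after each 1x1 layer).
        se = torch.sigmoid(self.expand_conv(torch.sigmoid(
            self.reduce_conv(x.mean((2, 3), keepdim=True)))))
        x = x * se  # channel-wise recalibration
        return self.bn4(self.conv2(x))

# Example: a block mapping 16 -> 24 output channels, as in the sequence above.
print(TargetBlock(16, 24)(torch.randn(1, 16, 56, 56)).shape)
```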
  • Step S104: Perform a face recognition operation on the image to be recognized according to the facial features.
  • Specifically, the facial features output by the facial feature extraction model are compared with reference facial features, where the reference facial features are pre-stored facial features of the target object; whether the subject is the target object is determined according to the comparison result.
  • In some schemes, a threshold is determined according to the recognition error rate acceptable to the user (for example, different thresholds can be set for different scenes), and the facial features output by the model are compared with the reference facial features to determine the degree of matching between the two, which can be expressed as cosine similarity; if the similarity is greater than or equal to the threshold, the subject is determined to be the target object, and if the similarity is less than the threshold, the subject is determined not to be the target object.
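  • For reference, this comparison step can be sketched as follows; the 512-dimensional embeddings and the 0.6 threshold are hypothetical placeholders to be chosen per scene.

```python
import torch
import torch.nn.functional as F

def is_target(feature: torch.Tensor, reference: torch.Tensor,
              threshold: float = 0.6) -> bool:
    """Compare an extracted face feature against the pre-stored reference
    feature of the target object using cosine similarity."""
    similarity = F.cosine_similarity(feature, reference, dim=0)
    # Greater than or equal to the threshold -> recognized as the target object.
    return bool(similarity >= threshold)

feature = torch.randn(512)    # output of the facial feature extraction model
reference = torch.randn(512)  # pre-stored feature of the target object
print(is_target(feature, reference))
```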
  • This embodiment is based on depthwise separable convolution and a lightweight attention structure, combined with ordinary convolution and BN operations, to design a facial feature extraction model composed of the target blocks.
  • The model has fewer parameters with essentially no loss of accuracy, greatly reducing its GPU memory usage and floating-point operations and achieving storage and computation optimization. Tests show that the parameter count of the facial feature extraction model can drop from 54 million to 5.21 million with no decrease in accuracy. The parameter count of this embodiment is therefore kept at the level of the MobileNet architecture, while its facial feature extraction ability is no weaker than that of the ResNet architecture.
  • If the parameter count is moderately increased, face recognition accuracy exceeds that of resnet50 and resnet150, effectively solving the technical problem that hardware resource consumption and facial feature extraction ability cannot both be satisfied, and achieving high face recognition accuracy with low hardware resource consumption.
  • FIG. 4 shows a block diagram of a face recognition system according to the second embodiment of the present application.
  • The face recognition system can be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to complete the embodiments of the present application.
  • The program modules referred to in the embodiments of the present application are a series of computer program instruction segments capable of completing specific functions; the following description introduces the functions of each program module of this embodiment in detail.
  • the face recognition system 400 may include the following components:
  • the image acquisition module 402 is used to acquire an image to be recognized including face information.
  • the feature extraction module 404 is configured to extract the face features in the image to be recognized through the face feature extraction model.
  • the face feature extraction model includes a target block sequence, the target block sequence includes one or more target blocks, and each target block includes a depth separable convolution structure and an attention structure.
  • the image recognition module 406 is configured to perform a face recognition operation on the image to be recognized according to the facial features.
  • In an exemplary embodiment, the attention structure includes a squeeze-and-excitation structure, where: the squeeze structure processes the feature map output by the depthwise separable convolution structure to obtain the global receptive field information of the feature map; the excitation structure evaluates the weight of each channel in the feature map according to the global receptive field information provided by the squeeze structure, generating a weight for each channel, and calibrates each feature in the feature map according to those channel weights.
  • the depthwise separable convolution structure includes depthwise convolution and pointwise convolution.
  • For the depthwise convolution, multiple convolution kernels can be set; the number of kernels equals the number of channels, with a one-to-one correspondence between kernels and channels.
  • For example, if the image to be recognized is a three-channel color image (shape H*W*3), the depthwise convolution can consist of 3 convolution kernels: one kernel convolves the R-channel image and generates a corresponding feature map, one kernel convolves the G-channel image and generates a corresponding feature map, and one kernel convolves the B-channel image and generates a corresponding feature map. It should be noted that if the input and output are set to the same size (that is, padding is 'same'), each of these feature maps is also H*W.
  • For the pointwise convolution, multiple convolution kernels of size 1*1*M can be set, and the number of these kernels can be C, where M is the number of channels of the previous layer (the depthwise convolution). For example, since the depthwise convolution above has 3 channels and outputs 3 feature maps, the pointwise convolution can consist of C 1*1*M convolution kernels. Each 1*1*M kernel performs a convolution over the previous layer's output (the 3 feature maps) to generate one new feature map; C such kernels therefore output C new feature maps.
  • The attention structure may be the squeeze-and-excitation (Squeeze-Excitation) structure, where:
  • The squeeze structure in the squeeze-and-excitation structure processes the feature map output by the depthwise separable convolution structure to obtain the global receptive field information of the feature map. Specifically, features are compressed along the spatial dimensions, turning each two-dimensional feature channel into a real number; this real number has, to some extent, a global receptive field, and the output dimension matches the number of input feature channels.
  • The excitation structure in the squeeze-and-excitation structure evaluates the weight of each channel in the feature map according to the global receptive field information provided by the squeeze structure, generating a weight for each channel, and calibrates each feature in the feature map according to those channel weights. Specifically, after the channel weights are generated, they are applied to the corresponding original feature channels through a sigmoid gate, completing the recalibration of the original features along the channel dimension.
  • For ease of understanding, let U (U ∈ R^(H*W*C)) denote the C feature maps of size H*W output by the depthwise separable convolution structure. Taking U as input, the Squeeze-Excitation structure is implemented in three steps:
  • First step: through the formula z_c = F_sq(u_c) = (1/(H*W)) Σ_{i=1}^{H} Σ_{j=1}^{W} u_c(i, j), convert the H*W*C input into a 1*1*C output to obtain the global receptive field information of each feature map, where u_c denotes the c-th feature map, i the i-th column, and j the j-th row of the feature map.
  • Second step: through the formula s = F_ex(z, W) = σ(g(z, W)) = σ(W_2 σ(W_1 z)), obtain the respective weights of the C feature maps, where the dimension of W_1 is C/r*C, the dimension of W_2 is C*C/r, both W_1 and W_2 are used in fully connected operations, and r is a scaling parameter used to reduce the number of channels.
  • Third step: through the formula x̃_c = F_scale(u_c, s_c) = s_c · u_c, weight each feature map by its channel weight, where u_c denotes the c-th feature map and s_c the weight of the c-th feature map.
  • In an exemplary embodiment, the target block sequence includes a plurality of target blocks stacked serially in sequence, and the number of target blocks is determined according to the computing resources and the required accuracy of the face recognition operation. Specifically, different facial feature extraction models can be configured according to the hardware resources and recognition accuracy of the computer equipment.
  • The facial feature extraction model may include one target block or multiple target blocks stacked in order.
  • When the target block sequence includes multiple target blocks stacked in order, it is configured so that the number of output channels of the target blocks is monotonically increasing: the number of output channels of each target block is always greater than that of the target block before it, to increase extraction accuracy.
  • In an exemplary embodiment, each target block includes, coupled in sequence: a first convolutional layer (Conv_1), a first batch-normalization layer (BN), a first activation layer (Swish), a depthwise separable convolution structure (DWConv), a second batch-normalization layer (BN), a second activation layer (Swish), a third batch-normalization layer (BN), a squeeze-and-excitation module (SE Module), a second convolutional layer (Conv_2), and a fourth batch-normalization layer (BN).
  • In an exemplary embodiment, the number of convolution kernels in the first convolutional layer is even, and equals N times the number of output channels of the preceding target block, where N is a natural number greater than 1.
  • In an exemplary embodiment, the number of convolution kernels in the second convolutional layer is even; it is greater than or equal to 2 times and less than or equal to 4 times the number of kernels in the first convolutional layer of the same target block, and it is greater than the number of kernels in the second convolutional layer of the preceding target block.
  • Conv_1 can be a two-dimensional convolution serving as the first operation unit of the target block; its kernel size is 1*1 and its stride is 1*1, and the number of its convolution kernels has the following characteristics: 1. when the target block is the first target block of the whole deep neural network, the number of filters is a multiple of 2, with a recommended minimum of no less than 16; 2. when the target block is the second or a later target block of the deep neural network, the number of convolution kernels equals 6 times the number of output channels of the preceding target block.
  • BN is the abbreviation of Batch Normalization.
  • BN makes the input distribution of each layer similar, prevents neuron outputs from drifting into the saturation region, and thereby avoids the vanishing-gradient problem. BN seeks a zero-centered, unit-variance distribution as the input of each layer's activation function. During training, the batch mean μ is subtracted from the activation input x to obtain a zero-centered distribution; the result is then divided by the batch standard deviation, with ε added to prevent division by zero, so that the inputs of all activation functions have unit variance: x̂ = (x − μ) / √(σ² + ε). Finally, x̂ is passed through a linear transformation (a scale and a shift) to obtain the BN output, ensuring that the effect of this normalization is preserved.
  • Actual parameters: the eps parameter preventing division by zero is set to 0.001, and the momentum parameter is set to 0.01.
  • Swish is the activation function f(x) = x * sigmoid(βx), where β is a scaling parameter for x; in this embodiment, β can be set to 1. Sigmoid is the non-linear activation function σ(x) = 1 / (1 + e^(−x)). Testing shows that, with all model parameters kept unchanged, using Swish improves accuracy.
  • DWConv is the abbreviation of depthwise separable convolution. In this embodiment, the kernel size of the depthwise convolution is 3*3, the stride is 1*1, the number of convolution kernels equals the number of kernels in Conv_1, and there is no bias term.
  • SE Module is a Squeeze-Excitation structure containing two convolutional layers, reduce_conv and expand_conv. Both layers have kernel size 1*1 and stride 1*1, but they differ in kernel count: reduce_conv has 1/24 the number of Conv_1's convolution kernels, while expand_conv has the same number of kernels as Conv_1.
  • Conv_2 is the last convolutional operation of the target block, and its number of convolution kernels determines the dimension of the target block's output channels. The kernel size of Conv_2 is 1*1, the stride is 1*1, and the number of kernels can be determined by a grid search algorithm.
  • The Conv_2 kernels of different target blocks have the following characteristics:
  • 1. the number of Conv_2 kernels of a later target block is greater than that of the preceding block's Conv_2;
  • 2. the number of Conv_2 kernels is greater than or equal to 2 times the number of Conv_1 kernels;
  • 3. the number of Conv_2 kernels is less than 4 times the number of Conv_1 kernels;
  • 4. the number of Conv_2 kernels is even.
  • In an exemplary embodiment, the target block sequence includes a first target block, a second target block, a third target block, a fourth target block, a fifth target block, and a sixth target block stacked serially in sequence, where the first target block has 16 output channels, the second 24, the third 64, the fourth 96, the fifth 160, and the sixth 320.
  • Fig. 5 schematically shows a schematic diagram of the hardware architecture of a computer device suitable for implementing the face recognition method according to the third embodiment of the present application.
  • the computer device 2 is a device that can automatically perform numerical calculation and/or information processing in accordance with pre-set or stored instructions.
  • For example, it can be a smart phone, tablet computer, notebook computer, desktop computer, surveillance device, video conferencing system, rack server, blade server, tower server, or cabinet server (including an independent server, or a server cluster composed of multiple servers), and so on.
  • the computer device 2 at least includes but is not limited to: a memory and a processor.
  • the computer device 2 may also include a network interface.
  • the computer device 2 includes a memory 510, a processor 520, and a network interface 530.
  • The memory 510, the processor 520, and the network interface 530 can be communicatively connected to each other through a system bus, where:
  • The memory 510 includes at least one type of computer-readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, and so on.
  • the memory 510 may be an internal storage module of the computer device 2, for example, the hard disk or memory of the computer device 2.
  • In other embodiments, the memory 510 may also be an external storage device of the computer device 2, for example, a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) equipped on the computer device 2.
  • the memory 510 may also include both the internal storage module of the computer device 2 and its external storage device.
  • the memory 510 is generally used to store the operating system and various application software installed in the computer device 2, such as the program code of the face recognition method.
  • the memory 510 may also be used to temporarily store various types of data that have been output or will be output.
  • the processor 520 may be a central processing unit (Central Processing Unit, CPU for short), a controller, a microcontroller, a microprocessor, or other data processing chips.
  • the processor 520 is generally used to control the overall operation of the computer device 2, for example, to perform data interaction or communication-related control and processing with the computer device 2.
  • the processor 520 is configured to run program codes stored in the memory 510 or process data.
  • the network interface 530 may include a wireless network interface or a wired network interface, and the network interface 530 is generally used to establish a communication connection between the computer device 2 and other computer devices.
  • the network interface 530 is used to connect the computer device 2 with an external terminal through a network, and establish a data transmission channel and a communication connection between the computer device 2 and the external terminal.
  • The network can be an intranet, the Internet, a Global System for Mobile communication (GSM) network, Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.
  • FIG. 5 only shows a computer device with components 510-530, but it should be understood that it is not required to implement all the components shown, and more or fewer components may be implemented instead.
  • In this embodiment, the face recognition method stored in the memory 510 may also be divided into one or more program modules and executed by one or more processors (the processor 520 in this embodiment) to implement part or all of the steps of the face recognition method in the foregoing embodiments, so as to complete this application.
  • This embodiment also provides a computer-readable storage medium on which a computer program is stored, and when the computer program is executed by a processor, the steps of the face recognition method in the embodiment are implemented.
  • the storage medium involved in this application such as a computer-readable storage medium, may be non-volatile or volatile.
  • the computer-readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory, etc.), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical disks, etc.
  • the computer-readable storage medium may be an internal storage unit of a computer device, such as a hard disk or memory of the computer device.
  • In other embodiments, the computer-readable storage medium may also be an external storage device of the computer device, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) equipped on the computer device.
  • the computer-readable storage medium may also include both the internal storage unit and the external storage device of the computer device.
  • the computer-readable storage medium is generally used to store the operating system and various application software installed in the computer device, such as the program code of the face recognition method in the embodiment.
  • the computer-readable storage medium can also be used to temporarily store various types of data that have been output or will be output.
  • Obviously, those skilled in the art should understand that the modules or steps of the embodiments of the present application described above can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed across a network composed of multiple computing devices.
  • Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; and in some cases, the steps shown or described can be executed in an order different from that given here, or they can be fabricated into individual integrated circuit modules, or multiple modules or steps among them can be fabricated into a single integrated circuit module. In this way, the embodiments of the present application are not limited to any specific combination of hardware and software.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • General Physics & Mathematics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Oral & Maxillofacial Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Image Analysis (AREA)

Abstract

A face recognition method, a face recognition system, a computer device, and a computer-readable storage medium. The method includes: acquiring an image to be recognized that includes face information (S100); extracting the facial features in the image to be recognized through a facial feature extraction model, where the facial feature extraction model includes a target block sequence, the target block sequence includes one or more target blocks, and each target block includes a depthwise separable convolution structure and an attention structure (S102); and performing a face recognition operation on the image to be recognized according to the facial features (S104). The method achieves storage and computation optimization, with the technical effect of low hardware resource consumption and high recognition accuracy.

Description

Face recognition method and system
This application claims priority to the Chinese patent application filed with the Chinese Patent Office on February 28, 2020, with application number 202010128434.5 and the invention title "Face recognition method and system", the entire contents of which are incorporated into this application by reference.
Technical Field
The embodiments of the present application relate to the field of artificial intelligence technology, and in particular to a face recognition method, system, computer device, and computer-readable storage medium.
Background
Face recognition technology is an important method for verifying user identity. The inventor found that the general approach is as follows: a face image or video stream is acquired on the client; based on the face image or video stream, whether faces are present is detected, and the position and size of each face and the position information of the main facial organs are extracted; from this information, the identity features contained in each face are extracted and compared with faces in a face database, so as to determine through recognition whether the subject is the target object. With the continuous development of artificial intelligence technology, people have begun to use deep neural network technology to extract facial features. Compared with traditional face recognition technology, face recognition based on deep neural networks not only achieves higher recognition accuracy but also has the ability to extract facial features automatically.
As currently understood by the inventor, there are many neural network architectures for extracting facial features, such as VGG-based neural network architectures, the ResNet architecture based on residual structures, the MobileNet architecture, and so on. However, the inventor realized that architectures such as VGG and ResNet consume substantial hardware resources and cannot be applied to computer devices with limited hardware resources such as mobile phones, while the MobileNet architecture, although it does not require substantial hardware resources, has poor facial feature extraction ability, resulting in low face recognition accuracy.
Therefore, the applicant considers it necessary to provide a face recognition technique that consumes fewer hardware resources and has high recognition accuracy.
Summary
The purpose of the embodiments of the present application is to provide a face recognition method, system, computer device, and computer-readable storage medium, which can be used to solve the technical problem that hardware resource consumption and recognition accuracy cannot both be satisfied.
One aspect of the embodiments of the present application provides a face recognition method, the method including:
acquiring an image to be recognized that includes face information;
extracting the facial features in the image to be recognized through a facial feature extraction model, the facial feature extraction model including a target block sequence, the target block sequence including one or more target blocks, each target block including a depthwise separable convolution structure and an attention structure; and
performing a face recognition operation on the image to be recognized according to the facial features.
Another aspect of the embodiments of the present application further provides a face recognition system, the system including:
an image acquisition module for acquiring an image to be recognized that includes face information;
a feature extraction module for extracting the facial features in the image to be recognized through a facial feature extraction model, the facial feature extraction model including a target block sequence, the target block sequence including one or more target blocks, each target block including a depthwise separable convolution structure and an attention structure; and
an image recognition module for performing a face recognition operation on the image to be recognized according to the facial features.
A further aspect of the embodiments of the present application provides a computer device, including a memory, a processor, and a computer program stored in the memory and executable on the processor, the processor implementing the above face recognition method when executing the computer program, the face recognition method including the following steps:
acquiring an image to be recognized that includes face information;
extracting the facial features in the image to be recognized through a facial feature extraction model, the facial feature extraction model including a target block sequence, the target block sequence including one or more target blocks, each target block including a depthwise separable convolution structure and an attention structure; and
performing a face recognition operation on the image to be recognized according to the facial features.
Yet another aspect of the embodiments of the present application provides a computer-readable storage medium on which a computer program is stored, the computer program implementing the above face recognition method when executed by a processor, the face recognition method including the following steps:
acquiring an image to be recognized that includes face information;
extracting the facial features in the image to be recognized through a facial feature extraction model, the facial feature extraction model including a target block sequence, the target block sequence including one or more target blocks, each target block including a depthwise separable convolution structure and an attention structure; and
performing a face recognition operation on the image to be recognized according to the facial features.
Testing shows that the facial feature extraction model of the embodiments of the present application has the characteristic of fewer parameters with essentially no loss of accuracy, greatly reducing the model's GPU memory usage and floating-point operations, achieving storage and computation optimization, with the technical effect of low hardware resource consumption and high recognition accuracy.
Brief Description of the Drawings
Fig. 1 schematically shows a flowchart of a face recognition method according to Embodiment 1 of the present application;
Fig. 2 schematically shows an exemplary structure of a target block sequence;
Fig. 3 schematically shows an exemplary structure of a target block;
Fig. 4 schematically shows a block diagram of a face recognition system according to Embodiment 2 of the present application; and
Fig. 5 schematically shows the hardware architecture of a computer device suitable for implementing the face recognition method according to Embodiment 3 of the present application.
Detailed Description
In order to make the objectives, technical solutions, and advantages of the embodiments of the present application clearer, the present application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are only intended to explain the present application and are not intended to limit it. Based on the embodiments in the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.
It should be noted that descriptions involving "first", "second", etc. in this application are for descriptive purposes only and cannot be understood as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature qualified by "first" or "second" may explicitly or implicitly include at least one such feature. In addition, the technical solutions of the various embodiments can be combined with each other, but only on the basis that they can be realized by those of ordinary skill in the art; when a combination of technical solutions is contradictory or cannot be realized, it should be considered that such a combination does not exist and is not within the protection scope claimed in this application.
The technical solution of this application can be applied to the fields of artificial intelligence, smart city, blockchain, and/or big data technology to realize face recognition. Optionally, the data involved in this application, such as the image to be recognized and/or facial features, can be stored in a database or in a blockchain, for example through distributed blockchain storage, which is not limited in this application.
Embodiment 1
Fig. 1 schematically shows a flowchart of a face recognition method according to Embodiment 1 of the present application. It can be understood that the flowchart in this method embodiment is not used to limit the order in which the steps are executed. The following exemplary description takes the computer device 2 as the execution subject.
As shown in Fig. 1, the face recognition method may include steps S100 to S104, where:
Step S100: Acquire an image to be recognized that includes face information.
The computer device 2 may be a smart phone, a tablet computer, a laptop computer, or the like. Taking a smart phone as an example:
The smart phone can monitor the user's image acquisition instruction, such as one input through the touch screen; after the instruction is detected, the camera is invoked according to the preset usage scenario, such as the front camera or the rear camera. For example, in a face unlocking scenario, if the smart phone is in the locked-screen state and detects that the user triggers an unlocking operation through a physical button or the touch screen, an image acquisition instruction is generated in response to the unlocking operation to invoke the front camera for image acquisition. Of course, the unlocking operation of the smart phone can also be triggered in other ways. For example, various sensors (e.g., a gravity sensor or gyroscope) detect whether the motion trajectory of the smart phone falls within a predetermined set of motion trajectories; if it does, the unlocking operation is triggered. It is not difficult to understand that the camera can capture a series of consecutive pictures, from which some pictures containing face information are selected, and the face recognition operation is performed based on the selected pictures. This embodiment can also be used in other application scenarios, such as face payment scenarios and various other identity verification scenarios.
Step S102: Extract the facial features in the image to be recognized through the facial feature extraction model.
The facial feature extraction model includes a target block sequence; the target block sequence includes one or more target blocks, and each target block includes a depthwise separable convolution structure and an attention structure.
The depthwise separable convolution structure and the attention structure are introduced below by way of example:
The depthwise separable convolution structure includes depthwise convolution and pointwise convolution.
For the depthwise convolution, multiple convolution kernels can be set; the number of kernels equals the number of channels, with a one-to-one correspondence between kernels and channels. For example, if the image to be recognized is a three-channel color image (shape H*W*3), the depthwise convolution can consist of 3 convolution kernels: one kernel convolves the R-channel image and generates a corresponding feature map, one kernel convolves the G-channel image and generates a corresponding feature map, and one kernel convolves the B-channel image and generates a corresponding feature map. It should be noted that if the input and output are set to the same size (that is, padding is 'same'), each of these feature maps is also H*W.
For the pointwise convolution: multiple convolution kernels of size 1*1*M can be set, and the number of these kernels can be C, where M is the number of channels of the previous layer (the depthwise convolution). For example, since the depthwise convolution has 3 channels and outputs 3 feature maps, the pointwise convolution can consist of C 1*1*M convolution kernels. Each 1*1*M kernel performs a convolution over the previous layer's output (the 3 feature maps) to generate one new feature map; it is not difficult to understand that C 1*1*M kernels means C new feature maps are output.
The attention structure can be implemented with a variety of model structures, such as an encoder-decoder framework.
In an exemplary embodiment, the attention structure may be a squeeze-and-excitation (Squeeze-Excitation) structure, where:
The squeeze structure in the squeeze-and-excitation structure is used to process the feature map output by the depthwise separable convolution structure to obtain the global receptive field information of the feature map. Specifically: features are compressed along the spatial dimensions, turning each two-dimensional feature channel into a real number; this real number has, to some extent, a global receptive field, and the output dimension matches the number of input feature channels.
The excitation structure in the squeeze-and-excitation structure is used to evaluate the weight of each channel in the feature map according to the global receptive field information provided by the squeeze structure to generate the weight of each channel, and to calibrate each feature in the feature map according to the weights of the channels. Specifically: after the channel weights are generated, they are applied to the corresponding original feature channels through a sigmoid gate, completing the recalibration of the original features along the channel dimension.
For ease of understanding, let U (U ∈ R^(H*W*C)) denote the C feature maps of size H*W output by the depthwise separable convolution structure. Taking U as input, the implementation of the Squeeze-Excitation structure is introduced below:
First step: through the formula z_c = F_sq(u_c) = (1/(H*W)) Σ_{i=1}^{H} Σ_{j=1}^{W} u_c(i, j), convert the H*W*C input into a 1*1*C output to obtain the global receptive field information of each feature map, where u_c denotes the c-th feature map, i the i-th column, and j the j-th row of the feature map.
Second step: through the formula s = F_ex(z, W) = σ(g(z, W)) = σ(W_2 σ(W_1 z)), obtain the respective weights of the C feature maps, where the dimension of W_1 is C/r*C, the dimension of W_2 is C*C/r, both W_1 and W_2 are used in fully connected operations, and r is a scaling parameter used to reduce the number of channels.
Third step: through the formula x̃_c = F_scale(u_c, s_c) = s_c · u_c, weight each feature map by its channel weight, where u_c denotes the c-th feature map and s_c the weight of the c-th feature map.
In an exemplary embodiment, the target block sequence includes a plurality of target blocks stacked serially in sequence, the number of which is determined according to computing resources and the required accuracy of the face recognition operation. Specifically: different facial feature extraction models can be configured according to the hardware resources and recognition accuracy of the computer device; the facial feature extraction model may include one target block or multiple target blocks stacked in order. In an exemplary embodiment, when the target block sequence includes multiple target blocks stacked in order, it is configured so that the number of output channels of the target blocks is monotonically increasing: the number of output channels of each target block is always greater than that of the target block before it, to increase extraction accuracy.
Further, as shown in Fig. 2, the target block sequence includes a first target block, a second target block, a third target block, a fourth target block, a fifth target block, and a sixth target block stacked serially in sequence, where the first target block has 16 output channels, the second 24, the third 64, the fourth 96, the fifth 160, and the sixth 320. With this configuration, the number of model parameters can be kept under 6 million while recognition accuracy remains above 93%.
In an exemplary embodiment, to optimize the target block so as to reduce the number of neural network parameters while preserving image feature extraction ability, an exemplary structure of the target block is provided below. As shown in Fig. 3, each target block includes, coupled in sequence: a first convolutional layer (Conv_1), a first batch-normalization layer (BN), a first activation layer (Swish), a depthwise separable convolution structure (DWConv), a second batch-normalization layer (BN), a second activation layer (Swish), a third batch-normalization layer (BN), a squeeze-and-excitation module (SE Module), a second convolutional layer (Conv_2), and a fourth batch-normalization layer (BN). Testing shows that a target block of this structure greatly reduces the model's GPU memory usage and floating-point operations, achieving storage and computation optimization. As preferred settings safeguarding computation optimization: the number of convolution kernels in the first convolutional layer is even, and equals N times the number of output channels of the preceding target block, where N is a natural number greater than 1; the number of convolution kernels in the second convolutional layer is even, is at least 2 times and at most 4 times the number of kernels in the first convolutional layer of the same target block, and is greater than the number of kernels in the second convolutional layer of the preceding target block.
The operation units in the target block are introduced in detail below:
Conv_1 can be a two-dimensional convolution serving as the first operation unit of the target block; its kernel size is 1*1 and its stride is 1*1, and the number of its convolution kernels has the following characteristics: 1. when the target block is the first target block of the whole deep neural network, the number of filters is a multiple of 2, with a recommended minimum of no less than 16; 2. when the target block is the second or a later target block of the deep neural network, the number of convolution kernels equals 6 times the number of output channels of the preceding target block.
BN is the abbreviation of Batch Normalization.
The role of BN: make the input distribution of each layer similar, prevent neuron outputs from drifting into the saturation region, and avoid the vanishing-gradient problem. BN seeks a zero-centered, unit-variance distribution as the input of each layer's activation function. During training, the batch mean μ is subtracted from the activation input x to obtain a zero-centered distribution; the result is then divided by the batch standard deviation, with ε added to prevent division by zero, ensuring that the inputs of all activation functions have unit variance: x̂ = (x − μ) / √(σ² + ε). Finally, x̂ is passed through a linear transformation (a scale and a shift) to obtain the BN output, ensuring that the effect of this normalization is preserved. Actual parameters: the eps parameter preventing division by zero is set to 0.001, and the momentum parameter is set to 0.01.
Swish is the activation function f(x) = x * sigmoid(βx), where β is a scaling parameter for x; in this embodiment, β can be set to 1. Sigmoid is the non-linear activation function σ(x) = 1 / (1 + e^(−x)). Testing shows that, with all model parameters kept unchanged, using Swish improves accuracy.
DWConv is the abbreviation of depthwise separable convolution. In this embodiment, the kernel size of the depthwise convolution is 3*3, the stride is 1*1, the number of convolution kernels equals the number of kernels in Conv_1, and there is no bias term.
SE Module is a Squeeze-Excitation structure containing two convolutional layers, reduce_conv and expand_conv. Both layers have kernel size 1*1 and stride 1*1, but they differ in kernel count: reduce_conv has 1/24 the number of Conv_1's convolution kernels, while expand_conv has the same number of kernels as Conv_1.
Conv_2 is the last convolutional operation of the target block; its number of convolution kernels determines the dimension of the target block's output channels. The kernel size of Conv_2 is 1*1, the stride is 1*1, and the number of kernels can be determined by a grid search algorithm. The Conv_2 kernels of different target blocks have the following characteristics:
1. the number of Conv_2 kernels of a later target block is greater than that of the preceding block's Conv_2;
2. the number of Conv_2 kernels is greater than or equal to 2 times the number of Conv_1 kernels;
3. the number of Conv_2 kernels is less than 4 times the number of Conv_1 kernels;
4. the number of Conv_2 kernels is even.
Step S104: Perform a face recognition operation on the image to be recognized according to the facial features.
Specifically, the facial features output by the facial feature extraction model are compared with reference facial features, the reference facial features being pre-stored facial features of the target object; whether the subject is the target object is determined according to the comparison result. In some schemes, a threshold is determined according to the recognition error rate acceptable to the user (for example, different thresholds can be set for different scenes), and the model's output features are compared with the reference features to determine their degree of matching, which can be expressed as cosine similarity; if the similarity is greater than or equal to the threshold, the subject is determined to be the target object; if it is less than the threshold, the subject is determined not to be the target object.
It is not difficult to understand that this embodiment is based on depthwise separable convolution and a lightweight attention structure, combined with ordinary convolution and BN operations, to design a facial feature extraction model composed of the target blocks. The model has fewer parameters with essentially no loss of accuracy, greatly reducing its GPU memory usage and floating-point operations and achieving storage and computation optimization. Tests show that the parameter count of the facial feature extraction model can drop from 54 million to 5.21 million with no decrease in accuracy. The parameter count of this embodiment is therefore kept at the level of the MobileNet architecture, while the facial feature extraction ability is no weaker than the ResNet architecture; if the parameter count is moderately increased, face recognition accuracy exceeds that of resnet50 and resnet150, effectively solving the technical problem that hardware resource consumption and facial feature extraction ability cannot both be satisfied, and achieving high face recognition accuracy with low hardware resource consumption.
Embodiment 2
Fig. 4 shows a block diagram of a face recognition system according to Embodiment 2 of the present application. The face recognition system can be divided into one or more program modules, which are stored in a storage medium and executed by one or more processors to complete the embodiments of the present application. The program modules referred to in the embodiments of the present application are a series of computer program instruction segments capable of completing specific functions; the following description introduces the functions of each program module of this embodiment in detail.
As shown in Fig. 4, the face recognition system 400 may include the following components:
An image acquisition module 402, for acquiring an image to be recognized that includes face information.
A feature extraction module 404, for extracting the facial features in the image to be recognized through the facial feature extraction model.
The facial feature extraction model includes a target block sequence; the target block sequence includes one or more target blocks, and each target block includes a depthwise separable convolution structure and an attention structure.
An image recognition module 406, for performing a face recognition operation on the image to be recognized according to the facial features.
In an exemplary embodiment, the attention structure includes a squeeze-and-excitation structure, where: the squeeze structure in the squeeze-and-excitation structure is used to process the feature map output by the depthwise separable convolution structure to obtain the global receptive field information of the feature map; the excitation structure in the squeeze-and-excitation structure is used to evaluate the weight of each channel in the feature map according to the global receptive field information provided by the squeeze structure to generate the weight of each channel, and to calibrate each feature in the feature map according to the weights of the channels.
The depthwise separable convolution structure and the squeeze-and-excitation structure are introduced below by way of example:
The depthwise separable convolution structure includes depthwise convolution and pointwise convolution.
For the depthwise convolution, multiple convolution kernels can be set; the number of kernels equals the number of channels, with a one-to-one correspondence between kernels and channels. For example, if the image to be recognized is a three-channel color image (shape H*W*3), the depthwise convolution can consist of 3 convolution kernels: one kernel convolves the R-channel image and generates a corresponding feature map, one kernel convolves the G-channel image and generates a corresponding feature map, and one kernel convolves the B-channel image and generates a corresponding feature map. It should be noted that if the input and output are set to the same size (that is, padding is 'same'), each of these feature maps is also H*W.
For the pointwise convolution: multiple convolution kernels of size 1*1*M can be set, and the number of these kernels can be C, where M is the number of channels of the previous layer (the depthwise convolution). For example, since the depthwise convolution has 3 channels and outputs 3 feature maps, the pointwise convolution can consist of C 1*1*M convolution kernels. Each 1*1*M kernel performs a convolution over the previous layer's output (the 3 feature maps) to generate one new feature map; it is not difficult to understand that C 1*1*M kernels means C new feature maps are output.
The squeeze-and-excitation (Squeeze-Excitation) structure, where:
The squeeze structure in the squeeze-and-excitation structure is used to process the feature map output by the depthwise separable convolution structure to obtain the global receptive field information of the feature map. Specifically: features are compressed along the spatial dimensions, turning each two-dimensional feature channel into a real number; this real number has, to some extent, a global receptive field, and the output dimension matches the number of input feature channels.
The excitation structure in the squeeze-and-excitation structure is used to evaluate the weight of each channel in the feature map according to the global receptive field information provided by the squeeze structure to generate the weight of each channel, and to calibrate each feature in the feature map according to the weights of the channels. Specifically: after the channel weights are generated, they are applied to the corresponding original feature channels through a sigmoid gate, completing the recalibration of the original features along the channel dimension.
For ease of understanding, let U (U ∈ R^(H*W*C)) denote the C feature maps of size H*W output by the depthwise separable convolution structure. Taking U as input, the implementation of the Squeeze-Excitation structure is introduced below:
First step: through the formula z_c = F_sq(u_c) = (1/(H*W)) Σ_{i=1}^{H} Σ_{j=1}^{W} u_c(i, j), convert the H*W*C input into a 1*1*C output to obtain the global receptive field information of each feature map, where u_c denotes the c-th feature map, i the i-th column, and j the j-th row of the feature map.
Second step: through the formula s = F_ex(z, W) = σ(g(z, W)) = σ(W_2 σ(W_1 z)), obtain the respective weights of the C feature maps, where the dimension of W_1 is C/r*C, the dimension of W_2 is C*C/r, both W_1 and W_2 are used in fully connected operations, and r is a scaling parameter used to reduce the number of channels.
Third step: through the formula x̃_c = F_scale(u_c, s_c) = s_c · u_c, weight each feature map by its channel weight, where u_c denotes the c-th feature map and s_c the weight of the c-th feature map.
In an exemplary embodiment, the target block sequence includes a plurality of target blocks stacked serially in sequence, the number of which is determined according to computing resources and the required accuracy of the face recognition operation. Specifically: different facial feature extraction models can be configured according to the hardware resources and recognition accuracy of the computer device; the facial feature extraction model may include one target block or multiple target blocks stacked in order. In an exemplary embodiment, when the target block sequence includes multiple target blocks stacked in order, it is configured so that the number of output channels of the target blocks is monotonically increasing: the number of output channels of each target block is always greater than that of the target block before it, to increase extraction accuracy.
In an exemplary embodiment, each target block includes, coupled in sequence: a first convolutional layer (Conv_1), a first batch-normalization layer (BN), a first activation layer (Swish), a depthwise separable convolution structure (DWConv), a second batch-normalization layer (BN), a second activation layer (Swish), a third batch-normalization layer (BN), a squeeze-and-excitation module (SE Module), a second convolutional layer (Conv_2), and a fourth batch-normalization layer (BN).
In an exemplary embodiment, the number of convolution kernels in the first convolutional layer is even, and equals N times the number of output channels of the preceding target block, where N is a natural number greater than 1.
In an exemplary embodiment, the number of convolution kernels in the second convolutional layer is even; it is at least 2 times and at most 4 times the number of kernels in the first convolutional layer of the same target block, and it is greater than the number of kernels in the second convolutional layer of the preceding target block.
The operation units in the target block are introduced in detail below:
Conv_1 can be a two-dimensional convolution serving as the first operation unit of the target block; its kernel size is 1*1 and its stride is 1*1, and the number of its convolution kernels has the following characteristics: 1. when the target block is the first target block of the whole deep neural network, the number of filters is a multiple of 2, with a recommended minimum of no less than 16; 2. when the target block is the second or a later target block of the deep neural network, the number of convolution kernels equals 6 times the number of output channels of the preceding target block.
BN is the abbreviation of Batch Normalization.
The role of BN: make the input distribution of each layer similar, prevent neuron outputs from drifting into the saturation region, and avoid the vanishing-gradient problem. BN seeks a zero-centered, unit-variance distribution as the input of each layer's activation function. During training, the batch mean μ is subtracted from the activation input x to obtain a zero-centered distribution; the result is then divided by the batch standard deviation, with ε added to prevent division by zero, ensuring that the inputs of all activation functions have unit variance: x̂ = (x − μ) / √(σ² + ε). Finally, x̂ is passed through a linear transformation (a scale and a shift) to obtain the BN output, ensuring that the effect of this normalization is preserved. Actual parameters: the eps parameter preventing division by zero is set to 0.001, and the momentum parameter is set to 0.01.
Swish is the activation function f(x) = x * sigmoid(βx), where β is a scaling parameter for x; in this embodiment, β can be set to 1. Sigmoid is the non-linear activation function σ(x) = 1 / (1 + e^(−x)). Testing shows that, with all model parameters kept unchanged, using Swish improves accuracy.
DWConv is the abbreviation of depthwise separable convolution. In this embodiment, the kernel size of the depthwise convolution is 3*3, the stride is 1*1, the number of convolution kernels equals the number of kernels in Conv_1, and there is no bias term.
SE Module is a Squeeze-Excitation structure containing two convolutional layers, reduce_conv and expand_conv. Both layers have kernel size 1*1 and stride 1*1, but they differ in kernel count: reduce_conv has 1/24 the number of Conv_1's convolution kernels, while expand_conv has the same number of kernels as Conv_1.
Conv_2 is the last convolutional operation of the target block; its number of convolution kernels determines the dimension of the target block's output channels. The kernel size of Conv_2 is 1*1, the stride is 1*1, and the number of kernels can be determined by a grid search algorithm. The Conv_2 kernels of different target blocks have the following characteristics:
1. the number of Conv_2 kernels of a later target block is greater than that of the preceding block's Conv_2;
2. the number of Conv_2 kernels is greater than or equal to 2 times the number of Conv_1 kernels;
3. the number of Conv_2 kernels is less than 4 times the number of Conv_1 kernels;
4. the number of Conv_2 kernels is even.
In an exemplary embodiment, the target block sequence includes a first target block, a second target block, a third target block, a fourth target block, a fifth target block, and a sixth target block stacked serially in sequence, where the first target block has 16 output channels, the second 24, the third 64, the fourth 96, the fifth 160, and the sixth 320.
Embodiment 3
Fig. 5 schematically shows the hardware architecture of a computer device suitable for implementing the face recognition method according to Embodiment 3 of the present application. In this embodiment, the computer device 2 is a device capable of automatically performing numerical calculation and/or information processing according to preset or stored instructions. For example, it can be a smart phone, tablet computer, notebook computer, desktop computer, surveillance device, video conferencing system, rack server, blade server, tower server, or cabinet server (including an independent server, or a server cluster composed of multiple servers), and so on. As shown in Fig. 5, the computer device 2 at least includes, but is not limited to, a memory and a processor; optionally, it may also include a network interface. For example, the computer device 2 includes a memory 510, a processor 520, and a network interface 530, which can be communicatively connected to each other through a system bus, where:
The memory 510 includes at least one type of computer-readable storage medium, including flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, and so on. In some embodiments, the memory 510 may be an internal storage module of the computer device 2, for example its hard disk or internal memory. In other embodiments, the memory 510 may also be an external storage device of the computer device 2, for example a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) equipped on the computer device 2. Of course, the memory 510 may also include both the internal storage module of the computer device 2 and its external storage device. In this embodiment, the memory 510 is generally used to store the operating system and various application software installed on the computer device 2, such as the program code of the face recognition method. In addition, the memory 510 can also be used to temporarily store various types of data that have been output or are to be output.
The processor 520 may in some embodiments be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 520 is generally used to control the overall operation of the computer device 2, for example performing control and processing related to data interaction or communication of the computer device 2. In this embodiment, the processor 520 is used to run the program code stored in the memory 510 or to process data.
The network interface 530 may include a wireless network interface or a wired network interface, and is generally used to establish communication connections between the computer device 2 and other computer devices. For example, the network interface 530 is used to connect the computer device 2 with an external terminal through a network and to establish data transmission channels and communication connections between the computer device 2 and the external terminal. The network can be an intranet, the Internet, a Global System for Mobile communication (GSM) network, Wideband Code Division Multiple Access (WCDMA), a 4G network, a 5G network, Bluetooth, Wi-Fi, or another wireless or wired network.
It should be pointed out that Fig. 5 only shows a computer device with components 510-530, but it should be understood that implementing all of the shown components is not required; more or fewer components may be implemented instead.
In this embodiment, the face recognition method stored in the memory 510 may also be divided into one or more program modules and executed by one or more processors (the processor 520 in this embodiment) to implement part or all of the steps of the face recognition method in the foregoing embodiments, so as to complete this application.
Embodiment 4
This embodiment also provides a computer-readable storage medium on which a computer program is stored; when the computer program is executed by a processor, the steps of the face recognition method in the embodiments are implemented.
Optionally, the storage medium involved in this application, such as the computer-readable storage medium, may be non-volatile or volatile.
In this embodiment, the computer-readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, and so on. In some embodiments, the computer-readable storage medium may be an internal storage unit of a computer device, such as its hard disk or internal memory. In other embodiments, it may also be an external storage device of the computer device, such as a plug-in hard disk, a smart memory card (Smart Media Card, SMC), a secure digital (Secure Digital, SD) card, or a flash card (Flash Card) equipped on the computer device. Of course, the computer-readable storage medium may also include both the internal storage unit of the computer device and its external storage device. In this embodiment, the computer-readable storage medium is generally used to store the operating system and various application software installed on the computer device, such as the program code of the face recognition method in the embodiments; in addition, it can also be used to temporarily store various types of data that have been output or are to be output.
Obviously, those skilled in the art should understand that the modules or steps of the embodiments of the present application described above can be implemented by a general-purpose computing device; they can be concentrated on a single computing device or distributed across a network composed of multiple computing devices. Optionally, they can be implemented with program code executable by a computing device, so that they can be stored in a storage device and executed by the computing device; and in some cases, the steps shown or described can be executed in an order different from that given here, or they can be made into individual integrated circuit modules, or multiple modules or steps among them can be made into a single integrated circuit module. In this way, the embodiments of the present application are not limited to any specific combination of hardware and software.
The above are only preferred embodiments of the present application and do not therefore limit the patent scope of the present application. Any equivalent structural or equivalent process transformation made using the contents of the specification and drawings of the present application, or any direct or indirect application in other related technical fields, is likewise included within the patent protection scope of the present application.

Claims (20)

  1. 一种人脸识别方法,其中,所述方法包括:
    获取包括人脸信息的待识别图像;
    通过人脸特征提取模型提取所述待识别图像中的人脸特征,所述人脸特征提取模型包括目标区块序列,所述目标区块序列包括一个或多个目标区块,每个目标区块包括深度可分离卷积结构和注意力结构;及
    根据所述人脸特征,对所述待识别图像执行人脸识别操作。
  2. 根据权利要求1所述的人脸识别方法,其中,所述注意力结构包括压缩奖惩结构,其中:
    所述压缩奖惩结构中的压缩结构,用于对所述深度可分离卷积结构输出的特征图进行处理以得到所述特征图的全局感受野信息;所述压缩奖惩结构中的奖励结构,用于根据所述压缩结构提供的全局感受野信息对所述特征图中各个通道进行权值评比以生成各个通道的权重,并根据所述各个通道的权重对所述特征图中的各个特征进行标定操作。
  3. 根据权利要求1所述的人脸识别方法,其中,所述目标区块序列包括依顺序串行堆叠的多个目标区块,所述多个目标区块的数量根据计算资源和人脸识别操作的精确度确定。
  4. 根据权利要求3所述的人脸识别方法,其中,每个目标区块包括依顺序耦合的第一卷积层、第一批标准化操作层、第一激活函数层、深度可分离卷积结构、第二批标准化操作层、第二激活函数层、第三批标准化操作层、压缩奖惩结构、第二卷积层和第四批标准化操作层。
  5. 根据权利要求4所述的人脸识别方法,其中,所述第一卷积层中的卷积核数量为偶数;且,所述第一卷积层中的卷积核数量,为所在目标区块的上一个目标区块的输出通道数量的N倍,N为大于1的自然数。
  6. 根据权利要求4所述的人脸识别方法,其中,所述第二卷积层中的卷积核数量为偶数;所述第二卷积层中的卷积核数量,大于或等于同一个目标区块的第一卷积层中的卷积核数量的2倍并且小于或等于第一卷积层中的卷积核数量的4倍;所述第二卷积层中的卷积核数量,大于所在目标区块的上一个目标区块的第二卷积层中的卷积核数量。
  7. 根据权利要求5所述的人脸识别方法,其中,所述目标区块序列包括依顺序串行堆叠的第一目标区块、第二目标区块、第三目标区块、第四目标区块、第五目标区块和第六目标区块,其中,所述第一目标区块的输出通道为16个、所述第二目标区块的输出通道为24个、所述第三目标区块的输出通道为64个、所述第四目标区块的输出通道为96个、所述第五目标区块的输出通道为160个、所述第六目标区块的输出通道为320个。
  8. A face recognition system, wherein the system comprises:
    an image acquisition module, configured to acquire a to-be-recognized image including face information;
    a feature extraction module, configured to extract face features from the to-be-recognized image through a face feature extraction model, the face feature extraction model comprising a target block sequence, the target block sequence comprising one or more target blocks, and each target block comprising a depthwise separable convolution structure and an attention structure; and
    an image recognition module, configured to perform a face recognition operation on the to-be-recognized image according to the face features.
  9. A computer device, comprising a memory, a processor, and a computer program stored in the memory and executable on the processor, wherein the processor, when executing the computer program, is configured to implement the following steps:
    acquiring a to-be-recognized image including face information;
    extracting face features from the to-be-recognized image through a face feature extraction model, the face feature extraction model comprising a target block sequence, the target block sequence comprising one or more target blocks, and each target block comprising a depthwise separable convolution structure and an attention structure; and
    performing a face recognition operation on the to-be-recognized image according to the face features.
  10. The computer device according to claim 9, wherein the attention structure comprises a squeeze-and-excitation structure, wherein:
    the squeeze structure in the squeeze-and-excitation structure is configured to process the feature map output by the depthwise separable convolution structure to obtain global receptive field information of the feature map; and the excitation structure in the squeeze-and-excitation structure is configured to score the channels of the feature map according to the global receptive field information provided by the squeeze structure to generate a weight for each channel, and to calibrate the features in the feature map according to the weights of the channels.
  11. The computer device according to claim 9, wherein the target block sequence comprises a plurality of target blocks stacked serially in order, and the number of the plurality of target blocks is determined according to computing resources and the required accuracy of the face recognition operation.
  12. The computer device according to claim 11, wherein each target block comprises, coupled in order, a first convolution layer, a first batch normalization layer, a first activation function layer, a depthwise separable convolution structure, a second batch normalization layer, a second activation function layer, a third batch normalization layer, a squeeze-and-excitation structure, a second convolution layer, and a fourth batch normalization layer.
  13. The computer device according to claim 12, wherein the number of convolution kernels in the first convolution layer is even, and the number of convolution kernels in the first convolution layer is N times the number of output channels of the target block preceding the target block in which the first convolution layer is located, N being a natural number greater than 1; and/or,
    the number of convolution kernels in the second convolution layer is even; the number of convolution kernels in the second convolution layer is greater than or equal to 2 times and less than or equal to 4 times the number of convolution kernels in the first convolution layer of the same target block; and the number of convolution kernels in the second convolution layer is greater than the number of convolution kernels in the second convolution layer of the preceding target block.
  14. The computer device according to claim 13, wherein the target block sequence comprises a first target block, a second target block, a third target block, a fourth target block, a fifth target block, and a sixth target block stacked serially in order, wherein the first target block has 16 output channels, the second target block has 24 output channels, the third target block has 64 output channels, the fourth target block has 96 output channels, the fifth target block has 160 output channels, and the sixth target block has 320 output channels.
  15. A computer-readable storage medium on which a computer program is stored, wherein the computer program, when executed by a processor, is configured to implement the following steps:
    acquiring a to-be-recognized image including face information;
    extracting face features from the to-be-recognized image through a face feature extraction model, the face feature extraction model comprising a target block sequence, the target block sequence comprising one or more target blocks, and each target block comprising a depthwise separable convolution structure and an attention structure; and
    performing a face recognition operation on the to-be-recognized image according to the face features.
  16. The computer-readable storage medium according to claim 15, wherein the attention structure comprises a squeeze-and-excitation structure, wherein:
    the squeeze structure in the squeeze-and-excitation structure is configured to process the feature map output by the depthwise separable convolution structure to obtain global receptive field information of the feature map; and the excitation structure in the squeeze-and-excitation structure is configured to score the channels of the feature map according to the global receptive field information provided by the squeeze structure to generate a weight for each channel, and to calibrate the features in the feature map according to the weights of the channels.
  17. The computer-readable storage medium according to claim 15, wherein the target block sequence comprises a plurality of target blocks stacked serially in order, and the number of the plurality of target blocks is determined according to computing resources and the required accuracy of the face recognition operation.
  18. The computer-readable storage medium according to claim 17, wherein each target block comprises, coupled in order, a first convolution layer, a first batch normalization layer, a first activation function layer, a depthwise separable convolution structure, a second batch normalization layer, a second activation function layer, a third batch normalization layer, a squeeze-and-excitation structure, a second convolution layer, and a fourth batch normalization layer.
  19. The computer-readable storage medium according to claim 18, wherein the number of convolution kernels in the first convolution layer is even, and the number of convolution kernels in the first convolution layer is N times the number of output channels of the target block preceding the target block in which the first convolution layer is located, N being a natural number greater than 1; and/or,
    the number of convolution kernels in the second convolution layer is even; the number of convolution kernels in the second convolution layer is greater than or equal to 2 times and less than or equal to 4 times the number of convolution kernels in the first convolution layer of the same target block; and the number of convolution kernels in the second convolution layer is greater than the number of convolution kernels in the second convolution layer of the preceding target block.
  20. The computer-readable storage medium according to claim 19, wherein the target block sequence comprises a first target block, a second target block, a third target block, a fourth target block, a fifth target block, and a sixth target block stacked serially in order, wherein the first target block has 16 output channels, the second target block has 24 output channels, the third target block has 64 output channels, the fourth target block has 96 output channels, the fifth target block has 160 output channels, and the sixth target block has 320 output channels.
PCT/CN2021/071260 2020-02-28 2021-01-12 Face recognition method and system WO2021169641A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202010128434.5A 2020-02-28 2020-02-28 Face recognition method and system
CN202010128434.5 2020-02-28

Publications (1)

Publication Number Publication Date
WO2021169641A1 (zh)

Family

ID=71197164

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2021/071260 WO2021169641A1 (zh) Face recognition method and system 2020-02-28 2021-01-12

Country Status (2)

Country Link
CN (1) CN111353430A (zh)
WO (1) WO2021169641A1 (zh)


Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111353430A (zh) * 2020-02-28 2020-06-30 深圳壹账通智能科技有限公司 Face recognition method and system


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110383288B (zh) * 2019-06-06 2023-07-14 深圳市汇顶科技股份有限公司 Face recognition method, apparatus, and electronic device

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20170220904A1 (en) * 2015-04-02 2017-08-03 Tencent Technology (Shenzhen) Company Limited Training method and apparatus for convolutional neural network model
CN108830211A (zh) * 2018-06-11 2018-11-16 厦门中控智慧信息技术有限公司 Deep-learning-based face recognition method and related products
CN110781784A (zh) * 2019-10-18 2020-02-11 高新兴科技集团股份有限公司 Face recognition method, apparatus, and device based on a dual-path attention mechanism
CN111353430A (zh) * 2020-02-28 2020-06-30 深圳壹账通智能科技有限公司 Face recognition method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
QIAN YONGSHENG, SHAO JIE, JI XINXIN, LI XIAORUI, MO CHEN, CHENG QIYU: "Multi-view Facial Expression Recognition based on Improved Convolutional Neural Network", COMPUTER ENGINEERING AND APPLICATIONS, vol. 54, no. 24, 31 December 2018 (2018-12-31), CN, pages 12 - 19, XP055840773, ISSN: 1002-8331, DOI: 10.3778/j.issn.1002-8331.1810-0315 *

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113688783A (zh) * 2021-09-10 2021-11-23 柚皮(重庆)科技有限公司 Face feature extraction method, low-resolution face recognition method, and device
CN113688783B (zh) * 2021-09-10 2022-06-28 一脉通(深圳)智能科技有限公司 Face feature extraction method, low-resolution face recognition method, and device
CN114155573A (zh) * 2021-11-05 2022-03-08 上海弘目智能科技有限公司 Ethnicity recognition method and apparatus based on an SE-ResNet network, and computer storage medium
CN114331904A (zh) * 2021-12-31 2022-04-12 电子科技大学 Face occlusion recognition method
CN114331904B (zh) * 2021-12-31 2023-08-08 电子科技大学 Face occlusion recognition method
CN116938601A (zh) * 2023-09-15 2023-10-24 湖南视觉伟业智能科技有限公司 Division-of-labor authentication method for real-name authentication devices
CN116938601B (zh) * 2023-09-15 2023-11-24 湖南视觉伟业智能科技有限公司 Division-of-labor authentication method for real-name authentication devices
CN118334732A (zh) * 2024-06-13 2024-07-12 深圳市博锐高科科技有限公司 Method, chip, and terminal for repairing and recognizing missing face images

Also Published As

Publication number Publication date
CN111353430A (zh) 2020-06-30

Similar Documents

Publication Publication Date Title
WO2021169641A1 (zh) Face recognition method and system
CN108205655B (zh) Keypoint prediction method and apparatus, electronic device, and storage medium
US10586108B2 (en) Photo processing method and apparatus
US11983850B2 (en) Image processing method and apparatus, device, and storage medium
WO2021248859A1 (zh) Video classification method, apparatus, and device, and computer-readable storage medium
EP3982322A1 (en) Panoramic image and video splicing method, computer-readable storage medium, and panoramic camera
WO2020107847A1 (zh) Skeleton-point-based fall detection method and fall detection apparatus
US20220222918A1 (en) Image retrieval method and apparatus, storage medium, and device
WO2018021942A2 (ru) Face recognition using an artificial neural network
WO2021051547A1 (zh) Violent behavior detection method and system
US8983193B1 (en) Techniques for automatic photo album generation
WO2021103187A1 (zh) Image processing method and apparatus, processor, electronic device, and storage medium
CN106803054B (zh) Face model matrix training method and apparatus
US20210099310A1 (en) Image processing method, image matching method, device and storage medium
TWI803243B (zh) Image augmentation method, computer device, and storage medium
CN110689046A (zh) Image recognition method and apparatus, computer apparatus, and storage medium
WO2020147408A1 (zh) Evaluation method and apparatus for a face recognition model, storage medium, and computer device
CN113343981A (zh) Character recognition method, apparatus, and device based on visual feature enhancement
CN110008922B (zh) Image processing method, device, apparatus, and medium for a terminal device
CN111382791A (zh) Deep learning task processing method, image recognition task processing method, and apparatus
WO2024061123A1 (zh) Image processing method and related device
CN113128278B (zh) Image recognition method and apparatus
CN113743533B (zh) Image clustering method, apparatus, and storage medium
CN116311425A (zh) Face recognition model training method and apparatus, computer device, and storage medium
CN115082999A (zh) Method and apparatus for analyzing persons in group photo images, computer device, and storage medium

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 21760846

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 12.01.2023)

122 Ep: pct application non-entry in european phase

Ref document number: 21760846

Country of ref document: EP

Kind code of ref document: A1