WO2019128646A1 - Face detection method, training method for convolutional neural network parameters, device, and medium

Face detection method, training method for convolutional neural network parameters, device, and medium

Info

Publication number
WO2019128646A1
Authority
WO
WIPO (PCT)
Prior art keywords
face
image
training
neural network
training sample
Prior art date
Application number
PCT/CN2018/119188
Other languages
English (en)
French (fr)
Inventor
严蕤
牟永强
Original Assignee
深圳励飞科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳励飞科技有限公司
Publication of WO2019128646A1

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06V - IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 - Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10 - Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16 - Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161 - Detection; Localisation; Normalisation
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 - Computing arrangements based on biological models
    • G06N3/02 - Neural networks
    • G06N3/04 - Architecture, e.g. interconnection topology
    • G06N3/045 - Combinations of networks

Definitions

  • The present invention relates to the field of image recognition technologies, and in particular, to a face detection method, and to a training method, device, and medium for convolutional neural network parameters.
  • face recognition technology can help people solve many practical problems.
  • the basis of face recognition technology is face detection technology.
  • the accuracy of face detection and the change of face pose will have a significant impact on the accuracy of face recognition.
  • In the prior art, a face detection algorithm is generally used to detect a face in a picture; the posture of the captured face picture is then determined, and a picture with an appropriate posture is selected for face recognition. In this process, it is necessary to repeatedly calculate the feature vectors of the picture, which takes more time and thereby reduces the efficiency of face recognition.
  • An aspect of the present invention provides a face detection method. The face detection method includes:
  • acquiring an image to be detected;
  • inputting the image to be detected into a trained convolutional neural network, and identifying whether the image to be detected includes a face and estimating the face pose, wherein the training sample images of the training sample set used to train the convolutional neural network include position data and posture data of the face;
  • if the image to be detected includes a face, outputting the posture information of the face in the image to be detected.
  • the method for detecting a face further includes:
  • The obtaining, according to the position data of the face in the training sample image and the clustering algorithm, of an anchor frame for each of the several feature units includes:
  • obtaining the length and the width of the anchor frame to be determined when the iteration end condition of the clustering algorithm is reached, thereby obtaining the anchor frame of the feature unit.
  • the method for detecting a face further includes:
  • The preset convolutional neural network model is trained according to the preset loss function and the training algorithm; the values of the network parameters of the preset convolutional neural network model are obtained, and the trained convolutional neural network is thereby obtained. The preset loss function is used to calculate the loss of the presence or absence of a face in the training sample image, the loss of the face pose in the training sample image, and the loss of the offset of the region determined by the position data of the face in the training sample image.
  • the method for detecting a face further includes:
  • if the position data of the face includes at least two sets of position data, obtaining the accurate position of the face in the image to be detected by using a non-maximum suppression algorithm;
  • outputting the face pose information when the face in the image to be detected is at the accurate position.
  • Another aspect of the present invention provides a training method for convolutional neural network parameters. The training method for convolutional neural network parameters includes:
  • the training sample image in the training sample set includes position data and posture data of the face
  • training a preset convolutional neural network model according to the training sample set, a training algorithm, and a preset loss function, to obtain the values of the network parameters of the preset convolutional neural network model, where the preset loss function is used to calculate the loss of the presence or absence of a face in the training sample image, the loss of the face pose in the training sample image, and the loss of the offset of the region determined by the position data of the face in the training sample image.
  • Another aspect of the present invention also provides a face detecting device, the face detecting device comprising:
  • An image acquisition module configured to acquire an image to be detected
  • a processing module configured to input the image to be detected into a trained convolutional neural network, identify whether a face is included in the image to be detected, and estimate the face pose, wherein the training sample images in the training sample set used to train the convolutional neural network include position data and posture data of the face;
  • an output module configured to output posture information of the face in the image to be detected if the image to be detected includes a human face.
  • the face detecting device further includes:
  • a feature extraction module configured to extract a feature of the training sample image by using a convolution layer of the convolutional neural network model for training, to obtain a feature map, where the feature map is composed of several feature units;
  • a calculation module configured to acquire an anchor frame of each of the plurality of feature units according to the location data of the face in the training sample image and the clustering algorithm.
  • the computing module is specifically configured to:
  • obtaining the length and the width of the anchor frame to be determined when the iteration end condition of the clustering algorithm is reached, thereby obtaining the anchor frame of the feature unit.
  • the face detecting device further includes:
  • a parameter obtaining module configured to train a preset convolutional neural network model according to a preset loss function and a training algorithm, to obtain the values of the network parameters of the preset convolutional neural network model and thereby obtain the trained convolutional neural network;
  • the preset loss function is used to calculate the loss of the presence or absence of a face in the training sample image, the loss of the face pose in the training sample image, and the loss of the offset of the region determined by the position data of the face in the training sample image.
  • the face detecting device further includes:
  • a location obtaining module configured to acquire, according to the trained convolutional neural network, location data of a face in the image to be detected
  • a de-duplication module configured to acquire, by using a non-maximum value suppression algorithm, an accurate position of a face in the image to be detected if the location data of the face includes at least two sets of location data;
  • a gesture acquiring module configured to output face pose information when the face in the image to be detected is in the accurate position.
  • Another aspect of the present invention provides a training apparatus for convolutional neural network parameters. The training apparatus for convolutional neural network parameters includes:
  • a sample obtaining module configured to acquire a training sample set, where the training sample image in the training sample set includes location data and posture data of the face;
  • a training module configured to train a preset convolutional neural network model according to the training sample set, a training algorithm, and a preset loss function, to obtain the values of the network parameters of the preset convolutional neural network model, where the preset loss function is used to calculate the loss of the presence or absence of a face in the training sample image, the loss of the face pose in the training sample image, and the loss of the offset of the region determined by the position data of the face in the training sample image.
  • Still another aspect of the present invention provides a computer apparatus, the computer apparatus comprising: a memory for storing at least one instruction; and a processor for executing the instructions stored in the memory to implement the steps of the face detection method and/or the training method for convolutional neural network parameters.
  • Still another aspect of the present invention provides a computer-readable storage medium having stored therein at least one instruction that is executed by a processor in a computer device to implement the steps of the above-described face detection method and/or training method for convolutional neural network parameters.
  • Still another aspect of the present invention provides an integrated circuit mounted in a computer device, such that the computer device implements the face detection method and/or the training method for convolutional neural network parameters.
  • The present invention acquires an image to be detected; inputs the image to be detected into a trained convolutional neural network; identifies whether the image to be detected includes a face and estimates the face pose, wherein the training sample images in the training sample set used to train the convolutional neural network include position data and posture data of the face; and, if the image to be detected includes a face, outputs the posture information of the face in the image to be detected. Since the training sample images in the training sample set of the trained convolutional network include the position data and the posture data of the face, the trained convolutional network can identify whether the image to be detected includes a face and can obtain the posture data of the face in the image to be detected. Through the convolutional neural network model, the face can thus not only be detected, but the face pose can also be estimated synchronously, without repeatedly extracting image features through multiple models; this avoids the cumbersome operations of the face recognition process and improves the efficiency of face recognition.
  • FIG. 1 is a flowchart of a method for detecting a face according to an embodiment of the present invention
  • FIG. 2 is a schematic diagram of before and after processing an image by a non-maximum suppression algorithm in an embodiment of the present invention
  • FIG. 3 is a flowchart of a training method for convolutional neural network parameters according to an embodiment of the present invention
  • FIG. 4 is a functional block diagram of a face detecting apparatus according to an embodiment of the present invention.
  • FIG. 5 is a functional block diagram of a training device for convolutional neural network parameters according to an embodiment of the present invention.
  • FIG. 6 is a schematic diagram of a computer device according to an embodiment of the present invention.
  • FIG. 1 is a schematic flowchart of a method for detecting a face according to an embodiment of the present invention. As shown in FIG. 1, the face detection method may include the following steps:
  • The face detection method according to the present invention can be applied to a computer device, such as a network camera or a notebook computer.
  • S10: acquire an image to be detected.
  • The image to be detected may be an image collected by the computer device or an image received from another computer device. The image to be detected may be a face image or a non-face image.
  • S11: input the image to be detected into the trained convolutional neural network, identify whether the image to be detected includes a face, and estimate the face pose, wherein the training sample images in the training sample set used to train the convolutional neural network include position data and posture data of the face.
  • The trained convolutional neural network is a convolutional neural network (CNN); the convolutional neural network model used may be, for example, VGG-16, GoogleNet, ResNet50, and the like.
  • the convolutional neural network obtained by the training described in the embodiment of the present invention can be trained by any convolutional neural network model.
  • the training process uses the training data (used to obtain the input and output values of the model) and the training algorithm to obtain the network parameters of the convolutional neural network model.
  • The convolutional neural network obtained at this time can be called the trained convolutional neural network; the trained convolutional neural network can predict the output value according to the input value, that is, output the corresponding result according to the input image.
  • The training sample set for training the convolutional neural network model includes training sample images. The training sample images may include face images and non-face images; the more face image samples there are, the higher the accuracy of the output of the trained convolutional neural network.
  • The training sample images in the training sample set used to train the convolutional neural network include the position data and the posture data of the face; that is, when training the convolutional neural network model, the position data and the posture data of the face in the training sample images can be acquired for training. To obtain the position data and the pose data of the face in a training sample image, the features of the sample image must first be extracted to locate the face, and then the position data and the posture data of the face are acquired.
  • The position data of the face may be the abscissa and the ordinate of the face, and the length and the width of the face. The posture data of the face may be the pitch angle (pitch), the yaw angle (yaw), and the roll angle (roll) of the face: pitch represents the angle of the face flipping up and down, yaw represents the angle of the face flipping left and right, and roll represents the angle of rotation within the plane of the face.
  • Obtaining the required data from the training sample image can be referred to as labeling the training sample image.
  • The input value of the convolutional neural network model may be a training sample image, and the purpose of training the convolutional neural network model is to learn to obtain the position data and the posture data of the face in the training sample image from the input training sample image; after training, the model can be used to obtain the position data and the posture data of the face in any image. For a training sample image that contains no face, the position data and the posture data of the face may be empty.
  • The method may also include a step of training the convolutional neural network. When training the convolutional neural network, the training sample image may be processed by the following method:
  • the above convolutional neural network for training refers to a convolutional neural network model used in the specific implementation.
  • Different convolutional neural network models have different convolutional layers, and each convolutional layer has its corresponding convolution kernel (a matrix). For example, if the convolutional neural network model used for training is VGG-16, there are 16 weight layers in VGG-16, of which 13 are convolutional layers.
  • Extracting the features of the training sample image through the convolutional layers of the convolutional neural network model used for training is the process of extracting the image features of the training sample; the feature map obtained is used to represent the training sample image. Specifically, the features of the training sample image are extracted by performing convolution operations in the convolutional layers. A convolution operation is a process of multiplying the convolution kernel element-wise with the corresponding positions of the input and summing the results; another matrix is obtained after the convolution operation. If the convolutional neural network model has multiple convolutional layers, the convolution operation can be performed multiple times.
  • the feature map described above is composed of several feature units.
  • The feature map may be divided into several parts according to a preset ratio; each part may be referred to as a feature unit, so the feature map is composed of these feature units. For example, if the feature map is divided into 9 parts according to 3*3, the feature map is composed of 9 feature units (a sketch follows).
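  • For illustration only (this sketch is not part of the patent text), the following Python snippet divides a feature map into a 3*3 grid of 9 feature units; the array shape and all names are assumptions:

```python
import numpy as np

# Hypothetical feature map produced by the convolutional layers:
# (channels, height, width); values are random placeholders.
feature_map = np.random.rand(256, 12, 12)

S = 3  # preset ratio: a 3x3 grid gives 9 feature units
C, H, W = feature_map.shape
uh, uw = H // S, W // S  # spatial extent of one feature unit

units = [
    feature_map[:, i * uh:(i + 1) * uh, j * uw:(j + 1) * uw]
    for i in range(S)
    for j in range(S)
]
print(len(units), units[0].shape)  # -> 9 (256, 4, 4)
```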
  • The clustering algorithm may be a K-means algorithm, an FCM clustering algorithm, a SOM clustering algorithm, and so on. Specifically, the length and the width of the anchor frames are obtained by the clustering algorithm, and one feature unit may correspond to multiple anchor frames. In the prior art, anchor frames are usually obtained by manual marking. Here, the length and the width of the anchor frames are obtained by the clustering algorithm, and the anchor frame of each feature unit can be determined according to the scale of that feature unit; this accurately reflects the aspect ratios of the faces to be detected, reduces the interference of manual priors, and makes the detection more accurate.
  • a linear classifier (such as a linear SVM classifier) can be used to determine whether there is a face in the anchor frame.
  • Through the convolutional neural network used for training, it is possible to determine whether the current anchor frame contains a face, and if so, to extract the image features in the anchor frame again. The features extracted the first time are simple, while the features extracted again the second time are more accurate and richer, so the representation of the training sample image obtained after the second feature extraction is more accurate, which helps to improve the accuracy of the training results.
  • The obtaining, according to the position data of the face in the training sample image and the clustering algorithm, of an anchor frame for each of the several feature units may include:
  • The anchor frame may also be referred to as an anchor box. Since the length and the width of the anchor frame to be determined are unknown, initial values (which may be randomly initialized) may be set for the length and the width of the anchor frame to be determined, respectively.
  • Then, the intersection-over-union ratio of the anchor frame to be determined and the standard frame of the training sample image (i.e., the region determined according to the position data of the training sample image) is calculated, and the distance parameter in the clustering algorithm is determined according to this ratio. The distance in the clustering algorithm can be expressed as:

$d(tbox, abox) = 1 - IOU(tbox, abox)$

where tbox represents the standard box in the training sample image (i.e., the area determined according to the position data of the training sample image), abox represents the anchor frame whose length and width are to be determined, and IOU(tbox, abox) is the intersection-over-union ratio of tbox and abox.
  • For each of the several feature units, the anchor frame can be obtained by the method described above.
  • Compared with clustering by point-to-point distances, determining the distance by the intersection-over-union ratio of the anchor frame to be determined and the standard frame directly reflects their overlapping area. The clustering method in this embodiment therefore reflects the problem to be solved (marking the areas where faces may exist with anchor frames) more accurately, runs more efficiently, and produces more accurate results, as the following sketch illustrates.
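  • For illustration only, the following is a minimal sketch of K-means clustering with the distance d(tbox, abox) = 1 - IOU(tbox, abox) reconstructed above, assuming boxes are compared by width and height alone (as in YOLOv2-style anchor clustering); all names and sample values are illustrative:

```python
import numpy as np

def iou_wh(box_wh, anchors_wh):
    # IoU of boxes that share a common center, compared by width/height only
    inter = (np.minimum(box_wh[0], anchors_wh[:, 0])
             * np.minimum(box_wh[1], anchors_wh[:, 1]))
    union = box_wh[0] * box_wh[1] + anchors_wh[:, 0] * anchors_wh[:, 1] - inter
    return inter / union

def cluster_anchors(boxes_wh, k, iters=100, seed=0):
    """K-means where d(tbox, abox) = 1 - IOU(tbox, abox) is the distance."""
    rng = np.random.default_rng(seed)
    anchors = boxes_wh[rng.choice(len(boxes_wh), size=k, replace=False)]
    for _ in range(iters):
        # assign every standard box to its nearest anchor under the IoU distance
        dists = np.stack([1.0 - iou_wh(b, anchors) for b in boxes_wh])
        assign = dists.argmin(axis=1)
        new_anchors = np.array([
            boxes_wh[assign == j].mean(axis=0) if np.any(assign == j) else anchors[j]
            for j in range(k)
        ])
        if np.allclose(new_anchors, anchors):  # iteration end condition reached
            break
        anchors = new_anchors
    return anchors  # final lengths and widths of the anchor frames

# usage: widths/heights of labeled face standard boxes from training samples
boxes = np.array([[24, 32], [30, 42], [60, 80], [55, 75], [120, 150]], dtype=float)
print(cluster_anchors(boxes, k=2))
```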
  • the convolutional neural network may also be trained by:
  • The preset convolutional neural network model is trained according to the preset loss function and the training algorithm; the values of the network parameters of the preset convolutional neural network model are obtained, and the trained convolutional neural network is thereby obtained. The preset loss function is used to calculate the loss of the presence or absence of a face in the training sample image, the loss of the face pose in the training sample image, and the loss of the offset of the region determined by the position data of the face in the training sample image.
  • The above-mentioned preset convolutional neural network model is the convolutional neural network model used for training, such as VGG-16. The purpose of training is to obtain the network parameters of the convolutional neural network model so that the output value of the convolutional neural network is as close as possible to the actual value and accurate predictions can be made from the input data. Therefore, during training, the loss function is used to measure how close the output value of the convolutional neural network is to the actual value: the smaller the value of the loss function, the closer the output value of the convolutional neural network is to the actual value.
  • the preset loss function is used to calculate the loss of the presence or absence of the face in the training sample image, the loss of the face pose in the training sample image, and the loss of the offset of the face labeling area in the training sample image.
  • The loss of the presence or absence of a face in the training sample image is determined as follows: because the feature map of the training sample image is composed of several feature units, and an anchor frame is obtained for each feature unit, the loss of the presence or absence of a face in the training sample image can be obtained by accumulating the loss of the presence or absence of a face in each anchor frame. The loss of the presence or absence of a face in the anchor frames can be expressed as:

$L_{conf}(c) = -\frac{1}{N}\Big(\sum_{i \in Pos} \log c_i + \sum_{j \in Neg} \log c_j^0\Big)$

where N is the number of samples, which varies according to the number of samples selected each time; c indicates confidence: $c_i$ indicates the confidence that the i-th anchor box contains a face, and $c_j^0$ indicates the confidence that the j-th anchor box does not contain a face; $i \in Pos$ indicates that the i-th anchor box contains a face, and $i \in Neg$ indicates that the i-th anchor box does not contain a face.
  • The loss of the offset of the region determined by the position data of the face in the training sample image can be obtained from the loss of the offset between each anchor frame and the standard frame of the face in the training sample image. The offset loss can be expressed as:

$L_{loc}(l, g) = \sum_{i \in Pos} \sum_{m \in \{cx, cy, w, h\}} \mathrm{smooth}_{L1}\big(l_i^m - g^m\big)$

where l represents the position information of the anchor frame; cx, cy represent the horizontal and vertical coordinates of the center point of the anchor frame, and w, h its length and width; g represents the position information of the standard frame; and $\mathrm{smooth}_{L1}$ represents the smooth L1 function:

$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$
  • The loss of the face pose in the training sample image can be obtained by accumulating the loss between the face pose predicted in each anchor frame and the pose of the standard frame. This pose loss, denoted $L_{pose}$, can be expressed, for example, as a regression loss (such as a smooth L1 loss) over the three pose angles (pitch, yaw, roll).
  • The preset loss function can then be expressed as a weighted combination of the three losses, for example:

$L = L_{conf} + \alpha L_{loc} + \beta L_{pose}$

where α and β are weighting coefficients.
  • The training algorithm can be a gradient descent algorithm, Newton's method, a conjugate gradient algorithm, and so on.
  • the specific training algorithm can be obtained from the prior art, and details are not described herein again.
  • The present invention adds the calculation of the face pose information to the calculation of the network loss of the neural network, so that the pose of the face can be output directly while the face is detected. Because the loss function is used to evaluate the neural network model, the more accurate the face pose, the smaller the loss. Therefore, including the face pose in the calculation of the network loss makes face detection and pose estimation mutually reinforcing, which further improves the accuracy of both face detection and pose estimation. A sketch of such a combined loss follows.
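  • For illustration only, the following is a minimal numpy sketch of one plausible form of the preset loss, combining the confidence, box-offset, and pose terms reconstructed above; the weights alpha and beta and all input values are assumptions, not values given by the patent:

```python
import numpy as np

def smooth_l1(x):
    x = np.abs(x)
    return np.where(x < 1.0, 0.5 * x**2, x - 0.5)

def detection_loss(pos_conf, neg_conf, loc_pred, loc_gt, pose_pred, pose_gt,
                   alpha=1.0, beta=1.0):
    """Face/no-face confidence loss + box-offset loss + pose loss."""
    n = len(pos_conf) + len(neg_conf)
    l_conf = -(np.log(pos_conf).sum() + np.log(neg_conf).sum()) / n
    l_loc = smooth_l1(loc_pred - loc_gt).sum()     # over (cx, cy, w, h)
    l_pose = smooth_l1(pose_pred - pose_gt).sum()  # over (pitch, yaw, roll)
    return l_conf + alpha * l_loc + beta * l_pose

# toy usage: two anchors matched to faces, two background anchors
loss = detection_loss(
    pos_conf=np.array([0.9, 0.8]),   # confidence that these anchors contain a face
    neg_conf=np.array([0.95, 0.7]),  # confidence that these anchors contain no face
    loc_pred=np.array([[0.1, 0.2, 1.0, 1.1], [0.0, 0.1, 0.9, 1.0]]),
    loc_gt=np.array([[0.0, 0.2, 1.1, 1.0], [0.1, 0.1, 1.0, 1.0]]),
    pose_pred=np.array([[0.05, -0.10, 0.00], [0.00, 0.20, 0.10]]),
    pose_gt=np.array([[0.00, -0.05, 0.00], [0.05, 0.15, 0.05]]),
)
print(loss)
```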
  • S12: output the detection result of whether the image to be detected includes a face.
  • After step S11, it can be obtained whether the image to be detected contains a face. If the image to be detected does not contain a face, information indicating that no face is included may be output; for example, an output of "no" indicates that the image to be detected does not include a face. If the image to be detected contains a face, information indicating that a face is included may be output; for example, an output of "yes" indicates that the image to be detected contains a face.
  • The convolutional neural network of the present invention learns to recognize the position and posture data of the face by training the convolutional neural network model, and whether a face exists is the basis for learning the position and the posture of the face. Because image features are extracted and learned during the training process, the network also learns to recognize whether a face exists in an image, so the trained convolutional neural network can output the detection result of whether the image to be detected contains a face.
  • S13: if the image to be detected includes a face, output the posture information of the face in the image to be detected.
  • During training, the network also learns to acquire the posture data of the face in the image. Therefore, the posture data of the face in the image to be detected can be output, and the posture of the face can be expressed by the pitch angle (pitch), the yaw angle (yaw), and the roll angle (roll) of the face.
  • The outputs of step S12 and step S13 may be synchronous; that is, if the image to be detected includes a face, the detection result that a face is included and the posture of the face are output together. If the image to be detected does not include a face, the detection result that no face is included may be output directly, with no posture information output or with the output posture information set to a null value. A sketch of this output logic follows.
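  • For illustration only, the following sketch shows this synchronous output logic; the `model` callable, its return values, and the threshold are all assumptions, since the patent does not specify them:

```python
import numpy as np

def detect(model, image, conf_threshold=0.5):
    """Run the trained CNN once; return the detection result and pose together."""
    confidence, box, pose = model(image)  # pose = (pitch, yaw, roll)
    if confidence >= conf_threshold:
        return True, box, pose    # face found: output result and posture info
    return False, None, None      # no face: posture information is a null value

# usage with a stand-in model that always "finds" one face
fake_model = lambda img: (0.92,
                          np.array([0.4, 0.4, 0.2, 0.3]),  # cx, cy, w, h
                          np.array([0.05, -0.10, 0.00]))   # pitch, yaw, roll
has_face, box, pose = detect(fake_model, np.zeros((224, 224, 3)))
print(has_face, box, pose)
```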
  • the method for detecting a face may further include:
  • if the position data of the face includes at least two sets of position data, obtaining the accurate position of the face in the image to be detected by using a non-maximum suppression algorithm;
  • outputting the face pose information when the face in the image to be detected is at the accurate position.
  • Non-maximum suppression searches for the local maxima of an image and suppresses the non-maximum elements.
  • the specific non-maximum suppression algorithm processing will not be described here, and can be obtained from the prior art.
  • FIG. 2 is a schematic diagram of an image before and after processing by the non-maximum suppression algorithm. The figure on the left side of FIG. 2 shows a schematic diagram in which faces have been detected and the regions where faces exist are marked with face frames according to the positions of the faces. The figure on the right side of FIG. 2 shows the accurate positions obtained after the image is processed by the non-maximum suppression algorithm; at this point the redundant face frames in the image have been removed, and the positions of the faces are obtained accurately.
  • During training, the network also learns to acquire the position data of the face in the image; therefore, the position data of the face in the image to be detected can be output. During face detection, multiple sets of position data may be obtained for the same face. In this case, the accurate position of the face in the image to be detected is obtained by a non-maximum suppression algorithm. For each detected position, a set of posture data of the face at that position (the pitch angle, the yaw angle, and the roll angle of the face) is obtained, so that after the accurate position of the face is determined, the posture information of the face at the accurate position can be acquired. Obtaining the accurate position and posture of the face in the image to be detected through the non-maximum suppression algorithm provides more accurate face information for the image to be detected, so that subsequent image processing (such as image recognition) can be performed with higher accuracy. A generic sketch of non-maximum suppression follows.
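  • For illustration only, the following is a generic sketch of non-maximum suppression over [x, y, w, h] boxes; the patent defers the algorithm's details to the prior art, so this particular formulation and its threshold are assumptions:

```python
import numpy as np

def nms(boxes, scores, iou_threshold=0.5):
    """Keep the highest-confidence boxes and drop redundant overlapping ones."""
    x1, y1 = boxes[:, 0], boxes[:, 1]
    x2, y2 = boxes[:, 0] + boxes[:, 2], boxes[:, 1] + boxes[:, 3]
    areas = boxes[:, 2] * boxes[:, 3]
    order = scores.argsort()[::-1]  # highest-confidence box first
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        # IoU of the best remaining box against all others still in play
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        order = order[1:][iou <= iou_threshold]  # drop redundant face frames
    return keep

# usage: three candidate boxes, two of which overlap on the same face
boxes = np.array([[10, 10, 50, 60], [12, 12, 50, 60], [200, 80, 40, 50]], float)
scores = np.array([0.9, 0.8, 0.7])
print(nms(boxes, scores))  # -> [0, 2]
```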
  • In summary, the face detection method provided by the present invention acquires an image to be detected, inputs the image to be detected into the convolutional neural network obtained by training, identifies whether the image to be detected includes a face, and estimates the face pose, wherein the training sample images of the training sample set used to train the convolutional neural network include position data and posture data of the face; the method outputs the detection result of whether the image to be detected includes a face and, if the image to be detected includes a face, outputs the posture information of the face in the image to be detected. Since the training sample images in the training sample set of the trained convolutional network include the position data and the posture data of the face, the trained convolutional network can identify whether the image to be detected includes a face and can obtain the posture data of the face in the image to be detected. Through the convolutional neural network model, the face can thus not only be detected, but the face pose can also be estimated synchronously, without repeatedly extracting image features through multiple models; this avoids the cumbersome operations of the face recognition process and improves the efficiency of face recognition.
  • FIG. 3 is a schematic flowchart of a training method for convolutional neural network parameters according to an embodiment of the present invention. As shown in FIG. 3, the training method of convolutional neural network parameters may include the following steps:
  • the training method of the present invention can be used to train an arbitrary convolutional neural network.
  • the training sample set is used to train the convolutional neural network model.
  • the type of the specific training sample set and the processing of the training sample set can be referred to the related description in the foregoing embodiment, and details are not described herein again.
  • the training process uses the training data (used to obtain the input and output values of the model) and the training algorithm to obtain the network parameters of the convolutional neural network model.
  • The convolutional neural network obtained at this time can be called the trained convolutional neural network; the trained convolutional neural network can predict the output value according to the input value, that is, output the corresponding result according to the input image.
  • The features of the training sample image may be extracted by the convolutional layers of the preset convolutional neural network model to obtain a feature map representing the training sample image. The feature map is then divided into several feature units according to a preset ratio, and multiple anchor frames in each feature unit are obtained according to the clustering algorithm. After the anchor frames are obtained, the anchor frames in which a face exists are input into the next layer of the preset convolutional neural network model, it is determined again whether there is a face in each anchor frame, and features are extracted from the anchor frames that contain a face.
  • The information (position and posture) corresponding to the features extracted at this time is compared with the position data and the posture data of the faces included in the training sample images, and training proceeds according to the preset loss function to obtain the network parameters of the preset convolutional neural network model. A compact sketch of such a training loop follows.
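  • For illustration only, the following is a compact sketch of such a training loop; the tiny stand-in network, the optimizer choice (plain gradient descent), and the per-image labels are all assumptions, since the patent's model is a full anchor-based CNN such as VGG-16:

```python
import torch
from torch import nn

# Hypothetical stand-in network: per image, predicts a face confidence,
# a box (cx, cy, w, h), and a pose (pitch, yaw, roll).
class TinyFaceNet(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 8, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten())
        self.head = nn.Linear(8, 1 + 4 + 3)  # confidence + box + pose

    def forward(self, x):
        out = self.head(self.backbone(x))
        return torch.sigmoid(out[:, :1]), out[:, 1:5], out[:, 5:]

model = TinyFaceNet()
opt = torch.optim.SGD(model.parameters(), lr=0.01)  # the training algorithm

# toy training samples: images labeled with face presence, box, and pose data
images = torch.rand(4, 3, 64, 64)
has_face = torch.ones(4, 1)
boxes = torch.rand(4, 4)
poses = torch.rand(4, 3)

for step in range(10):
    conf, box_pred, pose_pred = model(images)
    loss = (nn.functional.binary_cross_entropy(conf, has_face)  # face / no face
            + nn.functional.smooth_l1_loss(box_pred, boxes)     # box offset
            + nn.functional.smooth_l1_loss(pose_pred, poses))   # face pose
    opt.zero_grad()
    loss.backward()
    opt.step()
print(float(loss))
```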
  • The training method for convolutional neural network parameters provided by the present invention trains a convolutional neural network that can perform face detection and can acquire the position and posture information of faces in images. Moreover, by including the calculation of the face pose in the network loss during training, face detection and pose estimation mutually reinforce each other, which further improves the accuracy of both.
  • FIG. 4 is a structural diagram of a face detection apparatus according to an embodiment of the present invention.
  • the face detection apparatus may include an image acquisition module 310, a processing module 320, and an output module 330.
  • the image obtaining module 310 is configured to acquire an image to be detected.
  • the image to be detected may be an image collected by a computer device or an image received from another computer device.
  • the image to be detected may be a face image or a non-face image.
  • The processing module 320 is configured to input the image to be detected into the trained convolutional neural network, identify whether the image to be detected includes a face, and estimate the face pose, wherein the training sample images in the training sample set used to train the convolutional neural network include position data and posture data of the face.
  • The trained convolutional neural network is a convolutional neural network (CNN); the convolutional neural network model used may be, for example, VGG-16, GoogleNet, ResNet50, and the like.
  • the convolutional neural network obtained by the training described in the embodiment of the present invention can be trained by any convolutional neural network model.
  • the training process uses the training data (used to obtain the input and output values of the model) and the training algorithm to obtain the network parameters of the convolutional neural network model.
  • The convolutional neural network obtained at this time can be called the trained convolutional neural network; the trained convolutional neural network can predict the output value according to the input value, that is, output the corresponding result according to the input image.
  • The training sample set for training the convolutional neural network model includes training sample images. The training sample images may include face images and non-face images; the more face image samples there are, the higher the accuracy of the output of the trained convolutional neural network.
  • The training sample images in the training sample set used to train the convolutional neural network include position data and posture data of the face; that is, when training the convolutional neural network model, the position data and the posture data of the face in the training sample images can be acquired for training. To obtain the position data and the pose data of the face in a training sample image, the features of the sample image must first be extracted to locate the face, and then the position data and the posture data of the face are acquired.
  • The position data of the face may be the abscissa and the ordinate of the face, and the length and the width of the face. The posture data of the face may be the pitch angle (pitch), the yaw angle (yaw), and the roll angle (roll) of the face: pitch represents the angle of the face flipping up and down, yaw represents the angle of the face flipping left and right, and roll represents the angle of rotation within the plane of the face.
  • Obtaining the required data from the training sample image can be referred to as labeling the training sample image.
  • When labeling, the data can be normalized: the abscissa of the face and the length of the face are respectively divided by the length of the training sample image, the ordinate of the face and the width of the face are respectively divided by the width of the training sample image, and the pitch, yaw, and roll of the face are each divided by π. A sketch of this normalization follows.
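  • For illustration only, a small sketch of this labeling normalization; the field names are assumptions, and π as the angle divisor follows the text above:

```python
import math

def normalize_label(label, img_len, img_w):
    """Normalize one training-sample label as described above."""
    return {
        "x": label["x"] / img_len,          # abscissa / image length
        "y": label["y"] / img_w,            # ordinate / image width
        "len": label["len"] / img_len,      # face length / image length
        "w": label["w"] / img_w,            # face width / image width
        "pitch": label["pitch"] / math.pi,  # pose angles divided by pi
        "yaw": label["yaw"] / math.pi,
        "roll": label["roll"] / math.pi,
    }

raw = {"x": 120, "y": 80, "len": 60, "w": 50,
       "pitch": 0.3, "yaw": -0.6, "roll": 0.1}
print(normalize_label(raw, img_len=640, img_w=480))
```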
  • The input value of the convolutional neural network model may be a training sample image, and the purpose of training the convolutional neural network model is to learn to obtain the position data and the posture data of the face in the training sample image from the input training sample image; after training, the model can be used to acquire the position data and the posture data of the face in any image. For a training sample image that contains no face, the position data and the posture data of the face may be empty.
  • Modules for training the convolutional neural network may be further included; when training the convolutional neural network, the training sample image may be processed by the feature extraction module and the calculation module:
  • a feature extraction module configured to extract a feature of the training sample image by using a convolution layer of the convolutional neural network model for training, to obtain a feature map, where the feature map is composed of several feature units.
  • a calculation module configured to acquire an anchor frame of each of the plurality of feature units according to the location data of the face in the training sample image and the clustering algorithm.
  • the above convolutional neural network for training refers to a convolutional neural network model used in the specific implementation.
  • Different convolutional neural network models have different convolutional layers, and each convolutional layer has its corresponding convolution kernel (a matrix). For example, if the convolutional neural network model used for training is VGG-16, there are 16 weight layers in VGG-16, of which 13 are convolutional layers.
  • Extracting the features of the training sample image through the convolutional layers of the convolutional neural network model used for training is the process of extracting the image features of the training sample; the feature map obtained is used to represent the training sample image. Specifically, the features of the training sample image are extracted by performing convolution operations in the convolutional layers. A convolution operation is a process of multiplying the convolution kernel element-wise with the corresponding positions of the input and summing the results; another matrix is obtained after the convolution operation. If the convolutional neural network model has multiple convolutional layers, the convolution operation can be performed multiple times.
  • the feature map described above is composed of several feature units.
  • The feature map may be divided into several parts according to a preset ratio; each part may be referred to as a feature unit, so the feature map is composed of these feature units. For example, if the feature map is divided into 9 parts according to 3*3, the feature map is composed of 9 feature units.
  • The clustering algorithm may be a K-means algorithm, an FCM clustering algorithm, a SOM clustering algorithm, and so on. Specifically, the length and the width of the anchor frames are obtained by the clustering algorithm, and one feature unit may correspond to multiple anchor frames. In the prior art, anchor frames are usually obtained by manual marking. Here, the length and the width of the anchor frames are obtained by the clustering algorithm, and the anchor frame of each feature unit can be determined according to the scale of that feature unit; this accurately reflects the aspect ratios of the faces to be detected, reduces the interference of manual priors, and makes the detection more accurate.
  • a linear classifier (such as a linear SVM classifier) can be used to determine whether there is a face in the anchor frame.
  • Through the convolutional neural network used for training, it is possible to determine whether the current anchor frame contains a face, and if so, to extract the image features in the anchor frame again. The features extracted the first time are simple, while the features extracted again the second time are more accurate and richer, so the representation of the training sample image obtained after the second feature extraction is more accurate, which helps to improve the accuracy of the training results.
  • the calculating module may be specifically configured to:
  • The anchor frame may also be referred to as an anchor box. Since the length and the width of the anchor frame to be determined are unknown, initial values (which may be randomly initialized) may be set for the length and the width of the anchor frame to be determined, respectively.
  • Then, the intersection-over-union ratio of the anchor frame to be determined and the standard frame of the training sample image (i.e., the region determined according to the position data of the training sample image) is calculated, and the distance parameter in the clustering algorithm is determined according to this ratio. The distance in the clustering algorithm can be expressed as:

$d(tbox, abox) = 1 - IOU(tbox, abox)$

where tbox represents the standard box in the training sample image (i.e., the area determined according to the position data of the training sample image), abox represents the anchor frame whose length and width are to be determined, and IOU(tbox, abox) is the intersection-over-union ratio of tbox and abox.
  • For each of the several feature units, the calculation module can obtain the anchor frame by the method described above.
  • Compared with clustering by point-to-point distances, determining the distance by the intersection-over-union ratio of the anchor frame to be determined and the standard frame directly reflects their overlapping area. The clustering method in this embodiment therefore reflects the problem to be solved (marking the areas where faces may exist with anchor frames) more accurately, runs more efficiently, and produces more accurate results.
  • the trained convolutional neural network may also be obtained by using a parameter obtaining module:
  • a parameter obtaining module configured to train a preset convolutional neural network model according to a preset loss function and a training algorithm, to obtain the values of the network parameters of the preset convolutional neural network model and thereby obtain the trained convolutional neural network;
  • the preset loss function is used to calculate the loss of the presence or absence of a face in the training sample image, the loss of the face pose in the training sample image, and the loss of the offset of the region determined by the position data of the face in the training sample image.
  • The above-mentioned preset convolutional neural network model is the convolutional neural network model used for training, such as VGG-16. The purpose of training is to obtain the network parameters of the convolutional neural network model so that the output value of the convolutional neural network is as close as possible to the actual value and accurate predictions can be made from the input data. Therefore, during training, the loss function is used to measure how close the output value of the convolutional neural network is to the actual value: the smaller the value of the loss function, the closer the output value of the convolutional neural network is to the actual value.
  • the preset loss function is used to calculate the loss of the presence or absence of the face in the training sample image, the loss of the face pose in the training sample image, and the loss of the offset of the face labeling area in the training sample image.
  • The loss of the presence or absence of a face in the training sample image is determined as follows: because the feature map of the training sample image is composed of several feature units, and an anchor frame is obtained for each feature unit, the loss of the presence or absence of a face in the training sample image can be obtained by accumulating the loss of the presence or absence of a face in each anchor frame. The loss of the presence or absence of a face in the anchor frames can be expressed as:

$L_{conf}(c) = -\frac{1}{N}\Big(\sum_{i \in Pos} \log c_i + \sum_{j \in Neg} \log c_j^0\Big)$

where N is the number of samples, which varies according to the number of samples selected each time; c indicates confidence: $c_i$ indicates the confidence that the i-th anchor box contains a face, and $c_j^0$ indicates the confidence that the j-th anchor box does not contain a face; $i \in Pos$ indicates that the i-th anchor box contains a face, and $i \in Neg$ indicates that the i-th anchor box does not contain a face.
  • The loss of the offset of the region determined by the position data of the face in the training sample image can be obtained from the loss of the offset between each anchor frame and the standard frame of the face in the training sample image. The offset loss can be expressed as:

$L_{loc}(l, g) = \sum_{i \in Pos} \sum_{m \in \{cx, cy, w, h\}} \mathrm{smooth}_{L1}\big(l_i^m - g^m\big)$

where l represents the position information of the anchor frame; cx, cy represent the horizontal and vertical coordinates of the center point of the anchor frame, and w, h its length and width; g represents the position information of the standard frame; and $\mathrm{smooth}_{L1}$ represents the smooth L1 function:

$\mathrm{smooth}_{L1}(x) = \begin{cases} 0.5x^2, & |x| < 1 \\ |x| - 0.5, & \text{otherwise} \end{cases}$
  • The loss of the face pose in the training sample image can be obtained by accumulating the loss between the face pose predicted in each anchor frame and the pose of the standard frame. This pose loss, denoted $L_{pose}$, can be expressed, for example, as a regression loss (such as a smooth L1 loss) over the three pose angles (pitch, yaw, roll).
  • The preset loss function can then be expressed as a weighted combination of the three losses, for example:

$L = L_{conf} + \alpha L_{loc} + \beta L_{pose}$

where α and β are weighting coefficients.
  • The training algorithm can be a gradient descent algorithm, Newton's method, a conjugate gradient algorithm, and so on.
  • the specific training algorithm can be obtained from the prior art, and details are not described herein again.
  • The present invention adds the calculation of the face pose information to the calculation of the network loss of the neural network, so that the pose of the face can be output directly while the face is detected. Because the loss function is used to evaluate the neural network model, the more accurate the face pose, the smaller the loss. Therefore, including the face pose in the calculation of the network loss makes face detection and pose estimation mutually reinforcing, which further improves the accuracy of both face detection and pose estimation.
  • the output module 330 is configured to output whether the image to be detected includes a detection result of a human face.
  • Through the processing module, whether the image to be detected includes a face can be obtained. If the image to be detected does not contain a face, information indicating that no face is included may be output; for example, an output of "no" indicates that the image to be detected does not include a face. If the image to be detected contains a face, information indicating that a face is included may be output; for example, an output of "yes" indicates that the image to be detected contains a face.
  • The convolutional neural network of the present invention learns to recognize the position and posture data of the face by training the convolutional neural network model, and whether a face exists is the basis for learning the position and the posture of the face. Because image features are extracted and learned during the training process, the network also learns to recognize whether a face exists in an image, so the trained convolutional neural network can output the detection result of whether the image to be detected contains a face.
  • the output module 330 is further configured to output posture information of the face in the image to be detected if the image to be detected includes a human face.
  • During training, the network also learns to acquire the posture data of the face in the image. Therefore, the posture data of the face in the image to be detected can be output, and the posture of the face can be expressed by the pitch angle (pitch), the yaw angle (yaw), and the roll angle (roll) of the face.
  • The output module 330 may output the detection result of whether a face is included and the posture information of the face synchronously; that is, if the image to be detected includes a face, the detection result that a face is included and the posture of the face are output together. If the image to be detected does not include a face, the detection result that no face is included may be output directly, with no posture information output or with the output posture information set to a null value.
  • the face detecting device may further include:
  • a location obtaining module configured to acquire location data of a face in the image to be detected according to the convolutional neural network obtained by the training.
  • a de-duplication module configured to obtain an accurate location of a face in the image to be detected by a non-maximum suppression algorithm if the location data of the face includes at least two sets of location data.
  • a gesture acquiring module configured to output face pose information when the face in the image to be detected is in the accurate position.
  • Non-maximum suppression searches for the local maxima of an image and suppresses the non-maximum elements.
  • the specific non-maximum suppression algorithm processing will not be described here, and can be obtained from the prior art.
  • FIG. 2 is a schematic diagram of an image before and after processing by the non-maximum suppression algorithm. The figure on the left side of FIG. 2 shows a schematic diagram in which faces have been detected and the regions where faces exist are marked with face frames according to the positions of the faces. The figure on the right side of FIG. 2 shows the accurate positions obtained after the image is processed by the non-maximum suppression algorithm; at this point the redundant face frames in the image have been removed, and the positions of the faces are obtained accurately.
  • During training, the network also learns to acquire the position data of the face in the image; therefore, the position data of the face in the image to be detected can be output. During face detection, multiple sets of position data may be obtained for the same face. In this case, the accurate position of the face in the image to be detected is obtained by the non-maximum suppression algorithm. For each detected position, a set of posture data of the face at that position (the pitch angle, the yaw angle, and the roll angle of the face) is obtained, so that after the accurate position of the face is determined, the posture information of the face at the accurate position can be acquired. Obtaining the accurate position and posture of the face in the image to be detected through the non-maximum suppression algorithm provides more accurate face information for the image to be detected, so that subsequent image processing (such as image recognition) can be performed with higher accuracy.
  • The face detection device acquires an image to be detected through the image acquisition module; the processing module inputs the image to be detected into the trained convolutional neural network, identifies whether the image to be detected includes a face, and estimates the face pose, wherein the training sample images in the training sample set used to train the convolutional neural network include position data and posture data of the face; the output module outputs the detection result of whether the image to be detected includes a face and, if the image to be detected includes a face, outputs the posture information of the face in the image to be detected. Since the training sample images in the training sample set of the trained convolutional network include the position data and the posture data of the face, the trained convolutional network can identify whether the image to be detected includes a face and can obtain the posture data of the face in the image to be detected. Through the convolutional neural network model, the device can thus not only detect the face but also synchronously estimate the face pose, without repeatedly extracting image features through multiple models; this avoids the cumbersome operations of the face recognition process and improves the efficiency of face recognition.
  • FIG. 5 is a structural diagram of a training apparatus for convolutional neural network parameters according to an embodiment of the present invention.
  • The training apparatus for convolutional neural network parameters may include a sample obtaining module 410 and a training module 420.
  • the sample obtaining module 410 is configured to acquire a training sample set, where the training sample image includes position data and posture data of the face.
  • the training device of the present invention can be used to train any convolutional neural network.
  • the training sample set is used to train the convolutional neural network model.
  • the type of the specific training sample set and the processing of the training sample set can be referred to the related description in the foregoing embodiment, and details are not described herein again.
  • The training module 420 is configured to train a preset convolutional neural network model according to the training sample set, the training algorithm, and the preset loss function, to obtain the values of the network parameters of the preset convolutional neural network model; the preset loss function is used to calculate the loss of the presence or absence of a face in the training sample image, the loss of the face pose in the training sample image, and the loss of the offset of the region determined by the position data of the face in the training sample image.
  • the training process uses the training data (used to obtain the input and output values of the model) and the training algorithm to obtain the network parameters of the convolutional neural network model.
  • The convolutional neural network obtained at this time can be called the trained convolutional neural network; the trained convolutional neural network can predict the output value according to the input value, that is, output the corresponding result according to the input image.
  • The features of the training sample image may be extracted by the convolutional layers of the preset convolutional neural network model to obtain a feature map representing the training sample image. The feature map is then divided into several feature units according to a preset ratio, and multiple anchor frames in each feature unit are obtained according to the clustering algorithm. After the anchor frames are obtained, the anchor frames in which a face exists are input into the next layer of the preset convolutional neural network model, it is determined again whether there is a face in each anchor frame, and features are extracted from the anchor frames that contain a face.
  • The information (position and posture) corresponding to the features extracted at this time is compared with the position data and the posture data of the faces included in the training sample images, and training proceeds according to the preset loss function to obtain the network parameters of the preset convolutional neural network model.
  • The training device for convolutional neural network parameters provided by the present invention trains a convolutional neural network that can perform face detection and can acquire the position and posture information of faces in images. Moreover, by including the calculation of the face pose in the network loss during training, face detection and pose estimation mutually reinforce each other, which further improves the accuracy of both.
  • FIG. 6 is a schematic diagram of a computer device 1 according to an embodiment of the present invention.
  • the computer device 1 includes a memory 20, a processor 30, and a computer program 40 stored in the memory 20 and operable on the processor 30, such as a program for face detection.
  • the processor 30 executes the computer program 40, the steps in the embodiment of the face detection method described above, or the steps in the embodiment of the training method for convolving neural network parameters, such as steps S10-S13 shown in FIG. Or steps S20 to S21 shown in FIG. 2 .
Alternatively, when the processor 30 executes the computer program 40, the functions of the modules/units in the above apparatus embodiments are implemented, such as modules 310 to 330 or modules 410 to 420.
The computer program 40 can be partitioned into one or more modules/units that are stored in the memory 20 and executed by the processor 30 to complete the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer program 40 in the computer device 1.
The computer program 40 may be divided into the image acquisition module 310, the processing module 320, and the output module 330 in FIG. 4, or into the sample acquisition module 410 and the training module 420 in FIG. 5; for the specific functions of each module, refer to the foregoing embodiments.
The computer device 1 may be an embedded monitoring device such as an embedded network camera. It will be understood by those skilled in the art that FIG. 6 is merely an example of the computer device 1 and does not constitute a limitation of the computer device 1; the computer device 1 may include more or fewer components than those illustrated, combine some components, or have different components. For example, the computer device 1 may also include input and output devices, network access devices, buses, and the like.
The processor 30 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like.
The general-purpose processor may be a microprocessor, or the processor 30 may be any conventional processor or the like. The processor 30 is the control center of the computer device 1 and connects the various parts of the entire computer device 1 by using various interfaces and lines.
The memory 20 can be used to store the computer program 40 and/or the modules/units. The processor 30 implements the various functions of the computer device 1 by running or executing the computer programs and/or modules/units stored in the memory 20 and by invoking the data stored in the memory 20.
The memory 20 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application required by at least one function, and the like, and the data storage area may store data created according to the use of the computer device 1 (such as audio data and image data).
In addition, the memory 20 may include a high-speed random access memory, and may also include a non-volatile memory such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
If the modules/units integrated in the computer device 1 are implemented in the form of software functional units and sold or used as stand-alone products, they can be stored in a computer-readable storage medium. Based on such an understanding, all or part of the processes in the foregoing method embodiments of the present invention may also be implemented by a computer program instructing related hardware.
The computer program may be stored in a computer-readable storage medium, and when the program is executed by a processor, the steps of the various method embodiments described above may be implemented.
The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form.
The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
The characteristic means of the present invention described above can be implemented by an integrated circuit, which controls and implements the functions of the face detection method and/or the training method for convolutional neural network parameters in any of the above embodiments.
The functions that can be implemented by the face detection method and the training method for convolutional neural network parameters in any embodiment can be installed in the computer device through the integrated circuit of the present invention, so that the computer device performs the functions that the methods in any embodiment can implement; details are not described herein.

Abstract

The present invention discloses a face detection method, the face detection method including: acquiring an image to be detected; inputting the image to be detected into a trained convolutional neural network to identify whether the image to be detected contains a face and to estimate the face pose, where the training sample images in the training sample set used to train the convolutional neural network include position data and pose data of faces; outputting a detection result indicating whether the image to be detected contains a face; and, if the image to be detected contains a face, outputting pose information of the face in the image to be detected. The present invention also discloses a face detection apparatus, a training method for convolutional neural network parameters, a computer device, and a computer-readable storage medium. The present invention can estimate the face pose synchronously during face detection, thereby improving the efficiency of face recognition.

Description

Face detection method, training method for convolutional neural network parameters, apparatus, and medium
This application claims priority to Chinese Patent Application No. 201711462096.3, filed with the China Patent Office on December 28, 2017 and entitled "Face detection method, training method for convolutional neural network parameters, apparatus, and medium", the entire contents of which are incorporated herein by reference.
Technical Field
The present invention relates to the field of image recognition technologies, and in particular, to a face detection method, a training method for convolutional neural network parameters, an apparatus, and a medium.
Background
With the development of information technology, face recognition technology has come to be applied widely. In fields such as education, transportation, and finance, face recognition technology can help people solve many practical problems. Face detection technology is the basis of face recognition technology, and both the accuracy of face detection and changes in face pose have a significant impact on the accuracy of face recognition.
In existing face recognition technologies, a face detection algorithm is generally used first to detect the face in a picture, the pose of the cropped face picture is then determined, and pictures with a suitable pose are then selected for face recognition. However, this requires repeatedly computing the vector features of the pictures, which takes considerable time and thus reduces the efficiency of face recognition.
Summary of the Invention
In view of this, it is necessary to provide a face detection method and apparatus, a training method for convolutional neural network parameters, a computer device, and a computer-readable storage medium that can estimate the face pose synchronously during face detection, thereby improving the efficiency of face recognition.
One aspect of the present invention provides a face detection method, the method including:
acquiring an image to be detected;
inputting the image to be detected into a trained convolutional neural network, and identifying whether the image to be detected contains a face and estimating the face pose, where the training sample images in the training sample set used to train the convolutional neural network include position data and pose data of faces; and
if the image to be detected contains a face, outputting pose information of the face in the image to be detected.
Optionally, the face detection method further includes:
extracting features of a training sample image through a convolution layer of the convolutional neural network model used for training to obtain a feature map, the feature map being composed of several feature units; and
obtaining an anchor box of each of the several feature units according to the position data of the face in the training sample image and a clustering algorithm.
Optionally, the obtaining an anchor box of each of the several feature units according to the position data of the face in the training sample image and a clustering algorithm includes:
initializing the length and width of a to-be-determined anchor box of a feature unit;
calculating the ratio of the intersection to the union of the to-be-determined anchor box and the standard box determined by the position data of the training sample image, and determining a distance parameter in the clustering algorithm according to the ratio; and
iterating the length and width of the to-be-determined anchor box, and obtaining the length and width of the to-be-determined anchor box when the iteration end condition corresponding to the clustering algorithm is reached, to obtain the anchor box of the feature unit.
Optionally, the face detection method further includes:
training a preset convolutional neural network model according to a preset loss function and a training algorithm to obtain the values of the network parameters of the preset convolutional neural network model and thereby obtain the trained convolutional neural network, the preset loss function being used to calculate the loss for the presence or absence of a face in a training sample image, the loss for the face pose in the training sample image, and the loss for the offsets of the region determined by the position data of the face in the training sample image.
Optionally, the face detection method further includes:
obtaining position data of the face in the image to be detected according to the trained convolutional neural network;
if the position data of the face includes at least two sets of position data, obtaining the accurate position of the face in the image to be detected through a non-maximum suppression algorithm; and
outputting face pose information of the face in the image to be detected at the accurate position.
Another aspect of the present invention further provides a training method for convolutional neural network parameters, the training method including:
acquiring a training sample set, where the training sample images in the training sample set include position data and pose data of faces; and
training a preset convolutional neural network model according to the training sample set, a training algorithm, and a preset loss function to obtain the values of the network parameters of the preset convolutional neural network model, the preset loss function being used to calculate the loss for the presence or absence of a face in a training sample image, the loss for the face pose in the training sample image, and the loss for the offsets of the region determined by the position data of the face in the training sample image.
Another aspect of the present invention further provides a face detection apparatus, the face detection apparatus including:
an image acquisition module configured to acquire an image to be detected;
a processing module configured to input the image to be detected into a trained convolutional neural network, identify whether the image to be detected contains a face, and estimate the face pose, where the training sample images in the training sample set used to train the convolutional neural network include position data and pose data of faces; and
an output module configured to, if the image to be detected contains a face, output pose information of the face in the image to be detected.
Optionally, the face detection apparatus further includes:
a feature extraction module configured to extract features of a training sample image through a convolution layer of the convolutional neural network model used for training to obtain a feature map, the feature map being composed of several feature units; and
a calculation module configured to obtain an anchor box of each of the several feature units according to the position data of the face in the training sample image and a clustering algorithm.
Optionally, the calculation module is specifically configured to:
initialize the length and width of a to-be-determined anchor box of a feature unit;
calculate the ratio of the intersection to the union of the to-be-determined anchor box and the standard box determined by the position data of the training sample image, and determine a distance parameter in the clustering algorithm according to the ratio; and
iterate the length and width of the to-be-determined anchor box, and obtain the length and width of the to-be-determined anchor box when the iteration end condition corresponding to the clustering algorithm is reached, to obtain the anchor box of the feature unit.
Optionally, the face detection apparatus further includes:
a parameter acquisition module configured to train a preset convolutional neural network model according to a preset loss function and a training algorithm to obtain the values of the network parameters of the preset convolutional neural network model and thereby obtain the trained convolutional neural network, the preset loss function being used to calculate the loss for the presence or absence of a face in a training sample image, the loss for the face pose in the training sample image, and the loss for the offsets of the region determined by the position data of the face in the training sample image.
Optionally, the face detection apparatus further includes:
a position acquisition module configured to obtain position data of the face in the image to be detected according to the trained convolutional neural network;
a de-duplication module configured to, if the position data of the face includes at least two sets of position data, obtain the accurate position of the face in the image to be detected through a non-maximum suppression algorithm; and
a pose acquisition module configured to output face pose information of the face in the image to be detected at the accurate position.
Another aspect of the present invention further provides a training apparatus for convolutional neural network parameters, the training apparatus including:
a sample acquisition module configured to acquire a training sample set, where the training sample images in the training sample set include position data and pose data of faces; and
a training module configured to train a preset convolutional neural network model according to the training sample set, a training algorithm, and a preset loss function to obtain the values of the network parameters of the preset convolutional neural network model, the preset loss function being used to calculate the loss for the presence or absence of a face in a training sample image, the loss for the face pose in the training sample image, and the loss for the offsets of the region determined by the position data of the face in the training sample image.
Yet another aspect of the present invention further provides a computer device, the computer device including: a memory configured to store at least one instruction; and a processor configured to execute the instruction stored in the memory to implement the steps of the above face detection method and/or the above training method for convolutional neural network parameters.
Yet another aspect of the present invention further provides a computer-readable storage medium storing at least one instruction, the at least one instruction being executed by a processor in a computer device to implement the steps of the above face detection method and/or the above training method for convolutional neural network parameters.
Yet another aspect of the present invention further provides an integrated circuit, the integrated circuit being installed in a computer device so that the computer device performs the functions that can be implemented by the above face detection method and/or the above training method for convolutional neural network parameters.
In the present invention, an image to be detected is acquired; the image to be detected is input into a trained convolutional neural network to identify whether the image to be detected contains a face and to estimate the face pose, where the training sample images in the training sample set used to train the convolutional neural network include position data and pose data of faces; and, if the image to be detected contains a face, pose information of the face in the image to be detected is output. Since the training sample images in the training sample set of the trained convolutional network include position data and pose data of faces, the trained convolutional network can identify whether the image to be detected contains a face as well as the pose data of the face in the image to be detected. Through the convolutional neural network model, not only can faces be detected, but the face pose can also be estimated synchronously, without repeatedly extracting image features with multiple models; in the face recognition process, this avoids tedious computation and improves the efficiency of face recognition.
Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present invention more clearly, the following briefly introduces the accompanying drawings required for describing the embodiments. Apparently, the accompanying drawings in the following description show some embodiments of the present invention, and a person of ordinary skill in the art may still derive other drawings from these accompanying drawings without creative efforts.
FIG. 1 is a flowchart of a face detection method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of an image before and after processing by a non-maximum suppression algorithm according to an embodiment of the present invention;
FIG. 3 is a flowchart of a training method for convolutional neural network parameters according to an embodiment of the present invention;
FIG. 4 is a functional module diagram of a face detection apparatus according to an embodiment of the present invention;
FIG. 5 is a functional module diagram of a training apparatus for convolutional neural network parameters according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of a computer device according to an embodiment of the present invention.
Detailed Description
To make the above objectives, features, and advantages of the present invention clearer and easier to understand, the present invention is described in detail below with reference to the accompanying drawings and specific embodiments. It should be noted that the embodiments of the present application and the features in the embodiments may be combined with each other in the case of no conflict.
Many specific details are set forth in the following description to facilitate a full understanding of the present invention. The described embodiments are merely some rather than all of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present invention without creative efforts shall fall within the protection scope of the present invention.
Unless otherwise defined, all technical and scientific terms used herein have the same meanings as commonly understood by a person skilled in the technical field of the present invention. The terms used in the specification of the present invention are merely for the purpose of describing specific embodiments and are not intended to limit the present invention.
FIG. 1 is a schematic flowchart of a face detection method according to an embodiment of the present invention. As shown in FIG. 1, the face detection method may include the following steps:
S10: Acquire an image to be detected.
The face detection method of the present invention may be applied to a computer device, and the computer device may be a network camera, a notebook computer, or another computer device.
The image to be detected may be an image captured by the computer device, or a received image sent by another computer device.
Meanwhile, the image to be detected may be a face image or a non-face image.
S11: Input the image to be detected into the trained convolutional neural network, identify whether the image to be detected contains a face, and estimate the face pose, where the training sample images in the training sample set used to train the convolutional neural network include position data and pose data of faces.
A convolutional neural network (CNN) is one of the representative network structures in deep learning and is usually composed of an input layer, convolution layers, pooling layers, and fully connected layers. Examples of convolutional neural network models include VGG-16, GoogleNet, and ResNet50.
The trained convolutional neural network described in the embodiments of the present invention may be obtained by training any convolutional neural network model.
The training process uses training data (used to obtain the input and output values of the model) and a training algorithm to obtain the network parameters of the convolutional neural network model. The convolutional neural network obtained at this point may be called the trained convolutional neural network; it can predict output values from input values, that is, output a corresponding result according to an input image.
In this embodiment, the training sample set for training the convolutional neural network model includes training sample images. The training sample images may include face images and non-face images, and the more face image samples there are, the higher the accuracy of the output of the trained convolutional neural network.
The training sample images in the training sample set used to train the convolutional neural network include position data and pose data of faces. That is, when training the convolutional neural network model, the position data and pose data of the faces in the training sample images may be obtained. To obtain the position data and pose data of a face in a training sample image during training, features must first be extracted from the sample image to obtain the face, and the position data and pose data of the face are then obtained. The position data of a face may be the horizontal and vertical coordinates of the center of the face bounding box together with the length and width of the face; the pose data of a face may be the pitch, yaw, and roll angles of the face, where pitch represents the angle by which the face is tilted up or down, yaw represents the angle by which the face is turned left or right, and roll represents the angle by which the face is rotated within the image plane.
Obtaining the required data from the training sample images may be called annotating the training sample images. During annotation, the data may be normalized. For example, the horizontal coordinate of the center of the face bounding box and the length of the face are each divided by the length of the training sample image, the vertical coordinate of the center of the face bounding box and the width of the face are each divided by the width of the training sample image, and the pitch, yaw, and roll of the face are each divided by π.
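As an illustration of this normalization step, the following few lines of Python express the operations just described; the function name, the dictionary keys, and the image-size variables are hypothetical and are chosen only for this sketch:

```python
import math

def normalize_annotation(cx, cy, face_w, face_h, pitch, yaw, roll,
                         img_w, img_h):
    """Normalize one face annotation as described above: box values are
    divided by the image dimensions, and the three pose angles (given in
    radians) are divided by pi."""
    return {
        "cx": cx / img_w,          # box-center x / image length
        "cy": cy / img_h,          # box-center y / image width
        "w":  face_w / img_w,      # face length / image length
        "h":  face_h / img_h,      # face width  / image width
        "pitch": pitch / math.pi,  # up-down tilt
        "yaw":   yaw / math.pi,    # left-right turn
        "roll":  roll / math.pi,   # in-plane rotation
    }
```

Scaling every annotation into a comparable range in this way is a common way to keep the regression targets well conditioned during training.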
The input of the convolutional neural network model may be training sample images. The purpose of training the convolutional neural network model is to learn to obtain the position data and pose data of the faces in the training sample images from the input training sample images; after training, the model can be used to obtain the position data and pose data of the face in any image.
If no face exists in a training sample image, the position data and pose data of the face may be empty.
The present invention may further include a method for training to obtain the trained convolutional neural network. When training the convolutional neural network, the training sample images may be processed by the following method:
(1) Extract features of the training sample images through the convolution layers of the convolutional neural network model used for training to obtain a feature map, the feature map being composed of several feature units.
(2) Obtain an anchor box of each of the several feature units according to the position data of the faces in the training sample images and a clustering algorithm.
The convolutional neural network used for training refers to the convolutional neural network model used in a specific implementation. Different convolutional neural network models have different convolution layers, and each convolution layer has its corresponding convolution kernel (matrix). For example, if the convolutional neural network model used for training is VGG-16, VGG-16 has 16 network layers, of which 13 are convolution layers.
Extracting the features of a training sample image through the convolution layers of the convolutional neural network model used for training to obtain a feature map is the process of extracting the features of the training sample image, and the obtained feature map is used to represent the training sample image.
Extracting the features of the training sample image through a convolution layer specifically means performing convolution operations through the convolution layer. A convolution operation is the process of multiplying the convolution kernel element-wise with the corresponding positions of the training sample image and then summing; another matrix is obtained after the convolution operation. If the convolutional neural network model has multiple convolution layers, multiple convolution operations may be performed.
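For illustration only, the multiply-and-sum operation described above can be sketched in NumPy as a plain sliding-window convolution without padding or stride, which is all this paragraph assumes:

```python
import numpy as np

def conv2d(image, kernel):
    """Slide `kernel` over `image`; at each position, multiply the kernel
    element-wise with the overlapped patch and sum the products."""
    kh, kw = kernel.shape
    oh = image.shape[0] - kh + 1
    ow = image.shape[1] - kw + 1
    out = np.empty((oh, ow))
    for i in range(oh):
        for j in range(ow):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out  # the "another matrix" the text refers to
```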
The feature map being composed of several feature units means that the feature map may be divided into several parts according to a preset ratio, each part may be called a feature unit, and the feature map is composed of these feature units. For example, if the feature map is divided into 9 parts in a 3×3 grid, the feature map is composed of 9 feature units.
The clustering algorithm may be the K-means algorithm, the FCM clustering algorithm, the SOM clustering algorithm, or the like. Specifically, the length and width of the anchor boxes are obtained through the clustering algorithm, and one feature unit may correspond to multiple anchor boxes.
In the prior art, anchor boxes are usually obtained by manual annotation. In this embodiment, the dimensions of the anchor boxes are obtained through a clustering algorithm, and the anchor boxes of each feature unit can be determined according to the scale of each feature unit, thereby accurately reflecting the aspect ratio of the faces to be detected, reducing the interference of manual priors, and making detection more accurate.
Optionally, it may also be determined whether a face exists in the anchor boxes, and the anchor boxes in which a face exists may be output to the layer following the convolution layers of the convolutional neural network model used for training, for example the fully connected layers of VGG-16. At this point, a linear classifier (such as a linear SVM classifier) may be used to determine whether a face exists in an anchor box.
Then, in the next layer of the convolutional neural network used for training, whether the current anchor boxes all contain a face may be determined again; if a face exists, the image features in the anchor boxes are extracted.
Since the feature vectors extracted through the convolution layers are a rough extraction and the extracted features are simple, the features extracted the second time express more precise and richer content, so that the representation of the training sample image obtained after the second feature extraction is more accurate, which helps improve the accuracy of the training result.
Optionally, in another embodiment of the present invention, the obtaining an anchor box of each of the several feature units according to the position data of the faces in the training sample images and a clustering algorithm may include:
(1) initializing the length and width of a to-be-determined anchor box of a feature unit;
(2) calculating the ratio of the intersection to the union of the to-be-determined anchor box and the standard box determined by the position data of the training sample image, and determining a distance parameter in the clustering algorithm according to the ratio; and
(3) iterating the length and width of the to-be-determined anchor box, and obtaining the length and width of the to-be-determined anchor box when the iteration end condition corresponding to the clustering algorithm is reached, to obtain the anchor box of the feature unit.
An anchor box may also be called an anchor. Since the length and width of a to-be-determined anchor box are unknown, the length and width of the to-be-determined anchor box may each be initialized to a value (the assignment may be random).
Then, the ratio of the intersection to the union of the to-be-determined anchor box and the standard box of the training sample image (that is, the region determined by the position data of the training sample image) is calculated, and the distance parameter in the clustering algorithm is determined according to the ratio. Specifically, in this embodiment, the distance in the clustering algorithm may be expressed as:
d(tbox, abox) = 1 − IOU(tbox, abox)
where tbox represents a standard box in the training sample image (that is, the region determined by the position data of the training sample image), abox represents the to-be-determined anchor box defined by its length and width, and IOU(tbox, abox) represents the ratio of the intersection to the union of tbox and abox, defined as:
IOU(tbox, abox) = area(tbox ∩ abox) / area(tbox ∪ abox)
Then the length and width of the to-be-determined anchor box are iterated; different to-be-determined anchor boxes can be determined according to different lengths and widths. The iteration continues until the iteration end condition corresponding to the clustering algorithm is reached, and the length and width of the to-be-determined anchor box at that point are obtained. For example, the iteration continues until the dimensions of the anchor boxes no longer change. When the dimensions of the anchor boxes no longer change, multiple length and width values may be obtained, and multiple anchor boxes can be determined according to the multiple length and width values.
It can be understood that an anchor box can be obtained for each of the several feature units by the above method.
In existing clustering algorithms, to cluster several objects, a distance is usually defined: the closer two objects are, the greater their similarity, and they are grouped into one class. In this embodiment, the intersection-over-union of the to-be-determined anchor box and the standard box of the training sample image determines the overlapping area of the two boxes. Compared with clustering by point-to-point distance, the clustering method in this embodiment can more accurately reflect the problem to be solved (marking out the regions in the anchor boxes where faces may exist), is more computationally efficient, and yields more accurate results.
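As an illustration of this clustering step, the following is a minimal K-means-style sketch in Python using the 1 − IOU distance above. Since only width and height are clustered, boxes are treated as if they shared a common center; that simplification, and all function names, are assumptions of this sketch rather than details fixed by the embodiment:

```python
import numpy as np

def iou_wh(boxes, centroids):
    """IOU between (w, h) pairs, assuming all boxes share one center."""
    w = np.minimum(boxes[:, None, 0], centroids[None, :, 0])
    h = np.minimum(boxes[:, None, 1], centroids[None, :, 1])
    inter = w * h
    union = (boxes[:, 0] * boxes[:, 1])[:, None] + \
            (centroids[:, 0] * centroids[:, 1])[None, :] - inter
    return inter / union

def cluster_anchors(boxes, k, iters=100, seed=0):
    """Cluster annotated (w, h) face boxes into k anchor boxes with the
    distance d = 1 - IOU, iterating until assignments stop changing."""
    boxes = np.asarray(boxes, dtype=float)
    rng = np.random.default_rng(seed)
    centroids = boxes[rng.choice(len(boxes), k, replace=False)]
    assign = np.zeros(len(boxes), dtype=int)
    for _ in range(iters):
        d = 1.0 - iou_wh(boxes, centroids)      # distance parameter
        new_assign = d.argmin(axis=1)
        if np.array_equal(new_assign, assign):  # iteration end condition
            break
        assign = new_assign
        for j in range(k):                      # update length and width
            if np.any(assign == j):
                centroids[j] = boxes[assign == j].mean(axis=0)
    return centroids
```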
Optionally, in another embodiment of the present invention, the convolutional neural network may also be trained by the following method:
training a preset convolutional neural network model according to a preset loss function and a training algorithm to obtain the values of the network parameters of the preset convolutional neural network model and thereby obtain the trained convolutional neural network, where the preset loss function is used to calculate the loss for the presence or absence of a face in a training sample image, the loss for the face pose in the training sample image, and the loss for the offsets of the region determined by the position data of the face in the training sample image.
The preset convolutional neural network model is the convolutional neural network model used for training, for example VGG-16.
When training the convolutional neural network model, the purpose of training is to obtain the network parameters of the convolutional neural network model so that the output values of the convolutional neural network are as close as possible to the actual values, so that input data can be predicted accurately. Therefore, during training, a loss function is used to measure whether the output values of the convolutional neural network are close to the actual values: the smaller the value of the loss function, the closer the output values of the convolutional neural network are to the actual values.
Specifically, in this embodiment, the preset loss function is used to calculate the loss for the presence or absence of a face in a training sample image, the loss for the face pose in the training sample image, and the loss for the offsets of the annotated face region in the training sample image. In implementation, to determine the loss for the presence or absence of a face in a training sample image, since the training sample image is composed of several feature units and the anchor boxes of each feature unit have been obtained, the loss for the presence or absence of a face in the training sample image can be obtained from the loss for the presence or absence of a face in each anchor box. The loss for the presence or absence of a face in the anchor boxes can be expressed as:
L_conf(x, c) = −(1/N) ( Σ_{i∈Pos} x_{i,j} log ĉ_i^p + Σ_{i∈Neg} log ĉ_i^0 )
where N represents the number of samples, which varies with the number of samples selected each time; x indicates whether an anchor box matches a standard box: specifically, x_{i,j} ∈ {0, 1} indicates whether the i-th anchor box obtained by the clustering algorithm matches the standard box of the j-th sample image, where x_{i,j} = 1 when the IOU of the i-th anchor box and the j-th standard box is greater than 0.5, and x_{i,j} = 0 otherwise; c represents the confidence. Specifically,
ĉ_i^p
represents the confidence that the i-th anchor box contains a face, and
ĉ_i^0
represents the confidence that the i-th anchor box does not contain a face; i ∈ Pos indicates that the i-th anchor box contains a face, and i ∈ Neg indicates that the i-th anchor box does not contain a face.
Similarly, the loss for the offsets of the region determined by the position data of the face in the training sample can be obtained from the loss for the offsets between the anchor boxes and the standard boxes of the faces in the training sample image. The loss for the offsets between the anchor boxes and the standard boxes of the faces in the training sample image is:
L_loc(x, l, g) = (1/N) Σ_{i∈Pos} Σ_{m∈{cx,cy,w,h}} x_{i,j} smooth_L1(l_i^m − ĝ_j^m)
where x and N have the same meanings as above, l represents the position information of an anchor box, cx, cy, w, and h represent the horizontal and vertical coordinates of the center point of the anchor box and its length and width, g represents the position information of a standard box, and smooth_L1 represents the smooth L1-norm mapping, with:
ĝ_j^cx = (g_j^cx − l_i^cx) / l_i^w
ĝ_j^cy = (g_j^cy − l_i^cy) / l_i^h
ĝ_j^w = log(g_j^w / l_i^w)
ĝ_j^h = log(g_j^h / l_i^h)
The loss for the face pose in the training sample can be obtained from the loss between the face pose in each anchor box and the pose in the standard box. The loss between the face pose in an anchor box and the pose in the standard box can be expressed as:
L_pose(φ, θ) = (1/N) Σ_{i∈Pos} Σ_{k=1}^{3} smooth_L1(φ_k − θ_k)
where φ = (φ_1, φ_2, φ_3) represents the pitch, yaw, and roll angle information of the face in an anchor box, and θ = (θ_1, θ_2, θ_3) represents the pitch, yaw, and roll angle information of the face in the standard box.
Then, the preset loss function may be the following:
L = L_conf(x, c) + L_loc(x, l, g) + L_pose(φ, θ)
Meanwhile, during training, the training algorithm may be a gradient descent algorithm, Newton's algorithm, a conjugate gradient algorithm, or the like. Specific training algorithms can be obtained from the prior art and are not described here.
The present invention adds the calculation of face pose information to the network loss of the neural network, so that the pose of a face can be output directly while the face is detected. Moreover, since the loss function can be used to evaluate the neural network model, the more accurate the face pose, the smaller the loss. Therefore, adding the calculation of the face pose to the network loss of the neural network enables face detection and pose estimation to promote each other, further improving the accuracy of face detection and pose estimation.
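To make the structure of such a combined loss concrete, here is a minimal PyTorch sketch. It assumes the network outputs per-anchor face/no-face scores, box offsets, and three pose angles, and that anchor matching has already produced the targets and the positive mask; all tensor names, and the equal weighting of the three terms, are assumptions of this sketch rather than values fixed by the embodiment:

```python
import torch
import torch.nn.functional as F

def multitask_loss(cls_logits, box_pred, pose_pred,
                   cls_target, box_target, pose_target, pos_mask):
    """Combined loss: face presence + box offsets + pose angles.

    cls_logits : (B, A, 2)   face / no-face scores per anchor
    box_pred   : (B, A, 4)   predicted (cx, cy, w, h) offsets
    pose_pred  : (B, A, 3)   predicted (pitch, yaw, roll) / pi
    cls_target : (B, A) long 1 where the anchor matched a face, else 0
    pos_mask   : (B, A) bool True where the anchor matched a face box
    """
    n = pos_mask.sum().clamp(min=1).float()
    # presence/absence loss over all anchors (cross-entropy)
    l_conf = F.cross_entropy(cls_logits.reshape(-1, 2),
                             cls_target.reshape(-1), reduction="sum")
    # offset loss only on anchors matched to a face (smooth L1)
    l_loc = F.smooth_l1_loss(box_pred[pos_mask], box_target[pos_mask],
                             reduction="sum")
    # pose loss only on matched anchors as well
    l_pose = F.smooth_l1_loss(pose_pred[pos_mask], pose_target[pos_mask],
                              reduction="sum")
    return (l_conf + l_loc + l_pose) / n
```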
S12: Output a detection result indicating whether the image to be detected contains a face.
According to step S11, whether the image to be detected contains a face can be obtained. When the image to be detected does not contain a face, information indicating that no face is contained may be output; for example, outputting "no" indicates that the image to be detected does not contain a face. When the image to be detected contains a face, information indicating that a face is contained may be output; for example, outputting "yes" indicates that the image to be detected contains a face.
Since the convolutional neural network trained in the present invention learns to identify the position and pose data of faces by training the convolutional neural network model, and whether a face exists is the basis for learning the position and pose of a face, the network can learn to identify whether an image contains a face by extracting and learning image features during training. Therefore, the trained convolutional neural network can output a detection result indicating whether the image to be detected contains a face.
S13: If the image to be detected contains a face, output pose information of the face in the image to be detected.
When the image to be detected contains a face, since the trained convolutional neural network was also trained to obtain the pose data of faces in images, the pose data of the face in the image to be detected can be output, and the face pose can be represented by the pitch, yaw, and roll angles of the face.
In a specific implementation, step S12 and step S13 may output synchronously. That is, if the image to be detected contains a face, the detection result indicating that the image contains a face and the pose of the face are output; if the image to be detected does not contain a face, the detection result indicating that no face is contained may be output directly, without outputting pose information, or with the output pose information being null.
Meanwhile, in a specific implementation, only the pose information of the face may be output, without outputting the detection result of whether a face is contained. If the pose information of a face is obtained and the pose information is not null, a face exists in the image to be detected; therefore, whether a face is contained and what the pose of the face is can be understood intuitively from the output of the pose information alone.
Optionally, in another embodiment of the present invention, the face detection method may further include:
obtaining position data of the face in the image to be detected according to the trained convolutional neural network;
if the position data of the face includes at least two sets of position data, obtaining the accurate position of the face in the image to be detected through a non-maximum suppression algorithm; and
outputting face pose information of the face in the image to be detected at the accurate position.
Non-maximum suppression (NMS) searches for local maxima in an image and suppresses non-maximum elements. The specific processing of the non-maximum suppression algorithm is not described here and can be obtained from the prior art.
As shown in FIG. 2, FIG. 2 is a schematic diagram of an image before and after processing by the non-maximum suppression algorithm. The left image in FIG. 2 shows a detected face, with the regions where the face exists marked according to the face positions (the face boxes mark the regions where the face exists). The right image in FIG. 2 shows the accurate position obtained after processing by the non-maximum suppression algorithm; the redundant face boxes in the image have been removed, and the position of the face can be obtained accurately.
When the image to be detected contains a face, since the trained convolutional neural network was also trained to obtain the position data of faces in images, the position data of the face in the image to be detected can be output. During face detection, multiple sets of face position data may be obtained; in this case, the accurate position of the face in the image to be detected is obtained through the non-maximum suppression algorithm. When a face is detected at a certain position, a set of pose data of the face at that position (the pitch, yaw, and roll angles of the face) can be detected; therefore, after the accurate position of the face is determined, the pose information of the face at that accurate position can be obtained.
Obtaining the accurate position and pose of the face in the image to be detected through the non-maximum suppression algorithm can provide more accurate face information for the image to be detected, so that subsequent further image processing (such as image recognition) can achieve higher accuracy.
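For reference, a standard greedy non-maximum suppression over scored face boxes can be sketched as follows; the corner-coordinate box format and the 0.5 overlap threshold are illustrative assumptions, not values fixed by this embodiment:

```python
import numpy as np

def nms(boxes, scores, iou_thresh=0.5):
    """Greedy non-maximum suppression.

    boxes  : (N, 4) array of (x1, y1, x2, y2) face boxes
    scores : (N,)  confidence of each box
    Returns indices of the boxes kept, highest score first.
    """
    x1, y1, x2, y2 = boxes.T
    areas = (x2 - x1) * (y2 - y1)
    order = scores.argsort()[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(i)
        # overlap of the best remaining box with the rest
        xx1 = np.maximum(x1[i], x1[order[1:]])
        yy1 = np.maximum(y1[i], y1[order[1:]])
        xx2 = np.minimum(x2[i], x2[order[1:]])
        yy2 = np.minimum(y2[i], y2[order[1:]])
        inter = np.maximum(0, xx2 - xx1) * np.maximum(0, yy2 - yy1)
        iou = inter / (areas[i] + areas[order[1:]] - inter)
        # keep only boxes that do not overlap the best box too much
        order = order[1:][iou <= iou_thresh]
    return keep
```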
The face detection method provided by the present invention acquires an image to be detected; inputs the image to be detected into a trained convolutional neural network to identify whether the image to be detected contains a face and to estimate the face pose, where the training sample images in the training sample set used to train the convolutional neural network include position data and pose data of faces; outputs a detection result indicating whether the image to be detected contains a face; and, if the image to be detected contains a face, outputs pose information of the face in the image to be detected. Since the training sample images in the training sample set of the trained convolutional network include position data and pose data of faces, the trained convolutional network can identify whether the image to be detected contains a face as well as the pose data of the face in the image to be detected. Through the convolutional neural network model, not only can faces be detected, but the face pose can also be estimated synchronously, without repeatedly extracting image features with multiple models; in the face recognition process, this avoids tedious computation and improves the efficiency of face recognition.
FIG. 3 is a schematic flowchart of a training method for convolutional neural network parameters according to an embodiment of the present invention. As shown in FIG. 3, the training method for convolutional neural network parameters may include the following steps:
S20: Acquire a training sample set, where the training sample images in the training sample set include position data and pose data of faces.
The training method of the present invention can be used to train any convolutional neural network.
The training sample set is used to train the convolutional neural network model. For the specific type of the training sample set and the processing of the training sample set, refer to the related descriptions in the above embodiments; details are not repeated here.
S21: Train a preset convolutional neural network model according to the training sample set, a training algorithm, and a preset loss function to obtain the values of the network parameters of the preset convolutional neural network model, where the preset loss function is used to calculate the loss for the presence or absence of a face in a training sample image, the loss for the face pose in the training sample image, and the loss for the offsets of the annotated face region in the training sample image.
The training process uses training data (used to obtain the input and output values of the model) and the training algorithm to obtain the network parameters of the convolutional neural network model. The convolutional neural network obtained at this point may be called the trained convolutional neural network; it can predict output values from input values, that is, output a corresponding result according to an input image.
During training, the features of the training sample images may be extracted through the convolution layers of the preset convolutional neural network model to obtain feature maps representing the training sample images. The feature maps are then divided into several feature units according to a preset ratio, and multiple anchor boxes in each feature unit are obtained according to the clustering algorithm. After the multiple anchor boxes are obtained, the anchor boxes in which a face exists are input into the next layer of the preset convolutional neural network model, whether a face exists in the anchor boxes is determined again, and features are extracted from the anchor boxes in which a face exists.
Then, the extracted features and the information they reflect (position and pose) are compared with the position data and pose data of the faces included in the training sample images, and training is performed according to the preset loss function to obtain the network parameters of the preset convolutional neural network model.
For details, refer to the content related to model training in the foregoing embodiments; details are not repeated here.
The training method for convolutional neural network parameters provided by the present invention can, through training, enable the trained convolutional neural network to perform face detection and to obtain the position and pose information of faces in images. Moreover, during training, adding the calculation of the face pose to the network loss enables face detection and pose estimation to promote each other, further improving the accuracy of face detection and pose estimation.
FIG. 4 is a structural diagram of a face detection apparatus according to an embodiment of the present invention. As shown in FIG. 4, the face detection apparatus may include an image acquisition module 310, a processing module 320, and an output module 330.
The image acquisition module 310 is configured to acquire an image to be detected.
The image to be detected may be an image captured by a computer device, or a received image sent by another computer device.
Meanwhile, the image to be detected may be a face image or a non-face image.
The processing module 320 is configured to input the image to be detected into the trained convolutional neural network, identify whether the image to be detected contains a face, and estimate the face pose, where the training sample images in the training sample set used to train the convolutional neural network include position data and pose data of faces.
A convolutional neural network (CNN) is one of the representative network structures in deep learning and is usually composed of an input layer, convolution layers, pooling layers, and fully connected layers. Examples of convolutional neural network models include VGG-16, GoogleNet, and ResNet50.
The trained convolutional neural network described in the embodiments of the present invention may be obtained by training any convolutional neural network model.
The training process uses training data (used to obtain the input and output values of the model) and a training algorithm to obtain the network parameters of the convolutional neural network model. The convolutional neural network obtained at this point may be called the trained convolutional neural network; it can predict output values from input values, that is, output a corresponding result according to an input image.
In this embodiment, the training sample set for training the convolutional neural network model includes training sample images. The training sample images may include face images and non-face images, and the more face image samples there are, the higher the accuracy of the output of the trained convolutional neural network.
The training sample images in the training sample set used to train the convolutional neural network include position data and pose data of faces. That is, when training the convolutional neural network model, the position data and pose data of the faces in the training sample images may be obtained. To obtain the position data and pose data of a face in a training sample image during training, features must first be extracted from the sample image to obtain the face, and the position data and pose data of the face are then obtained. The position data of a face may be the horizontal and vertical coordinates of the center of the face bounding box together with the length and width of the face; the pose data of a face may be the pitch, yaw, and roll angles of the face, where pitch represents the angle by which the face is tilted up or down, yaw represents the angle by which the face is turned left or right, and roll represents the angle by which the face is rotated within the image plane.
Obtaining the required data from the training sample images may be called annotating the training sample images. During annotation, the data may be normalized; for example, the horizontal coordinate of the center of the face bounding box and the length of the face are each divided by the length of the training sample image, the vertical coordinate of the center of the face bounding box and the width of the face are each divided by the width of the training sample image, and the pitch, yaw, and roll of the face are each divided by π.
The input of the convolutional neural network model may be training sample images. The purpose of training the convolutional neural network model is to learn to obtain the position data and pose data of the faces in the training sample images from the input training sample images; after training, the model can be used to obtain the position data and pose data of the face in any image.
If no face exists in a training sample image, the position data and pose data of the face may be empty.
The present invention may further include modules for training to obtain the trained convolutional neural network. When training the convolutional neural network, the training sample images may be processed by a feature extraction module and a calculation module:
The feature extraction module is configured to extract features of the training sample images through the convolution layers of the convolutional neural network model used for training to obtain a feature map, the feature map being composed of several feature units.
The calculation module is configured to obtain an anchor box of each of the several feature units according to the position data of the faces in the training sample images and a clustering algorithm.
The convolutional neural network used for training refers to the convolutional neural network model used in a specific implementation. Different convolutional neural network models have different convolution layers, and each convolution layer has its corresponding convolution kernel (matrix). For example, if the convolutional neural network model used for training is VGG-16, VGG-16 has 16 network layers, of which 13 are convolution layers.
Extracting the features of a training sample image through the convolution layers of the convolutional neural network model used for training to obtain a feature map is the process of extracting the features of the training sample image, and the obtained feature map is used to represent the training sample image.
Extracting the features of the training sample image through a convolution layer specifically means performing convolution operations through the convolution layer. A convolution operation is the process of multiplying the convolution kernel element-wise with the corresponding positions of the training sample image and then summing; another matrix is obtained after the convolution operation. If the convolutional neural network model has multiple convolution layers, multiple convolution operations may be performed.
The feature map being composed of several feature units means that the feature map may be divided into several parts according to a preset ratio, each part may be called a feature unit, and the feature map is composed of these feature units. For example, if the feature map is divided into 9 parts in a 3×3 grid, the feature map is composed of 9 feature units.
The clustering algorithm may be the K-means algorithm, the FCM clustering algorithm, the SOM clustering algorithm, or the like. Specifically, the length and width of the anchor boxes are obtained through the clustering algorithm, and one feature unit may correspond to multiple anchor boxes.
In the prior art, anchor boxes are usually obtained by manual annotation. In this embodiment, the dimensions of the anchor boxes are obtained through a clustering algorithm, and the anchor boxes of each feature unit can be determined according to the scale of each feature unit, thereby accurately reflecting the aspect ratio of the faces to be detected, reducing the interference of manual priors, and making detection more accurate.
Optionally, it may also be determined whether a face exists in the anchor boxes, and the anchor boxes in which a face exists may be output to the layer following the convolution layers of the convolutional neural network model used for training, for example the fully connected layers of VGG-16. At this point, a linear classifier (such as a linear SVM classifier) may be used to determine whether a face exists in an anchor box.
Then, in the next layer of the convolutional neural network used for training, whether the current anchor boxes all contain a face may be determined again; if a face exists, the image features in the anchor boxes are extracted.
Since the feature vectors extracted through the convolution layers are a rough extraction and the extracted features are simple, the features extracted the second time express more precise and richer content, so that the representation of the training sample image obtained after the second feature extraction is more accurate, which helps improve the accuracy of the training result.
Optionally, in another embodiment of the present invention, the calculation module may be specifically configured to:
(1) initialize the length and width of a to-be-determined anchor box of a feature unit;
(2) calculate the ratio of the intersection to the union of the to-be-determined anchor box and the standard box determined by the position data of the training sample image, and determine a distance parameter in the clustering algorithm according to the ratio; and
(3) iterate the length and width of the to-be-determined anchor box, and obtain the length and width of the to-be-determined anchor box when the iteration end condition corresponding to the clustering algorithm is reached, to obtain the anchor box of the feature unit.
An anchor box may also be called an anchor. Since the length and width of a to-be-determined anchor box are unknown, the length and width of the to-be-determined anchor box may each be initialized to a value (the assignment may be random).
Then, the ratio of the intersection to the union of the to-be-determined anchor box and the standard box of the training sample image (that is, the region determined by the position data of the training sample image) is calculated, and the distance parameter in the clustering algorithm is determined according to the ratio. Specifically, in this embodiment, the distance in the clustering algorithm may be expressed as:
d(tbox, abox) = 1 − IOU(tbox, abox)
where tbox represents a standard box in the training sample image (that is, the region determined by the position data of the training sample image), abox represents the to-be-determined anchor box defined by its length and width, and IOU(tbox, abox) represents the ratio of the intersection to the union of tbox and abox, defined as:
IOU(tbox, abox) = area(tbox ∩ abox) / area(tbox ∪ abox)
Then the length and width of the to-be-determined anchor box are iterated; different to-be-determined anchor boxes can be determined according to different lengths and widths. The iteration continues until the iteration end condition corresponding to the clustering algorithm is reached, and the length and width of the to-be-determined anchor box at that point are obtained. For example, the iteration continues until the dimensions of the anchor boxes no longer change. When the dimensions of the anchor boxes no longer change, multiple length and width values may be obtained, and multiple anchor boxes can be determined according to the multiple length and width values.
It can be understood that an anchor box can be obtained for each of the several feature units through the calculation module.
In existing clustering algorithms, to cluster several objects, a distance is usually defined: the closer two objects are, the greater their similarity, and they are grouped into one class. In this embodiment, the intersection-over-union of the to-be-determined anchor box and the standard box of the training sample image determines the overlapping area of the two boxes. Compared with clustering by point-to-point distance, the clustering method in this embodiment can more accurately reflect the problem to be solved (marking out the regions in the anchor boxes where faces may exist), is more computationally efficient, and yields more accurate results.
Optionally, in another embodiment of the present invention, the trained convolutional neural network may also be obtained through a parameter acquisition module:
The parameter acquisition module is configured to train a preset convolutional neural network model according to a preset loss function and a training algorithm to obtain the values of the network parameters of the preset convolutional neural network model and thereby obtain the trained convolutional neural network, where the preset loss function is used to calculate the loss for the presence or absence of a face in a training sample image, the loss for the face pose in the training sample image, and the loss for the offsets of the region determined by the position data of the face in the training sample image.
The preset convolutional neural network model is the convolutional neural network model used for training, for example VGG-16.
When training the convolutional neural network model, the purpose of training is to obtain the network parameters of the convolutional neural network model so that the output values of the convolutional neural network are as close as possible to the actual values, so that input data can be predicted accurately. Therefore, during training, a loss function is used to measure whether the output values of the convolutional neural network are close to the actual values: the smaller the value of the loss function, the closer the output values of the convolutional neural network are to the actual values.
Specifically, in this embodiment, the preset loss function is used to calculate the loss for the presence or absence of a face in a training sample image, the loss for the face pose in the training sample image, and the loss for the offsets of the annotated face region in the training sample image. In implementation, to determine the loss for the presence or absence of a face in a training sample image, since the training sample image is composed of several feature units and the anchor boxes of each feature unit have been obtained, the loss for the presence or absence of a face in the training sample image can be obtained from the loss for the presence or absence of a face in each anchor box. The loss for the presence or absence of a face in the anchor boxes can be expressed as:
L_conf(x, c) = −(1/N) ( Σ_{i∈Pos} x_{i,j} log ĉ_i^p + Σ_{i∈Neg} log ĉ_i^0 )
where N represents the number of samples, which varies with the number of samples selected each time; x indicates whether an anchor box matches a standard box: specifically, x_{i,j} ∈ {0, 1} indicates whether the i-th anchor box obtained by the clustering algorithm matches the standard box of the j-th sample image, where x_{i,j} = 1 when the IOU of the i-th anchor box and the j-th standard box is greater than 0.5, and x_{i,j} = 0 otherwise; c represents the confidence. Specifically,
ĉ_i^p
represents the confidence that the i-th anchor box contains a face, and
ĉ_i^0
represents the confidence that the i-th anchor box does not contain a face; i ∈ Pos indicates that the i-th anchor box contains a face, and i ∈ Neg indicates that the i-th anchor box does not contain a face.
Similarly, the loss for the offsets of the region determined by the position data of the face in the training sample can be obtained from the loss for the offsets between the anchor boxes and the standard boxes of the faces in the training sample image. The loss for the offsets between the anchor boxes and the standard boxes of the faces in the training sample image is:
L_loc(x, l, g) = (1/N) Σ_{i∈Pos} Σ_{m∈{cx,cy,w,h}} x_{i,j} smooth_L1(l_i^m − ĝ_j^m)
where x and N have the same meanings as above, l represents the position information of an anchor box, cx, cy, w, and h represent the horizontal and vertical coordinates of the center point of the anchor box and its length and width, g represents the position information of a standard box, and smooth_L1 represents the smooth L1-norm mapping, with:
ĝ_j^cx = (g_j^cx − l_i^cx) / l_i^w
ĝ_j^cy = (g_j^cy − l_i^cy) / l_i^h
ĝ_j^w = log(g_j^w / l_i^w)
ĝ_j^h = log(g_j^h / l_i^h)
The loss for the face pose in the training sample can be obtained from the loss between the face pose in each anchor box and the pose in the standard box. The loss between the face pose in an anchor box and the pose in the standard box can be expressed as:
L_pose(φ, θ) = (1/N) Σ_{i∈Pos} Σ_{k=1}^{3} smooth_L1(φ_k − θ_k)
where φ = (φ_1, φ_2, φ_3) represents the pitch, yaw, and roll angle information of the face in an anchor box, and θ = (θ_1, θ_2, θ_3) represents the pitch, yaw, and roll angle information of the face in the standard box.
Then, the preset loss function may be the following:
L = L_conf(x, c) + L_loc(x, l, g) + L_pose(φ, θ)
Meanwhile, during training, the training algorithm may be a gradient descent algorithm, Newton's algorithm, a conjugate gradient algorithm, or the like. Specific training algorithms can be obtained from the prior art and are not described here.
The present invention adds the calculation of face pose information to the network loss of the neural network, so that the pose of a face can be output directly while the face is detected. Moreover, since the loss function can be used to evaluate the neural network model, the more accurate the face pose, the smaller the loss. Therefore, adding the calculation of the face pose to the network loss of the neural network enables face detection and pose estimation to promote each other, further improving the accuracy of face detection and pose estimation.
The output module 330 is configured to output a detection result indicating whether the image to be detected contains a face.
According to the processing module 320, whether the image to be detected contains a face can be obtained. When the image to be detected does not contain a face, information indicating that no face is contained may be output; for example, outputting "no" indicates that the image to be detected does not contain a face. When the image to be detected contains a face, information indicating that a face is contained may be output; for example, outputting "yes" indicates that the image to be detected contains a face.
Since the convolutional neural network trained in the present invention learns to identify the position and pose data of faces by training the convolutional neural network model, and whether a face exists is the basis for learning the position and pose of a face, the network can learn to identify whether an image contains a face by extracting and learning image features during training. Therefore, the trained convolutional neural network can output a detection result indicating whether the image to be detected contains a face.
The output module 330 is further configured to, if the image to be detected contains a face, output pose information of the face in the image to be detected.
When the image to be detected contains a face, since the trained convolutional neural network was also trained to obtain the pose data of faces in images, the pose data of the face in the image to be detected can be output, and the face pose can be represented by the pitch, yaw, and roll angles of the face.
In a specific implementation, the output module 330 may synchronously output the detection result of whether a face is contained and the pose information of the face. That is, if the image to be detected contains a face, the detection result indicating that the image contains a face and the pose of the face are output; if the image to be detected does not contain a face, the detection result indicating that no face is contained may be output directly, without outputting pose information, or with the output pose information being null.
Meanwhile, in a specific implementation, only the pose information of the face may be output, without outputting the detection result of whether a face is contained. If the pose information of a face is obtained and the pose information is not null, a face exists in the image to be detected; therefore, whether a face is contained and what the pose of the face is can be understood intuitively from the output of the pose information alone.
Optionally, in another embodiment of the present invention, the face detection apparatus may further include:
a position acquisition module configured to obtain position data of the face in the image to be detected according to the trained convolutional neural network;
a de-duplication module configured to, if the position data of the face includes at least two sets of position data, obtain the accurate position of the face in the image to be detected through a non-maximum suppression algorithm; and
a pose acquisition module configured to output face pose information of the face in the image to be detected at the accurate position.
Non-maximum suppression (NMS) searches for local maxima in an image and suppresses non-maximum elements. The specific processing of the non-maximum suppression algorithm is not described here and can be obtained from the prior art.
As shown in FIG. 2, FIG. 2 is a schematic diagram of an image before and after processing by the non-maximum suppression algorithm. The left image in FIG. 2 shows a detected face, with the regions where the face exists marked according to the face positions (the face boxes mark the regions where the face exists). The right image in FIG. 2 shows the accurate position obtained after processing by the non-maximum suppression algorithm; the redundant face boxes in the image have been removed, and the position of the face can be obtained accurately.
When the image to be detected contains a face, since the trained convolutional neural network was also trained to obtain the position data of faces in images, the position data of the face in the image to be detected can be output. During face detection, multiple sets of face position data may be obtained; in this case, the accurate position of the face in the image to be detected is obtained through the non-maximum suppression algorithm. When a face is detected at a certain position, a set of pose data of the face at that position (the pitch, yaw, and roll angles of the face) can be detected; therefore, after the accurate position of the face is determined, the pose information of the face at that accurate position can be obtained.
Obtaining the accurate position and pose of the face in the image to be detected through the non-maximum suppression algorithm can provide more accurate face information for the image to be detected, so that subsequent further image processing (such as image recognition) can achieve higher accuracy.
In the face detection apparatus provided by the present invention, the image acquisition module acquires an image to be detected; the processing module inputs the image to be detected into the trained convolutional neural network to identify whether the image to be detected contains a face and to estimate the face pose, where the training sample images in the training sample set used to train the convolutional neural network include position data and pose data of faces; and the output module outputs a detection result indicating whether the image to be detected contains a face and, if the image to be detected contains a face, outputs pose information of the face in the image to be detected. Since the training sample images in the training sample set of the trained convolutional network include position data and pose data of faces, the trained convolutional network can identify whether the image to be detected contains a face as well as the pose data of the face in the image to be detected. Through the convolutional neural network model, not only can faces be detected, but the face pose can also be estimated synchronously, without repeatedly extracting image features with multiple models; in the face recognition process, this avoids tedious computation and improves the efficiency of face recognition.
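Putting the modules above together, one inference pass over a single image might look like the following sketch; the model object, its output format, and the nms helper from the earlier sketch are assumptions made only for illustration:

```python
import numpy as np

def detect_faces(model, image, score_thresh=0.5):
    """End-to-end sketch: run the trained network once, suppress
    duplicate boxes, and return each face's position and pose.

    `model(image)` is assumed to return per-candidate arrays:
    boxes (K, 4), scores (K,), poses (K, 3) with (pitch, yaw, roll)/pi.
    """
    boxes, scores, poses = model(image)
    keep = scores >= score_thresh          # does the image contain a face?
    boxes, scores, poses = boxes[keep], scores[keep], poses[keep]
    if len(boxes) == 0:
        return []                          # detection result: no face
    if len(boxes) > 1:                     # at least two sets of positions:
        idx = nms(boxes, scores)           # keep only the accurate ones
        boxes, scores, poses = boxes[idx], scores[idx], poses[idx]
    return [{"box": b, "pose": p * np.pi}  # angles back in radians
            for b, p in zip(boxes, poses)]
```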
FIG. 5 is a structural diagram of a training apparatus for convolutional neural network parameters according to an embodiment of the present invention. The training apparatus for convolutional neural network parameters may include a sample acquisition module 410 and a training module 420.
The sample acquisition module 410 is configured to acquire a training sample set, where the training sample images in the training sample set include position data and pose data of faces.
The training apparatus of the present invention can be used to train any convolutional neural network.
The training sample set is used to train the convolutional neural network model. For the specific type of the training sample set and the processing of the training sample set, refer to the related descriptions in the above embodiments; details are not repeated here.
The training module 420 is configured to train a preset convolutional neural network model according to the training sample set, a training algorithm, and a preset loss function to obtain the values of the network parameters of the preset convolutional neural network model, where the preset loss function is used to calculate the loss for the presence or absence of a face in a training sample image, the loss for the face pose in the training sample image, and the loss for the offsets of the region determined by the position data of the face in the training sample image.
The training process uses training data (used to obtain the input and output values of the model) and the training algorithm to obtain the network parameters of the convolutional neural network model. The convolutional neural network obtained at this point may be called the trained convolutional neural network; it can predict output values from input values, that is, output a corresponding result according to an input image.
During training, the features of the training sample images may be extracted through the convolution layers of the preset convolutional neural network model to obtain feature maps representing the training sample images. The feature maps are then divided into several feature units according to a preset ratio, and multiple anchor boxes in each feature unit are obtained according to the clustering algorithm. After the multiple anchor boxes are obtained, the anchor boxes in which a face exists are input into the next layer of the preset convolutional neural network model, whether a face exists in the anchor boxes is determined again, and features are extracted from the anchor boxes in which a face exists.
Then, the extracted features and the information they reflect (position and pose) are compared with the position data and pose data of the faces included in the training sample images, and training is performed according to the preset loss function to obtain the network parameters of the preset convolutional neural network model.
For details, refer to the content related to model training in the foregoing embodiments; details are not repeated here.
The training apparatus for convolutional neural network parameters provided by the present invention can, through training, enable the trained convolutional neural network to perform face detection and to obtain the position and pose information of faces in images. Moreover, during training, adding the calculation of the face pose to the network loss enables face detection and pose estimation to promote each other, further improving the accuracy of face detection and pose estimation.
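A bare-bones training loop matching this description might look as follows; the data-loader fields, the model interface, and the multitask_loss sketch shown earlier are illustrative assumptions:

```python
import torch

def train(model, loader, epochs=10, lr=1e-3):
    """Minimal gradient-descent training loop for the preset model.

    Each batch is assumed to provide images plus the matched targets:
    class labels, encoded box offsets, pose angles, and positive mask.
    """
    opt = torch.optim.SGD(model.parameters(), lr=lr)
    for _ in range(epochs):
        for images, cls_t, box_t, pose_t, pos in loader:
            cls_p, box_p, pose_p = model(images)
            loss = multitask_loss(cls_p, box_p, pose_p,
                                  cls_t, box_t, pose_t, pos)
            opt.zero_grad()
            loss.backward()      # training algorithm: gradient descent
            opt.step()
    return model                 # the trained convolutional neural network
```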
Referring to FIG. 6, FIG. 6 is a schematic diagram of a computer device 1 according to an embodiment of the present invention. The computer device 1 includes a memory 20, a processor 30, and a computer program 40 stored in the memory 20 and executable on the processor 30, such as a face detection program. When the processor 30 executes the computer program 40, the steps in the above face detection method embodiment or in the above training method embodiment for convolutional neural network parameters are implemented, such as steps S10 to S13 shown in FIG. 1 or steps S20 to S21 shown in FIG. 3. Alternatively, when the processor 30 executes the computer program 40, the functions of the modules/units in the above apparatus embodiments are implemented, such as modules 310 to 330 or modules 410 to 420.
Exemplarily, the computer program 40 may be partitioned into one or more modules/units, which are stored in the memory 20 and executed by the processor 30 to complete the present invention. The one or more modules/units may be a series of computer program instruction segments capable of performing specific functions, and the instruction segments are used to describe the execution process of the computer program 40 in the computer device 1. For example, the computer program 40 may be divided into the image acquisition module 310, the processing module 320, and the output module 330 in FIG. 4, or into the sample acquisition module 410 and the training module 420 in FIG. 5; for the specific functions of each module, refer to the foregoing embodiments.
The computer device 1 may be an embedded monitoring device such as an embedded network camera. It will be understood by those skilled in the art that FIG. 6 is merely an example of the computer device 1 and does not constitute a limitation of the computer device 1; the computer device 1 may include more or fewer components than those illustrated, combine some components, or have different components. For example, the computer device 1 may also include input and output devices, network access devices, buses, and the like.
The processor 30 may be a central processing unit (CPU), or may be another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, a discrete hardware component, or the like. The general-purpose processor may be a microprocessor, or the processor 30 may be any conventional processor or the like. The processor 30 is the control center of the computer device 1 and connects the various parts of the entire computer device 1 by using various interfaces and lines.
The memory 20 can be used to store the computer program 40 and/or the modules/units. The processor 30 implements the various functions of the computer device 1 by running or executing the computer programs and/or modules/units stored in the memory 20 and by invoking the data stored in the memory 20. The memory 20 may mainly include a program storage area and a data storage area, where the program storage area may store an operating system, an application required by at least one function, and the like, and the data storage area may store data created according to the use of the computer device 1 (such as audio data and image data). In addition, the memory 20 may include a high-speed random access memory, and may also include a non-volatile memory such as a hard disk, an internal memory, a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) card, a flash card, at least one magnetic disk storage device, a flash memory device, or another volatile solid-state storage device.
If the modules/units integrated in the computer device 1 are implemented in the form of software functional units and sold or used as stand-alone products, they can be stored in a computer-readable storage medium. Based on such an understanding, all or part of the processes in the foregoing method embodiments of the present invention may also be implemented by a computer program instructing related hardware; the computer program may be stored in a computer-readable storage medium, and when the program is executed by a processor, the steps of the various method embodiments described above may be implemented. The computer program comprises computer program code, which may be in the form of source code, object code, an executable file, or some intermediate form. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash drive, a removable hard disk, a magnetic disk, an optical disc, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunications signal, a software distribution medium, and the like. It should be noted that the content contained in the computer-readable medium may be appropriately increased or decreased according to the requirements of legislation and patent practice in a jurisdiction; for example, in some jurisdictions, according to legislation and patent practice, computer-readable media do not include electrical carrier signals and telecommunications signals.
The characteristic means of the present invention described above can be implemented by an integrated circuit, which controls and implements the functions of the face detection method and/or the training method for convolutional neural network parameters in any of the above embodiments.
The functions that can be implemented by the face detection method and the training method for convolutional neural network parameters in any embodiment can be installed in the computer device through the integrated circuit of the present invention, so that the computer device performs the functions that the methods in any embodiment can implement; details are not described herein.
In the several embodiments provided by the present invention, it should be understood that the disclosed method and apparatus may also be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for example, the division of the modules is merely a division by logical function, and there may be other divisions in actual implementation.
Finally, it should be noted that the above embodiments are merely intended to illustrate the technical solutions of the present invention and not to limit them. Although the present invention has been described in detail with reference to the preferred embodiments, a person of ordinary skill in the art should understand that modifications or equivalent replacements may be made to the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention.

Claims (10)

  1. A face detection method, wherein the method comprises:
    acquiring an image to be detected;
    inputting the image to be detected into a trained convolutional neural network, and identifying whether the image to be detected contains a face and estimating the face pose, wherein training sample images in a training sample set used to train the convolutional neural network comprise position data and pose data of faces;
    if the image to be detected contains a face, outputting pose information of the face in the image to be detected.
  2. The method according to claim 1, wherein the method further comprises:
    extracting features of a training sample image through a convolution layer of a convolutional neural network model used for training to obtain a feature map, the feature map being composed of several feature units;
    obtaining an anchor box of each of the several feature units according to the position data of the face in the training sample image and a clustering algorithm.
  3. The method according to claim 2, wherein the obtaining an anchor box of each of the several feature units according to the position data of the face in the training sample image and a clustering algorithm comprises:
    initializing the length and width of a to-be-determined anchor box of a feature unit;
    calculating the ratio of the intersection to the union of the to-be-determined anchor box and a standard box determined by the position data of the training sample image, and determining a distance parameter in the clustering algorithm according to the ratio;
    iterating the length and width of the to-be-determined anchor box, and obtaining the length and width of the to-be-determined anchor box when an iteration end condition corresponding to the clustering algorithm is reached, to obtain the anchor box of the feature unit.
  4. The method according to claim 1, wherein the method further comprises:
    training a preset convolutional neural network model according to a preset loss function and a training algorithm to obtain values of network parameters of the preset convolutional neural network model and thereby obtain the trained convolutional neural network, the preset loss function being used to calculate a loss for the presence or absence of a face in a training sample image, a loss for the face pose in the training sample image, and a loss for the offsets of the region determined by the position data of the face in the training sample image.
  5. The method according to any one of claims 1 to 4, wherein the method further comprises:
    obtaining position data of the face in the image to be detected according to the trained convolutional neural network;
    if the position data of the face comprises at least two sets of position data, obtaining the accurate position of the face in the image to be detected through a non-maximum suppression algorithm;
    outputting face pose information of the face in the image to be detected at the accurate position.
  6. A training method for convolutional neural network parameters, wherein the method comprises:
    acquiring a training sample set, training sample images in the training sample set comprising position data and pose data of faces;
    training a preset convolutional neural network model according to the training sample set, a training algorithm, and a preset loss function to obtain values of network parameters of the preset convolutional neural network model, the preset loss function being used to calculate a loss for the presence or absence of a face in a training sample image, a loss for the face pose in the training sample image, and a loss for the offsets of the region determined by the position data of the face in the training sample image.
  7. A face detection apparatus, wherein the apparatus comprises:
    an image acquisition module configured to acquire an image to be detected;
    a processing module configured to input the image to be detected into a trained convolutional neural network, identify whether the image to be detected contains a face, and estimate the face pose, wherein training sample images in a training sample set used to train the convolutional neural network comprise position data and pose data of faces;
    an output module configured to, if the image to be detected contains a face, output pose information of the face in the image to be detected.
  8. A training apparatus for convolutional neural network parameters, wherein the apparatus comprises:
    a sample acquisition module configured to acquire a training sample set, training sample images in the training sample set comprising position data and pose data of faces;
    a training module configured to train a preset convolutional neural network model according to the training sample set, a training algorithm, and a preset loss function to obtain values of network parameters of the preset convolutional neural network model, the preset loss function being used to calculate a loss for the presence or absence of a face in a training sample image, a loss for the face pose in the training sample image, and a loss for the offsets of the region determined by the position data of the face in the training sample image.
  9. A computer device, wherein the computer device comprises:
    a memory configured to store at least one instruction; and
    a processor configured to execute the instruction stored in the memory to implement the face detection method according to any one of claims 1 to 5 and/or the training method according to claim 6.
  10. A computer-readable storage medium having computer instructions stored thereon, wherein when the computer instructions are executed by a processor, the face detection method according to any one of claims 1 to 5 and/or the training method according to claim 6 is implemented.
PCT/CN2018/119188 2017-12-28 2018-12-04 Face detection method, training method for convolutional neural network parameters, apparatus, and medium WO2019128646A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201711462096.3A CN108038474B (zh) 2017-12-28 2017-12-28 Face detection method, training method for convolutional neural network parameters, apparatus, and medium
CN201711462096.3 2017-12-28

Publications (1)

Publication Number Publication Date
WO2019128646A1 true WO2019128646A1 (zh) 2019-07-04

Family

ID=62097610

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2018/119188 WO2019128646A1 (zh) 2017-12-28 2018-12-04 人脸检测方法、卷积神经网络参数的训练方法、装置及介质

Country Status (2)

Country Link
CN (1) CN108038474B (zh)
WO (1) WO2019128646A1 (zh)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104504376A (zh) * 2014-12-22 2015-04-08 厦门美图之家科技有限公司 Age classification method and system for face images
CN105760836A (zh) * 2016-02-17 2016-07-13 厦门美图之家科技有限公司 Multi-angle face alignment method and system based on deep learning, and photographing terminal
CN107491771A (zh) * 2017-09-21 2017-12-19 百度在线网络技术(北京)有限公司 Face detection method and apparatus
CN108038474A (zh) * 2017-12-28 2018-05-15 深圳云天励飞技术有限公司 Face detection method, training method for convolutional neural network parameters, apparatus, and medium

Also Published As

Publication number Publication date
CN108038474A (zh) 2018-05-15
CN108038474B (zh) 2020-04-14

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 18897044

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the addressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS (EPO FORM 1205A DATED 18.11.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 18897044

Country of ref document: EP

Kind code of ref document: A1