CN114821735A - Intelligent storage cabinet based on face recognition and voice recognition - Google Patents


Info

Publication number
CN114821735A
Authority
CN
China
Prior art keywords
face
image
voice
recognition
formula
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Withdrawn
Application number
CN202210519910.5A
Other languages
Chinese (zh)
Inventor
阴皓
刘伯宇
姬发家
吴晨光
张菲菲
李路远
李琳
贾静丽
古明
王军义
杨扬
朱莹
赵曜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
State Grid Corp of China SGCC
Information and Telecommunication Branch of State Grid Henan Electric Power Co Ltd
Original Assignee
State Grid Corp of China SGCC
Information and Telecommunication Branch of State Grid Henan Electric Power Co Ltd
Application filed by State Grid Corp of China SGCC, Information and Telecommunication Branch of State Grid Henan Electric Power Co Ltd filed Critical State Grid Corp of China SGCC
Priority to CN202210519910.5A
Publication of CN114821735A
Legal status: Withdrawn


Classifications

    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06F ELECTRIC DIGITAL DATA PROCESSING › G06F 18/00 Pattern recognition › G06F 18/20 Analysing › G06F 18/22 Matching criteria, e.g. proximity measures
    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06F ELECTRIC DIGITAL DATA PROCESSING › G06F 18/00 Pattern recognition › G06F 18/20 Analysing › G06F 18/24 Classification techniques › G06F 18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches › G06F 18/2411 based on the proximity to a decision surface, e.g. support vector machines
    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N 3/00 Computing arrangements based on biological models › G06N 3/02 Neural networks › G06N 3/04 Architecture, e.g. interconnection topology › G06N 3/045 Combinations of networks
    • G PHYSICS › G06 COMPUTING; CALCULATING OR COUNTING › G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS › G06N 3/00 Computing arrangements based on biological models › G06N 3/02 Neural networks › G06N 3/08 Learning methods


Abstract

The invention discloses an intelligent locker based on face recognition and voice recognition. The intelligent locker comprises an intelligent locker system, which in turn comprises a data acquisition system, a face recognition and matching system, and a voice recognition and man-machine interaction system. The data acquisition system acquires face images of users and preprocesses the acquired images; the face recognition and matching system performs face detection and face feature extraction on the acquired images and carries out face recognition and matching; and the voice recognition and man-machine interaction system performs voice acquisition, voice signal processing, and voice recognition on voice commands issued by users, and issues deposit or retrieval operating commands to the intelligent locker system according to those voice commands. The invention thereby addresses the problems that existing intelligent lockers are complicated to operate, that access credentials are easily lost, that biometric credentials are easily contaminated, and that safety is insufficient.

Description

Intelligent storage cabinet based on face recognition and voice recognition
Technical Field
The invention relates to the field of artificial intelligence, in particular to an intelligent storage cabinet based on face recognition and voice recognition.
Background
Intelligent storage cabinets bring great convenience to daily life: they provide temporary storage of articles in public places and are widely used in communities, logistics sites, large supermarkets, libraries, amusement parks, factories, and company offices. With the rapid development of science and technology, however, people place ever higher demands on the intelligence, speed, and safety of storage cabinets, and conventional intelligent lockers cannot meet these requirements for quickness, convenience, and intelligence. Designing an intelligent storage cabinet that is convenient to operate and highly secure by means of artificial intelligence technology therefore has important practical significance.
Intelligent lockers currently in use on the Chinese market, such as Fengchao (Hive Box) express cabinets and JD.com self-pickup cabinets, still rely on Internet technology and use SMS passwords or QR codes as access credentials; their operation is cumbersome, and they face the problem that a lost unlocking credential makes retrieval difficult. Fingerprint-based intelligent lockers are convenient to operate and highly intelligent, but the fingerprint used as the retrieval credential is easily contaminated, which likewise makes retrieval difficult.
The present invention therefore provides a new solution to this problem.
Disclosure of Invention
Aiming at the defects in the prior art, the invention aims to provide an intelligent locker based on face recognition and voice recognition, which effectively solves the problems of complicated operation, easy loss of access certificates, easy pollution of biological characteristic certificates and insufficient safety of the conventional intelligent locker.
The intelligent locker comprises an intelligent locker system, wherein the intelligent locker system comprises a data acquisition system, a face recognition and matching system and a voice recognition and man-machine interaction system;
the data acquisition system acquires a face image of a user and preprocesses the acquired image;
the face recognition and matching system performs face detection and face feature extraction on the acquired images, and carries out face recognition and matching;
the voice recognition and man-machine interaction system carries out voice acquisition, voice signal processing and voice recognition on a voice instruction sent by a user, and issues an operating instruction of storing or fetching objects to the intelligent locker system according to the voice instruction;
further, the data acquisition system acquires the face image of the user with a high-definition camera on the intelligent locker and preprocesses the acquired image by graying it;
further, the face recognition and matching system performs face recognition and face feature extraction on the acquired image, and performs face recognition and matching, and the specific steps are as follows:
s1: the intelligent storage cabinet acquires a user face image with a high-definition camera, performs face detection and positioning with a histogram of oriented gradients, a sliding-window detection mechanism, and a linear classifier, detects and crops the face image, and identifies whether the image is a face with a Support Vector Machine (SVM) binary classification model; if the image is not a face, the picture is deleted and the user is prompted to acquire the image again;
s2: preprocessing the face image acquired in the step S1 by utilizing a wavelet transform and an illumination invariance algorithm of a denoising model;
s3: inputting the face image preprocessed in S2 into a trained Convolutional Neural Network (CNN) model to extract the face features in the image;
s4: matching the facial features extracted in the S3 with the faces in the database by using an intelligent algorithm, carrying out face image recognition, and if the similarity with the images in the face database is less than a set threshold, determining that the matching fails;
further, the voice recognition and man-machine interaction system carries out voice acquisition, voice signal processing and voice recognition on a voice instruction sent by a user, and issues an operating instruction of storing objects or fetching objects to the intelligent locker system according to the voice instruction, and the steps are as follows:
a1: carrying out voice acquisition on a voice command sent by a user, carrying out digital signal conversion, high-pass digital filtering denoising, framing and windowing preprocessing operations on a voice signal, removing environmental noise from the voice signal, and converting the voice signal into a digital signal which can be processed by a computer;
a2: carrying out voice spectrum feature extraction on a voice signal by utilizing Fourier transform, and carrying out voice recognition by applying a Convolutional Neural Network (CNN) model;
a3: the intelligent storage cabinet issues the corresponding deposit, retrieval, or box-opening operation instruction according to the content of the voice instruction sent by the user, and gives the corresponding voice broadcast prompt;
a4: the intelligent storage cabinet sends a voice broadcast prompt for requesting a face to be aligned with the camera, acquires face image information of a user, judges whether the acquired picture is a face or not by using a face recognition and matching system, if not, returns to A3, sends a voice broadcast prompt for failing to acquire face information and requesting retry, and if so, sends a voice broadcast prompt for successful acquisition of face information, and then turns to A5 and A6;
a5: if the user selects to take the object, the face recognition and matching system is used for extracting the face features of the collected face image, recognizing the face and matching the face image, if the matching fails, the face recognition and matching system returns to A3, a voice broadcast prompt of face matching failure and retry is sent out, if the matching succeeds, the intelligent locker sends a voice broadcast prompt of taking the object by opening the box, the box is opened, a corresponding storage box is opened, and a voice broadcast prompt of asking the user to close the box door is sent out after taking the object;
a6: if the user selects deposit, the face information is extracted with the face recognition and matching system and uploaded to the database, a voice broadcast prompt of opening the box for deposit is issued, an empty storage box is opened, and a voice broadcast prompt asking the user to close the box door is issued after the article is stored.
Due to the adoption of the technical scheme, compared with the prior art, the invention has the following advantages:
the invention uses high-pass digital filter, Fourier transform, voice spectrum characteristic and voice recognition algorithm based on convolutional neural network CNN model to carry out man-machine interaction, the user only needs to speak out the instruction of stored or fetched object by voice in the range of microphone capable of receiving voice, the operation can be carried out without touching the interactive screen, the voice interaction is convenient and simple, the design greatly facilitates the access operation of the user under the background of new crown epidemic situation, and also meets the requirement of contactless access, furthermore, the high-definition camera collects the user face picture and then uses the direction gradient histogram (HOG characteristic), the linear classifier (SVM), the sliding window detection mechanism and the face recognition algorithm of convolutional neural network CNN model to carry out face information detection and recognition, the face information is used as access certificate, the problems that the access certificate is easy to lose and the biological certificate is easy to be polluted are solved, meanwhile, the requirement of non-contact operation is met, and the safety and the convenience of the storage cabinet are improved.
Drawings
Fig. 1 is an architecture diagram of an intelligent locker system based on face recognition and voice recognition according to an embodiment of the present invention.
Fig. 2 is a flowchart of a face recognition and matching system according to an embodiment of the present invention.
Fig. 3 is a flowchart of a speech recognition and human-computer interaction system according to an embodiment of the present invention.
Fig. 4 is a flowchart of an operation of the intelligent locker system based on face recognition and voice recognition according to an embodiment of the present invention.
Detailed Description
To make the above and other objects, features, and advantages of the invention clearer and easier to understand, the embodiments are described in detail below with reference to the accompanying drawings.
Exemplary embodiments of the present invention will be described below with reference to the accompanying drawings.
An intelligent locker based on face recognition and voice recognition comprises an intelligent locker system, wherein the intelligent locker system comprises a data acquisition system, a face recognition and matching system and a voice recognition and man-machine interaction system;
the data acquisition system acquires a face image of a user and preprocesses the acquired image;
the face recognition and matching system performs face detection and face feature extraction on the acquired images, and carries out face recognition and matching;
the voice recognition and man-machine interaction system carries out voice acquisition, voice information processing, and voice recognition on the voice instructions issued by the user, and issues deposit or retrieval operating instructions to the intelligent storage cabinet system according to those voice instructions.
Further, the data acquisition system acquires the face image of the user with a high-definition camera on the intelligent locker and preprocesses the acquired image by graying it. Graying converts a color picture into a grayscale image; a color image has three component channels R, G, and B, and the conversion formula (1) is shown as follows:
Gray=0.3·R+0.59·G+0.11·B (1)
in formula (1), Gray represents a Gray image, R represents a red channel of the image, G represents a green channel of the image, and B represents a blue channel of the image.
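For illustration only (this sketch is not part of the patent text), the weighted-average conversion of formula (1) can be written in a few lines of Python; the function name and the H x W x 3 array layout are assumptions:

```python
import numpy as np

def to_gray(rgb: np.ndarray) -> np.ndarray:
    """Grayscale conversion of formula (1): Gray = 0.3*R + 0.59*G + 0.11*B."""
    r, g, b = rgb[..., 0], rgb[..., 1], rgb[..., 2]
    return (0.3 * r + 0.59 * g + 0.11 * b).astype(rgb.dtype)
```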
Further, the face recognition and matching system performs face recognition and face feature extraction on the acquired image, and performs face recognition and matching, and the specific steps are as follows:
s1: the intelligent storage cabinet collects a user face image with a high-definition camera, performs face detection and positioning with a histogram of oriented gradients, a sliding-window detection mechanism, and a linear classifier, detects the position of the face in the image, determines the size of the face image, and detects and crops the face image; whether the image is a human face is identified with a Support Vector Machine (SVM) binary classification model, and if not, the image is deleted and the user is prompted to acquire the image again;
s2: preprocessing the face image acquired in the step S1 by using a wavelet transform and an illumination invariant algorithm of a denoising model, and highlighting characteristic information of the face image;
s3: inputting the face image preprocessed in S2 into a trained Convolutional Neural Network (CNN) model to extract the face features in the image;
s4: and matching the facial features extracted in the step S3 with the faces in the database by using an intelligent algorithm to perform face image recognition, and if the similarity between the facial features and the images in the face database is less than a set threshold, determining that the matching fails.
Further, in step S1, the directional gradient histogram, the sliding window detection mechanism, and the linear classifier are used to perform face detection and positioning, and detect and intercept a face image, and the specific steps are as follows:
S1.1: Gamma correction is applied to the collected face sample image to compensate for shots that are too bright or too dark and to complete the normalization of the whole image. The purpose is to adjust the contrast of the image, reduce the influence of local illumination and shadow, and suppress noise interference. The Gamma correction formula is shown in formula (2):
G(x,y) = F(x,y)^(1/γ) (2)
In formula (2), G(x,y) is the luminance of the pixel with coordinates (x,y) in the processed image, F(x,y) is the luminance of the pixel with coordinates (x,y) in the original image, and γ = 0.5;
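As an illustrative sketch under stated assumptions (Python, 8-bit grayscale input; not the patent's own implementation), formula (2) can be applied by scaling pixel values to [0, 1] before exponentiation:

```python
import numpy as np

def gamma_correct(image: np.ndarray, gamma: float = 0.5) -> np.ndarray:
    """Gamma correction of formula (2): G(x, y) = F(x, y)^(1/gamma)."""
    f = image.astype(np.float64) / 255.0   # normalize to [0, 1]
    g = np.power(f, 1.0 / gamma)           # gamma = 0.5 gives exponent 2
    return (g * 255.0).astype(np.uint8)
```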
S1.2: calculate the gradient and gradient direction of the face image after color-space normalization; compute the gradient of each pixel in the horizontal and vertical directions, and the gradient magnitude and direction at each pixel position. The horizontal and vertical gradients of the image at pixel (x,y) are shown in formula (3):
Gx(x,y) = F(x+1,y) − F(x−1,y)
Gy(x,y) = F(x,y+1) − F(x,y−1) (3)
In formula (3), Gx(x,y) and Gy(x,y) are the gradient values of the current pixel (x,y) in the horizontal and vertical directions. The gradient magnitude and gradient direction at pixel (x,y) are then calculated according to formula (4):
v(x,y) = sqrt(Gx(x,y)² + Gy(x,y)²)
θ(x,y) = arctan(Gy(x,y) / Gx(x,y)) (4)
In formula (4), v(x,y) is the gradient amplitude of the face sample image at coordinate (x,y), and θ(x,y) is the gradient direction of the face sample image at coordinate (x,y);
S1.3: compute the weighted projection of the face image based on gradient direction. The face image is divided into small, non-overlapping unit blocks called cells, and a gradient-orientation histogram is computed for each cell: the division is made with reference to the gradient direction, gradient values falling into the same orientation are accumulated, and the 180° range is divided equally into 20° bins, so the histogram-of-oriented-gradients (HOG) feature of one cell is a 9-dimensional vector;
S1.4: normalize the histograms with the L2-norm normalization function. The gradient histogram counted in each cell forms the descriptor of that cell; cells form a larger descriptor called a block, and the feature vectors of the four cells in one block are concatenated to form the gradient-orientation histogram of the block, so from the 9-dimensional HOG feature of one cell, the HOG feature of one block is 4 × 9 = 36-dimensional. Because of changes in local illumination and in foreground-background contrast, the range of gradient strengths varies widely, so the gradients must be locally contrast-normalized. The strategy is to normalize each block, generally with the L2-norm, whose expression is shown in formula (5):
‖v‖₂ = sqrt(v₁² + v₂² + … + v_n²) (5)
The normalized feature vector V is calculated as shown in formula (6):
V_i = v_i / sqrt(‖v‖₂² + ε²), i = 1, 2, …, n (6)
where V = (V₁, V₂, …, V_n) is the normalized feature vector, v = (v₁, v₂, …, v_n) is the feature vector before normalization, and ε is a number greater than 0 and infinitely close to 0;
S1.5: concatenate the normalized feature vectors V of all image blocks to form the HOG feature vector of the face image, of dimension β × κ × ζ, where β is the number of orientation bins (pixel degree ranges) in each cell, κ is the number of image blocks in the face sample image, and ζ is the number of cells in each image block;
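Steps S1.1 to S1.5 together describe a standard HOG descriptor. One possible realization (an assumption, using scikit-image's hog rather than the patent's own code; the 8 x 8 cell size is also assumed, since the text does not fix it) is:

```python
import numpy as np
from skimage.feature import hog

def extract_hog(gray_face: np.ndarray) -> np.ndarray:
    """HOG descriptor matching the text: 9 orientation bins per cell
    (180 degrees / 20 degrees), 2 x 2 cells per block, L2 block norm."""
    return hog(
        gray_face,
        orientations=9,          # beta = 9 bins per cell
        pixels_per_cell=(8, 8),  # cell size: an assumption, not fixed in the text
        cells_per_block=(2, 2),  # zeta = 4 cells per block, 36-dim block vectors
        block_norm="L2",         # the L2 normalization of formulas (5)/(6)
    )
```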
S1.6: input the face-image feature vector obtained in S1.5 into a trained SVM face/non-face binary classifier to obtain the finally detected face localization image. The final goal of the SVM is to find an optimal hyperplane that satisfies the classification requirement while maximizing the classification margin, which makes it a suitable model for binary classification. Given a face training sample set (x₁, y₁), (x₂, y₂), …, (x_l, y_l), where x_i is the HOG multi-dimensional feature vector and y_i ∈ {+1, −1} is the class label, a hyperplane is defined as w·x + b = 0 and must satisfy formula (7):
y_i(w·x_i + b) ≥ 1, i = 1, 2, …, l (7)
The constrained optimization problem thus derived, in dual form, is formula (8):
max_α Σᵢ αᵢ − (1/2) Σᵢ Σⱼ αᵢ αⱼ yᵢ yⱼ (xᵢ·xⱼ)
s.t. Σᵢ αᵢ yᵢ = 0, αᵢ ≥ 0, i = 1, 2, …, l (8)
In formula (8), αᵢ is the Lagrange multiplier. Formula (8) is a quadratic programming problem; solving it yields the optimal multipliers α* = (α₁*, α₂*, …, α_l*), from which w* and b* are obtained as shown in formula (9):
w* = Σᵢ αᵢ* yᵢ xᵢ, b* = yⱼ − Σᵢ αᵢ* yᵢ (xᵢ·xⱼ) (9)
This gives the optimal classification hyperplane w*·x + b* = 0 and the optimal classification function f(x) = sgn(w*·x + b*); if f(x) = 1, the image is judged to be a face.
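A minimal sketch of training such a face/non-face classifier with scikit-learn's linear SVM, which solves the same maximum-margin problem in soft-margin form; the random training data below is a placeholder for a real labeled HOG set:

```python
import numpy as np
from sklearn.svm import LinearSVC

# Placeholder data: X holds HOG feature vectors, y holds labels
# (+1 = face, -1 = non-face); real data would come from a labeled set.
rng = np.random.default_rng(0)
X = rng.random((200, 1764))
y = np.repeat([1, -1], 100)

clf = LinearSVC(C=1.0)  # maximum-margin separating hyperplane, as in (7)-(9)
clf.fit(X, y)

is_face = clf.predict(X[:1])[0] == 1  # decision f(x) = sgn(w*.x + b*)
```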
In step S2, the face image collected in S1 is preprocessed with the illumination invariance algorithm of wavelet transform and a denoising model to highlight the feature information of the face image. The specific steps are as follows:
S2.1: noise is removed from the face image collected in S1 by applying the wavelet transform and the illumination invariance algorithm of the denoising model. First apply a logarithmic transform to the face image G to obtain the image G'; apply a wavelet transform to the log-transformed image to obtain the low-frequency coefficients LLᵢ and the high-frequency coefficient matrices LHᵢ, HLᵢ, HHᵢ at each decomposition level i; multiply the high-frequency coefficient matrices by a scaling parameter λ (0 < λ < 1) to obtain new high-frequency coefficient matrices LHᵢ′, HLᵢ′, HHᵢ′; perform wavelet reconstruction with the unmodified low-frequency coefficients LLᵢ and the new high-frequency coefficient matrices LHᵢ′, HLᵢ′, HHᵢ′ to obtain the image L′; finally the illumination invariant R′ = G′ − L′, i.e. the denoised image, is obtained;
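A sketch of S2.1 using the PyWavelets library; the wavelet basis 'db4', the decomposition level, and λ = 0.5 are assumptions, since the patent does not specify them:

```python
import numpy as np
import pywt

def illumination_invariant(face: np.ndarray, lam: float = 0.5, level: int = 2) -> np.ndarray:
    """S2.1 sketch: log transform, scale high-frequency wavelet
    coefficients by lambda, reconstruct L', return R' = G' - L'."""
    g_log = np.log1p(face.astype(np.float64))          # G' (log1p avoids log(0))
    coeffs = pywt.wavedec2(g_log, "db4", level=level)  # [LL, (LH, HL, HH), ...]
    scaled = [coeffs[0]] + [tuple(lam * m for m in hf) for hf in coeffs[1:]]
    l_img = pywt.waverec2(scaled, "db4")               # L'
    l_img = l_img[: g_log.shape[0], : g_log.shape[1]]  # trim reconstruction padding
    return g_log - l_img                               # illumination invariant R'
```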
S2.2: perform gray-level histogram equalization on the denoised face image obtained in S2.1 to increase the contrast of the face image and highlight face information. First count, over the whole image, the number of pixels at each gray level and calculate the probability distribution of each gray level in the image, as shown in formula (10):
p(r_k) = n_k / n, k = 0, 1, …, L−1 (10)
In formula (10), r_k is the k-th gray level, L is the number of gray levels, n_k is the number of pixels with gray level r_k in the image, and n is the total number of pixels in the image, i.e. n = Σ_{k=0}^{L−1} n_k. Then calculate the cumulative distribution probability s_k, as shown in formula (11):
s_k = Σ_{j=0}^{k} p(r_j) (11)
At this point formula (11) maps the gray levels to the domain [0, 1]; to map the values back to gray values [0, L−1], the conversion shown in formula (12) is required:
s_k′ = round((L−1) · s_k) (12)
In formula (12), s_k′ is the converted gray value and round denotes rounding down;
the denoised, enhanced face image is thus obtained through step S2.
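Formulas (10) to (12) amount to the classic histogram-equalization lookup table; a self-contained Python sketch (assuming an 8-bit grayscale image) is:

```python
import numpy as np

def equalize_hist(gray: np.ndarray, levels: int = 256) -> np.ndarray:
    """Histogram equalization per formulas (10)-(12); gray must be uint8."""
    n_k = np.bincount(gray.ravel(), minlength=levels)  # pixel count per gray level
    p = n_k / gray.size                                # formula (10): p(r_k) = n_k / n
    s = np.cumsum(p)                                   # formula (11): cumulative s_k
    lut = np.floor((levels - 1) * s).astype(np.uint8)  # formula (12), rounded down
    return lut[gray]                                   # remap every pixel
```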
In step S3, the specific steps of extracting the face features in the image with the convolutional neural network CNN are as follows:
S3.1: input the face image preprocessed in S2 into the trained CNN model and extract features; the face recognition model is trained with a convolutional neural network CNN. The CNN consists of convolutional layers, an excitation layer, a pooling layer, and a fully connected layer. Each convolutional layer contains several convolution kernels; a kernel sweeps the input feature map in sequence, multiplying and summing the elements of the input feature matrix within its receptive field, where the receptive field is the region of the input map to which a pixel of each layer's output feature map is mapped, its size determined by the kernel size. The pooling layer is generally used for feature selection after the convolutional layer and also reduces dimensionality and computation. The input image is processed by the three kinds of layers of the CNN structure (convolutional, excitation, and pooling) to extract feature values of the image data, and the fully connected layer classifies those feature values according to its parameters and weights;
S3.2: the convolutional layer multiplies and sums each image block of the face image with the trained convolution kernels. In the calculation, the input of the convolutional layer is divided into k two-dimensional m × m matrices X, and the convolution kernels are k two-dimensional n × n matrices W; the matrices X and W are convolved with a fixed stride until the features of every position of the input matrix X have been extracted, yielding the face feature matrix;
S3.3: the excitation layer applies the ReLU (Rectified Linear Unit) activation function to the feature matrix computed by the convolutional layer to obtain a new feature matrix;
S3.4: the pooling layer divides the pixel matrix produced by the excitation layer into several non-overlapping regions of the same size and extracts the maximum value of each region as a new element, reducing the dimensionality of the feature matrix after the excitation layer; this matrix of maxima is the feature matrix of the face image.
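The layer sequence of S3.1 to S3.4 (convolution, ReLU excitation, max pooling, fully connected classification) can be sketched in PyTorch; the 1 x 64 x 64 input size and the layer widths are assumptions, since the patent fixes neither:

```python
import torch
import torch.nn as nn

class FaceCNN(nn.Module):
    """Minimal CNN mirroring S3.1-S3.4 (illustrative sketch only)."""

    def __init__(self, num_classes: int):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1),  # convolutional layer (S3.2)
            nn.ReLU(),                                   # excitation layer (S3.3)
            nn.MaxPool2d(2),                             # max over 2x2 regions (S3.4)
            nn.Conv2d(16, 32, kernel_size=3, padding=1),
            nn.ReLU(),
            nn.MaxPool2d(2),
        )
        self.classifier = nn.Linear(32 * 16 * 16, num_classes)  # fully connected layer

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = self.features(x)                  # feature matrix of the face image
        return self.classifier(x.flatten(1))  # classification (S4.1)

logits = FaceCNN(num_classes=10)(torch.randn(1, 1, 64, 64))  # dummy forward pass
```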
In step S4, the face features extracted in S3 are matched with the faces in the database by an intelligent algorithm to perform face image recognition; if the similarity to the images in the face database is smaller than a set threshold, the matching is considered to have failed. The specific contents are as follows:
S4.1: the purpose of the fully connected layer is classification; on the last layer of the CNN model, namely the fully connected layer, the input is mapped to a label. The feature matrix obtained in S3.3 is input into the fully connected layer and compressed to obtain the final face recognition result; if the image similarity probability of the recognition result is smaller than the set threshold, the matching is considered to have failed;
and S4.2, updating the CNN model, adding the newly uploaded face image into CNN model training, and updating CNN model parameters.
Further, the voice recognition and man-machine interaction system performs voice acquisition, voice information processing, and voice recognition on the voice commands issued by the user, and issues deposit or retrieval operating instructions to the intelligent locker system according to those commands, realizing man-machine interaction by voice. The steps are as follows:
a1: collect the voice command issued by the user; apply preprocessing operations to the voice signal such as digital signal conversion, high-pass digital filtering for denoising, framing, and windowing; remove environmental noise from the voice signal; and convert it into a digital signal that a computer can process;
a2: carrying out voice spectrum feature extraction on a voice signal by utilizing Fourier transform, and carrying out voice recognition by applying a Convolutional Neural Network (CNN) model;
a3: the intelligent storage cabinet issues the corresponding operation instruction, such as deposit, retrieval, or box opening, according to the content of the voice instruction sent by the user, and gives the user the corresponding voice broadcast prompt;
a4: the intelligent storage cabinet sends a voice broadcast prompt for requesting a face to be aligned with the camera, acquires face image information of a user, judges whether the acquired picture is a face or not by using a face recognition and matching system, if not, returns to A3, sends a voice broadcast prompt for failing to acquire face information and requesting retry, and if so, sends a voice broadcast prompt for successful acquisition of face information, and then turns to A5 and A6;
a5: if the user selects to take the object, the face recognition and matching system is used for extracting the face features of the collected face image, the face recognition is used for matching, if the matching fails, the A3 is returned, a voice broadcast prompt of face matching failure and retry is sent, if the matching succeeds, the intelligent storage cabinet sends a voice broadcast prompt of taking the object by opening the box, the box is opened, a corresponding storage box is opened, and a voice broadcast prompt of asking for closing the box door is sent after the object is taken;
a6: if the user selects deposit, the face information is extracted with the face recognition and matching system and uploaded to the database, a voice broadcast prompt of opening the box for deposit is issued, an empty storage box is opened, and a voice broadcast prompt asking the user to close the box door is issued after the article is stored.
In step A1, the voice command issued by the user is collected; the voice signal undergoes preprocessing operations such as digital signal conversion, high-pass digital filtering for denoising, framing, and windowing; environmental noise is removed; and the signal is converted into a digital signal that a computer can process. The specific steps are as follows:
a1.1: record the voice with a microphone and convert the voice signal into a digital signal the computer can recognize, i.e. perform analog-to-digital conversion on the voice signal. This comprises two parts, sampling and quantization, which record the amplitude of the sound wave at equally spaced points; the sampling frequency, i.e. the number of samples per second, is set during sampling;
a1.2: preprocess the voice digital signal. For better voice recognition, environmental noise, aliasing, and higher-harmonic distortion introduced by the collecting and recording equipment must be removed; this is realized with a high-pass digital filter, as shown in formula (13):
H(z) = 1 − a·z⁻¹ (13)
In formula (13), z is the digital signal variable and a is the pre-emphasis coefficient (0.9 < a < 1.0). Let the speech sample value be x(n); the output after high-pass digital filtering is shown in formula (14):
y(n) = x(n) − a·x(n−1) (14)
In formula (14), n is the sampling instant and a = 0.975;
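Formula (14) is a one-line filter; a Python sketch (function name assumed) is:

```python
import numpy as np

def pre_emphasis(x: np.ndarray, a: float = 0.975) -> np.ndarray:
    """High-pass pre-emphasis of formula (14): y(n) = x(n) - a*x(n-1)."""
    y = np.empty(len(x), dtype=np.float64)
    y[0] = x[0]                  # first sample has no predecessor
    y[1:] = x[1:] - a * x[:-1]
    return y
```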
a1.3: frame and window the voice digital signal output in A1.2 to obtain a clean digital voice signal. The frame length is set to 20 ms and the frame shift to 10 ms, and the signal is divided into frames. Because framing introduces discontinuities at frame boundaries, each frame is multiplied by a window function to smooth the signal; the Hamming window is defined as in formula (15):
w(k) = 0.54 − 0.46·cos(2πk / (N−1)), 0 ≤ k ≤ N−1 (15)
In formula (15), N is the width of the window function, i.e. the frame length, and k is the sample index within the window;
The step A2, extracting speech-spectrum features from the voice signal with the Fourier transform and performing voice recognition with a convolutional neural network (CNN) model, comprises the following specific steps:
a2.1: convert the voice digital signal preprocessed in A1 from the time domain to the frequency domain. Let the voice digital signal be x(n); the frequency-domain conversion uses the Fourier transform of formula (16):
X(k) = Σ_{n=0}^{N−1} x(n)·e^(−j2πnk/N), k = 0, 1, …, N−1 (16)
In formula (16), N is the number of points of the Fourier transform. A spectrogram of the voice digital signal is then drawn with the horizontal axis as time, the vertical axis as the frequency of the transformed signal, and color expressing the amplitude value;
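Steps A1.3 and A2.1 together form the usual framing/windowing/FFT pipeline; a NumPy sketch (the 16 kHz sampling rate and 512-point FFT are assumptions) is:

```python
import numpy as np

def spectrogram(y: np.ndarray, fs: int = 16000, n_fft: int = 512) -> np.ndarray:
    """A1.3 + A2.1 sketch: 20 ms frames with 10 ms shift, Hamming window
    of formula (15), then the N-point DFT of formula (16); returns a
    (frames x frequency bins) magnitude matrix."""
    frame_len, frame_shift = int(0.020 * fs), int(0.010 * fs)
    window = np.hamming(frame_len)  # 0.54 - 0.46*cos(2*pi*k/(N-1))
    frames = [
        y[i : i + frame_len] * window
        for i in range(0, len(y) - frame_len + 1, frame_shift)
    ]
    return np.abs(np.fft.rfft(frames, n=n_fft, axis=1))  # |X(k)| per frame

mags = spectrogram(np.random.randn(16000))  # one second of dummy audio
```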
a2.2: output the voice recognition result. The spectrogram extracted in A2.1 is used as the input of the voice recognition model: the acoustic model gives the probability that each frame of the voice signal maps to a phoneme, the language model gives the probability of a word sequence, and then the acoustic model and the language model are combined with a dictionary to obtain the probabilities of speech-to-string mappings, the highest probability being the recognition result. When the model is trained, common short commands for controlling an intelligent storage cabinet are used as the corpus and corresponding annotations are established. The spectrogram extracted in A2.1 is input into the trained CNN model, its features are extracted, and the voice recognition model is trained with a convolutional neural network CNN.
The convolutional neural network CNN comprises convolutional layers, an excitation layer, a pooling layer, and a fully connected layer. Each convolutional layer contains several convolution kernels that sweep the input feature map in sequence, multiplying and summing the elements of the input feature matrix within their receptive field, where the receptive field is the region of the input map to which a pixel of each layer's output feature map is mapped, its size determined by the kernel size. The excitation layer applies a nonlinear function to the features; the pooling layer is generally used for feature selection after the excitation layer and also reduces dimensionality and computation; the fully connected layer is used for classification, the input generally being mapped to a label in the last fully connected layer of the CNN model. The input spectrogram is processed by the convolutional, excitation, and pooling layers of the CNN structure to extract feature values of the spectrogram data: the convolutional layer multiplies and sums each image block of the spectrogram with the trained convolution kernels, the input of the convolutional layer being divided into k two-dimensional m × m matrices X and the convolution kernels being k two-dimensional n × n matrices W, which are convolved with a fixed stride until the features of every position of the input matrix X have been extracted, yielding the spectrogram feature matrices; the excitation layer applies the ReLU activation function to the feature matrix computed by the convolutional layer to obtain a new feature matrix; finally, the pooling layer divides the pixel matrix produced by the excitation layer into several non-overlapping regions of the same size, extracts the maximum value of each region as a new element, and reduces the dimensionality of the feature matrix after the excitation layer, this matrix of maxima being the feature matrix of the spectrogram image. Finally, the feature matrix obtained by the pooling layer is input into the fully connected layer and the classification result is calculated;
and A2.3: according to the received voice recognition result, the intelligent storage cabinet performs the corresponding action, such as opening a box, after receiving the voice command, and issues the corresponding voice broadcast prompt to the user.
In specific use, the intelligent storage cabinet comprises an intelligent storage cabinet system, which comprises a data acquisition system, a face recognition and matching system, and a voice recognition and man-machine interaction system. The data acquisition system acquires a face image of the user and preprocesses the acquired image; the face recognition and matching system performs face detection and face feature extraction on the acquired images and carries out face recognition and matching; and the voice recognition and man-machine interaction system performs voice acquisition, voice information processing, and voice recognition on the voice commands issued by the user and issues deposit or retrieval operating instructions to the intelligent locker system according to those commands. This effectively solves the problems that the existing intelligent locker is complicated to operate, that access credentials are easily lost, that biometric credentials are easily contaminated, and that safety is insufficient.

Claims (8)

1. The intelligent locker based on face recognition and voice recognition is characterized by comprising an intelligent locker system, wherein the intelligent locker system comprises a data acquisition system, a face recognition and matching system and a voice recognition and man-machine interaction system;
the data acquisition system acquires a face image of a user and preprocesses the acquired image;
the face recognition and matching system performs face detection and face feature extraction on the acquired images, and carries out face recognition and matching;
the voice recognition and man-machine interaction system carries out voice acquisition, voice signal processing and voice recognition on a voice instruction sent by a user, and issues an operating instruction of storing and fetching objects to the intelligent storage cabinet system according to the voice instruction.
2. The intelligent storage cabinet based on face recognition and voice recognition as claimed in claim 1, wherein the data acquisition system acquires a face image of a user, and pre-processes the acquired image, specifically comprising: the method comprises the following steps of collecting a face image of a user by using a camera on an intelligent storage cabinet, carrying out gray processing on the image, converting a color image into a gray image by the gray processing, wherein the color image has R, G, B component channels, and the formula for converting the color image into the gray image is shown as a formula (1):
Gray=0.3·R+0.59·G+0.11·B (1)
in formula (1), Gray represents a Gray image, R represents a red channel of the image, G represents a green channel of the image, and B represents a blue channel of the image.
3. The intelligent storage cabinet based on face recognition and voice recognition as claimed in claim 1, wherein the face recognition and matching system performs face recognition and face feature extraction on the collected images, and performs face recognition and matching, and the specific steps are as follows:
s1: the intelligent storage cabinet acquires a user face image with a camera, performs face detection and positioning with a histogram of oriented gradients, a sliding-window detection mechanism, and a linear classifier, detects and crops the face image, and identifies whether the image is a face with a Support Vector Machine (SVM) binary classification model; if the image is not a face, the picture is deleted and the user is prompted to acquire the image again;
s2: preprocessing the face image acquired in the step S1 by utilizing a wavelet transform and an illumination invariance algorithm of a denoising model;
s3: inputting the face image preprocessed in S2 into a trained Convolutional Neural Network (CNN) model to extract the face features in the image;
s4: and matching the facial features extracted in the step S3 with the faces in the database, carrying out facial image recognition, and if the similarity between the facial features and the images in the face database is less than a set threshold, determining that the matching fails.
4. The intelligent cabinet based on face recognition and voice recognition as claimed in claim 3, wherein the step S1 is performed by using histogram of oriented gradient, sliding window detection mechanism, linear classifier to detect and intercept the face image, and the specific steps are as follows:
S1.1: the collected face sample image is corrected with Gamma correction, whose formula is shown in formula (2):
G(x,y) = F(x,y)^(1/γ) (2)
In formula (2), G(x,y) is the luminance of the pixel with coordinates (x,y) in the processed image, F(x,y) is the luminance of the pixel with coordinates (x,y) in the original image, and γ = 0.5;
S1.2: calculate the gradient and gradient direction of the face image after color-space normalization; compute the gradient of each pixel in the horizontal and vertical directions, and the gradient magnitude and direction at each pixel position. The horizontal and vertical gradients of the image at pixel (x,y) are shown in formula (3):
Gx(x,y) = F(x+1,y) − F(x−1,y)
Gy(x,y) = F(x,y+1) − F(x,y−1) (3)
In formula (3), Gx(x,y) and Gy(x,y) are the gradient values in the horizontal and vertical directions at the current pixel (x,y); the gradient amplitude and gradient direction at pixel (x,y) are then calculated according to formula (4):
v(x,y) = sqrt(Gx(x,y)² + Gy(x,y)²)
θ(x,y) = arctan(Gy(x,y) / Gx(x,y)) (4)
In formula (4), v(x,y) is the gradient amplitude of the face sample image at coordinate (x,y), and θ(x,y) is the gradient direction of the face sample image at coordinate (x,y);
S1.3: compute the weighted projection of the face image based on gradient direction. The face image is divided into small, non-overlapping unit blocks called cells, and a gradient-orientation histogram is computed for each cell: the division is made with reference to the gradient direction, gradient values falling into the same orientation are accumulated, and the 180° range is divided equally into 20° bins, so the histogram-of-oriented-gradients (HOG) feature of one cell is a 9-dimensional vector;
S1.4: normalize the histograms with the L2-norm normalization function. The gradient histogram counted in each cell forms the descriptor of that cell; cells form a larger descriptor called a block, whose gradient-orientation histogram is obtained by concatenating the feature vectors of the four cells in the block, so from the 9-dimensional HOG feature of one cell, the HOG feature of one block is 36-dimensional. The gradients are locally contrast-normalized block by block with the L2-norm, whose expression is shown in formula (5):
‖v‖₂ = sqrt(v₁² + v₂² + … + v_n²) (5)
The normalized feature vector V is calculated as shown in formula (6):
V_i = v_i / sqrt(‖v‖₂² + ε²), i = 1, 2, …, n (6)
where V = (V₁, V₂, …, V_n) is the normalized feature vector, v = (v₁, v₂, …, v_n) is the feature vector before normalization, and ε is a number greater than 0 and infinitely close to 0;
S1.5: concatenate the normalized feature vectors V of all image blocks to form the HOG feature vector of the face image, of dimension β × κ × ζ, where β is the number of orientation bins (pixel degree ranges) in each cell, κ is the number of image blocks in the face sample image, and ζ is the number of cells in each image block;
S1.6: input the face-image feature vector obtained in S1.5 into a trained SVM face/non-face binary classifier to obtain the finally detected face localization image. The SVM is used to find an optimal hyperplane that satisfies the classification requirement while maximizing the classification margin. Given a face training sample set (x₁, y₁), (x₂, y₂), …, (x_l, y_l), where x_i is the HOG multi-dimensional feature vector and y_i ∈ {+1, −1} is the class label, a hyperplane is defined as w·x + b = 0 and must satisfy formula (7):
y_i(w·x_i + b) ≥ 1, i = 1, 2, …, l (7)
The constrained optimization problem, in dual form, is formula (8):
max_α Σᵢ αᵢ − (1/2) Σᵢ Σⱼ αᵢ αⱼ yᵢ yⱼ (xᵢ·xⱼ)
s.t. Σᵢ αᵢ yᵢ = 0, αᵢ ≥ 0, i = 1, 2, …, l (8)
In formula (8), αᵢ is the Lagrange multiplier. Formula (8) is a quadratic programming problem; solving it yields the optimal multipliers α* = (α₁*, α₂*, …, α_l*), from which w* and b* are obtained as shown in formula (9):
w* = Σᵢ αᵢ* yᵢ xᵢ, b* = yⱼ − Σᵢ αᵢ* yᵢ (xᵢ·xⱼ) (9)
This yields the optimal classification hyperplane w*·x + b* = 0 and the optimal classification function f(x) = sgn(w*·x + b*); if f(x) = 1, the image is judged to be a face.
5. The intelligent cabinet based on face recognition and voice recognition as claimed in claim 3, wherein in step S2, the face image collected in S1 is preprocessed by the illumination invariant algorithm of wavelet transform and denoising model, which includes the following specific steps:
S2.1: noise is removed from the face image collected in S1 with the wavelet transform and the illumination invariance algorithm of the denoising model. First apply a logarithmic transform to the face image G to obtain the image G'; apply a wavelet transform to the log-transformed image to obtain the low-frequency coefficients LLᵢ and the high-frequency coefficient matrices LHᵢ, HLᵢ, HHᵢ at each decomposition level i; multiply the high-frequency coefficient matrices by a scaling parameter λ (0 < λ < 1) to obtain new high-frequency coefficient matrices LHᵢ′, HLᵢ′, HHᵢ′; perform wavelet reconstruction with the unmodified low-frequency coefficients LLᵢ and the new high-frequency coefficient matrices LHᵢ′, HLᵢ′, HHᵢ′ to obtain the image L′; finally the illumination invariant R′ = G′ − L′, i.e. the denoised image, is obtained;
S2.2: perform gray-level histogram equalization on the denoised face image obtained in S2.1. First count, over the whole image, the number of pixels at each gray level and calculate the probability distribution of each gray level in the image, as shown in formula (10):
p(r_k) = n_k / n, k = 0, 1, …, L−1 (10)
In formula (10), r_k is the k-th gray level, L is the number of gray levels, n_k is the number of pixels with gray level r_k in the image, and n is the total number of pixels in the image, i.e. n = Σ_{k=0}^{L−1} n_k. Then calculate the cumulative distribution probability s_k, as shown in formula (11):
s_k = Σ_{j=0}^{k} p(r_j) (11)
Formula (11) maps the gray levels to the domain [0, 1]; mapping the values to the gray values [0, L−1] requires the conversion shown in formula (12):
s_k′ = round((L−1) · s_k) (12)
In formula (12), s_k′ is the converted gray value and round denotes rounding down;
the denoised, enhanced face image is obtained through step S2.
6. The intelligent cabinet based on face recognition and voice recognition as claimed in claim 3, wherein in step S3, the face image preprocessed in S2 is input into a trained convolutional neural network CNN model for face feature extraction in the image, and the specific steps are as follows:
s3.1: input the face image preprocessed in S2 into the trained CNN model and extract features; the face recognition model is trained with a convolutional neural network CNN. The CNN comprises convolutional layers, an excitation layer, a pooling layer, and a fully connected layer. Each convolutional layer contains several convolution kernels; a kernel sweeps the input feature map in sequence, multiplying and summing the elements of the input feature matrix within its receptive field, where the receptive field is the region of the input map to which a pixel of each layer's output feature map is mapped, its size determined by the kernel size. The pooling layer performs feature selection after the convolutional layer. The input image is processed by the convolutional, excitation, and pooling layers of the CNN structure to extract feature values of the image data, and the fully connected layer classifies those feature values according to its parameters and weights;
s3.2, multiplying and summing each image block of the face image and the trained convolution kernel by using the convolution layer, dividing the input of the convolution layer into k two-dimensional m X m matrixes X, wherein the convolution kernel is k two-dimensional n X n matrixes W, and performing convolution operation on the matrixes X and the matrixes W by a fixed step length until the characteristics of each position of the input matrix X are extracted to obtain a face characteristic matrix;
s3.3: the excitation layer applies the ReLU (Rectified Linear Unit) activation function to the feature matrix computed by the convolutional layer to obtain a new feature matrix;
and S3.4, dividing a pixel matrix formed by calculation of the excitation layer into a plurality of regions with the same size and without overlapping with each other by using the pooling layer, extracting the maximum value of each region as a new element, and reducing the dimension of the feature matrix passing through the excitation layer, wherein the maximum value matrix is the feature matrix of the face image.
7. The intelligent cabinet based on face recognition and voice recognition as claimed in claim 3, wherein in step S4, the extracted face features of S3 are matched with the faces in the database for face image recognition, and if the similarity with the images in the face database is less than a set threshold, the matching is considered to be failed, specifically:
s4.1: on the last layer of the CNN model, namely the fully connected layer, the input is mapped to a label; the feature matrix obtained in S3.3 is input into the fully connected layer and compressed to obtain the final face recognition result, and if the similarity probability of the recognition result images is smaller than the set threshold, the matching is considered to have failed;
and S4.2, updating the CNN model, adding the newly uploaded face image into CNN model training, and updating CNN model parameters.
8. The intelligent locker based on face recognition and voice recognition as claimed in claim 1, wherein the voice recognition and man-machine interaction system performs voice acquisition, voice signal processing and voice recognition on the voice command sent by the user, and issues an operation command of storing or fetching objects to the intelligent locker system according to the voice command, and the steps are as follows:
a1: carrying out voice acquisition on a voice command sent by a user, carrying out digital signal conversion, high-pass digital filtering denoising, framing and windowing preprocessing operations on a voice signal, removing environmental noise from the voice signal, and converting the voice signal into a digital signal which can be processed by a computer;
a2: performing voice spectrum feature extraction on a voice signal by utilizing Fourier transform, and performing voice recognition by utilizing a Convolutional Neural Network (CNN) model;
a3: the intelligent storage cabinet issues the corresponding deposit, retrieval, or box-opening operation instruction according to the content of the voice instruction sent by the user, and gives the corresponding voice broadcast prompt;
A4: the intelligent storage cabinet broadcasts a voice prompt asking the user to align the face with the camera and collects the user's face image; the face recognition and matching system judges whether the collected picture is a face; if not, the flow returns to A3 and a voice prompt is broadcast that face capture failed and should be retried; if so, a voice prompt is broadcast that face capture succeeded, and the flow proceeds to A5 or A6;
A5: if the user chooses to retrieve an item, the face recognition and matching system extracts the face features of the collected face image and performs face matching; if the matching fails, the flow returns to A3 and a voice prompt of face matching failure and retry is broadcast; if the matching succeeds, the intelligent storage cabinet broadcasts a prompt that the box is being opened for retrieval, opens the corresponding storage box, and after the item has been taken broadcasts a prompt asking the user to close the box door;
A6: if the user chooses to deposit an item, the face recognition and matching system extracts the face information and uploads it to the database, a voice prompt of opening a box for deposit is broadcast, an empty storage box is opened, and after the item has been deposited a voice prompt asking the user to close the box door is broadcast (a compact flow sketch of A3-A6 follows);
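Steps A3-A6 can be read as a small state flow. The runnable sketch below uses stub helpers (broadcast, capture_face, is_face, match_face_to_box) as stand-ins for the cabinet's subsystems, which the claim does not specify at this level of detail.

```python
def broadcast(msg):                 # voice broadcast prompt (A3)
    print("[voice]", msg)

def capture_face():                 # camera capture stub (A4)
    return "face-image"

def is_face(img):                   # face/no-face judgment stub (A4)
    return img is not None

def match_face_to_box(img):         # face matching stub (A5); None = failure
    return 7

def handle_command(command):
    broadcast("Please align your face with the camera")        # A4
    image = capture_face()
    if not is_face(image):
        broadcast("Face capture failed, please retry")          # back to A3
        return
    broadcast("Face information captured successfully")
    if command == "retrieve":                                   # A5
        box = match_face_to_box(image)
        if box is None:
            broadcast("Face matching failed, please retry")     # back to A3
            return
        broadcast(f"Opening box {box}, please take your item and close the door")
    elif command == "deposit":                                  # A6
        broadcast("Opening an empty box, please deposit your item and close the door")

handle_command("deposit")
```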
Step A1 of acquiring the voice command issued by the user and applying digital signal conversion, high-pass digital filtering for denoising, and framing and windowing preprocessing, so that environmental noise is removed and the voice signal is converted into a digital signal that the computer can process, specifically comprises the following steps:
A1.1: recording the voice with a microphone and converting the voice signal into a digital signal the computer can recognize, i.e. performing analog-to-digital conversion on the voice signal; the conversion comprises two parts, sampling and quantization, which record the height of the sound wave at equidistant points, and the sampling frequency, i.e. the number of samples per second, is set during digitization (a sampling sketch follows);
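As an illustration of A1.1, the snippet below samples a test tone at equidistant instants and quantizes it to 16-bit integers; the 16 kHz sampling rate and the 1 kHz tone are assumptions of the example, not values fixed by the claim.

```python
import numpy as np

fs = 16000                                  # sampling frequency (samples/s)
t = np.arange(0, 0.02, 1.0 / fs)            # equidistant sampling instants
wave = np.sin(2 * np.pi * 1000 * t)         # stands in for the analog voice
samples = np.round(wave * 32767).astype(np.int16)   # 16-bit quantization
```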
A1.2: preprocessing the digital voice signal to remove environmental noise as well as the aliasing and higher-harmonic distortion introduced by the acquisition and recording equipment, using a high-pass digital filter as shown in formula (13):
H(z) = 1 - a·z^(-1)    (13)
In formula (13), z is the complex variable of the z-transform of the digital signal and a is the pre-emphasis coefficient (0.9 < a < 1.0); with the speech sample value denoted x(n), the output after high-pass digital filtering is shown in formula (14):
y(n)=x(n)-a·x(n-1) (14)
In formula (14), n is the sample index and a is taken as 0.975 (a sketch of this filter follows);
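A minimal sketch of the pre-emphasis filter of formulas (13)-(14), with a = 0.975 as specified above; the boundary convention y(0) = x(0) is an assumption, since the claim does not define x(-1).

```python
import numpy as np

def pre_emphasis(x, a=0.975):
    """Formula (14): y(n) = x(n) - a*x(n-1)."""
    y = np.empty_like(x, dtype=float)
    y[0] = x[0]                       # no x(-1) sample; assumed convention
    y[1:] = x[1:] - a * x[:-1]
    return y

x = np.sin(2 * np.pi * 1000 * np.arange(320) / 16000)   # 20 ms at 16 kHz
y = pre_emphasis(x)
```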
A1.3: framing and windowing the digital voice signal output in A1.2 to obtain a clean digital speech signal: the frame length is set to 20 ms and the frame shift to 10 ms, the signal is divided into frames and, to smooth it, each frame is multiplied by a window function, the Hamming window being defined by formula (15):
w(k) = 0.54 - 0.46·cos(2πk/(N-1)),  0 ≤ k ≤ N-1    (15)
In formula (15), k indexes the samples within the window, whose width equals the frame length, and N is the number of points of the Fourier transform (a framing and windowing sketch follows);
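A sketch of the A1.3 framing and windowing: 20 ms frames with a 10 ms shift, each multiplied by the Hamming window of formula (15). The 16 kHz rate and the random stand-in signal are assumptions of the example.

```python
import numpy as np

fs = 16000
frame_len = int(0.020 * fs)           # 20 ms frame length
frame_shift = int(0.010 * fs)         # 10 ms frame shift
y = np.random.randn(fs)               # stands in for the filtered signal

k = np.arange(frame_len)
window = 0.54 - 0.46 * np.cos(2 * np.pi * k / (frame_len - 1))  # formula (15)
frames = [y[s:s + frame_len] * window
          for s in range(0, len(y) - frame_len + 1, frame_shift)]
```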
Step A2 of extracting speech spectrum features from the voice signal by Fourier transform and performing voice recognition with a convolutional neural network (CNN) model specifically comprises the following steps:
A2.1: converting the digital voice signal preprocessed in A1 from the time domain to the frequency domain: with the digital voice signal denoted x(n), the frequency-domain conversion uses the Fourier transform shown in formula (16):
X(k) = Σ_{n=0}^{N-1} x(n)·e^(-j2πnk/N),  k = 0, 1, ..., N-1    (16)
In formula (16), N is the number of points of the Fourier transform; a spectrogram of the digital voice signal is drawn with time on the horizontal axis, the frequency of the transformed signal on the vertical axis, and color expressing amplitude (a spectrogram sketch follows);
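A sketch of A2.1: an N-point Fourier transform of each windowed frame (formula (16)) stacked into a time-frequency matrix whose log magnitude, rendered in color, is the spectrogram described above. N = 512 and the random frames are assumptions of the example.

```python
import numpy as np

N = 512                                       # Fourier transform points
frames = np.random.randn(100, 320)            # 100 windowed 20 ms frames
spectrum = np.abs(np.fft.rfft(frames, n=N))   # |X(k)| for each frame
spectrogram = np.log(spectrum + 1e-10).T      # rows: frequency, cols: time
```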
A2.2: outputting the voice recognition result: the spectrogram extracted in A2.1 is taken as the input of the voice recognition model; the acoustic model maps each frame of the voice signal to phoneme probabilities, the language model supplies the probabilities of word sequences, and combining the acoustic model and the language model with a dictionary yields the probability of each candidate character string for the speech, the highest probability being the recognition result; when the model is trained, the short commands that control the intelligent storage cabinet are used as the corpus and annotated accordingly; the spectrogram extracted in A2.1 is input into the trained CNN model for spectrogram feature extraction, the voice recognition model being trained with the convolutional neural network CNN, whose convolution, excitation and pooling layers perform different computations to extract the feature values of the spectrogram data: the convolution layer multiplies and sums each image block of the spectrogram with the trained convolution kernel, the input of the convolution layer being divided into k two-dimensional m×m matrices X and the convolution kernels being k two-dimensional n×n matrices W, and the matrices X and W are convolved at a fixed step length until the features at every position of the input matrix X have been extracted, yielding a spectrogram feature matrix; the excitation layer applies the ReLU (Rectified Linear Unit) activation function to the feature matrix computed by the convolution layer to obtain a new feature matrix; the pooling layer divides the pixel matrix computed by the excitation layer into several equally sized, non-overlapping regions and extracts the maximum value of each region as a new element, reducing the dimension of the feature matrix after the excitation layer, the resulting matrix of maxima being the feature matrix of the spectrogram image; finally, the feature matrix obtained from the pooling layer is input into the fully connected layer to compute the classification result (a decoding sketch follows);
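The decoding idea in A2.2 reduces to combining two probability sources. In the toy sketch below the acoustic scores P(spectrogram | command) and the language-model priors are invented numbers, and the three short commands merely illustrate the cabinet-control corpus.

```python
import numpy as np

commands = ["open box", "deposit item", "retrieve item"]
acoustic_logp = np.log(np.array([0.2, 0.7, 0.1]))   # from the CNN acoustic model
language_logp = np.log(np.array([0.3, 0.4, 0.3]))   # word-sequence priors

combined = acoustic_logp + language_logp            # log P(speech, string)
result = commands[int(np.argmax(combined))]         # highest probability wins
```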
and A2.3: according to the received voice recognition result, the intelligent storage cabinet performs the corresponding box-opening action after receiving the voice command and broadcasts the corresponding voice prompt to the user.
CN202210519910.5A 2022-05-12 2022-05-12 Intelligent storage cabinet based on face recognition and voice recognition Withdrawn CN114821735A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210519910.5A CN114821735A (en) 2022-05-12 2022-05-12 Intelligent storage cabinet based on face recognition and voice recognition

Publications (1)

Publication Number Publication Date
CN114821735A (en) 2022-07-29

Family

ID=82515033

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210519910.5A Withdrawn CN114821735A (en) 2022-05-12 2022-05-12 Intelligent storage cabinet based on face recognition and voice recognition

Country Status (1)

Country Link
CN (1) CN114821735A (en)

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117649933A (en) * 2023-11-28 2024-03-05 广州方舟信息科技有限公司 Online consultation assistance method and device, electronic equipment and storage medium
CN117649933B (en) * 2023-11-28 2024-05-28 广州方舟信息科技有限公司 Online consultation assistance method and device, electronic equipment and storage medium
CN117523465A (en) * 2024-01-03 2024-02-06 山东朝辉自动化科技有限责任公司 Automatic identification method for material types of material yard
CN117523465B (en) * 2024-01-03 2024-04-19 山东朝辉自动化科技有限责任公司 Automatic identification method for material types of material yard

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication Application publication date: 20220729