CN113220114B - Face recognition-fused embeddable non-contact elevator key interaction method - Google Patents

Face recognition-fused embeddable non-contact elevator key interaction method

Info

Publication number
CN113220114B
Authority
CN
China
Prior art keywords
elevator
image
coordinate system
elevator key
horizontal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110086981.6A
Other languages
Chinese (zh)
Other versions
CN113220114A (zh)
Inventor
谢巍
许练濠
卢永辉
吴伟林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202110086981.6A priority Critical patent/CN113220114B/en
Publication of CN113220114A publication Critical patent/CN113220114A/en
Application granted granted Critical
Publication of CN113220114B publication Critical patent/CN113220114B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/12Edge-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/753Transform-based matching, e.g. Hough transform
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02BCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
    • Y02B50/00Energy efficient technologies in elevators, escalators and moving walkways, e.g. energy saving or recuperation technologies

Abstract

The invention discloses an embeddable non-contact elevator key interaction method that integrates face recognition. The method first performs edge detection on the camera's shooting area in the original image with a Laplacian filter operator to obtain an edge image, and filters the edge image with horizontal and vertical line filtering operators. A Hough line detection algorithm is then applied separately to the horizontally and vertically filtered images to locate the elevator key panel region and solve a homography transformation matrix. An improved YOLOv3 algorithm then detects and locates the elevator user's finger, the floor key pointed at by the finger is obtained through the homography transformation matrix, and the face information of the residents of that floor is acquired for double verification. The invention accurately identifies the elevator key selected by the elevator user, enables non-contact elevator riding, and protects the safety of residents through double verification of the floor and the residents' face information.

Description

Face recognition-fused embeddable non-contact elevator key interaction method
Technical Field
The invention relates to the technical field of computer vision and man-machine interaction, in particular to an embeddable non-contact elevator key interaction method integrating face recognition.
Background
Elevators are now widely used in urban high-rise buildings and have become an indispensable tool for people who live and work on high floors. Conventional elevator buttons are contact-based: passengers must touch the buttons to select the destination floor and to open or close the elevator door. Because different people press the same buttons every day, the buttons can carry many kinds of bacteria and viruses, easily causing cross-infection and increasing the probability of transmission.
With the development of science and technology, human-computer interaction has become diversified: people are no longer satisfied with simply viewing virtual scenes and have begun to explore ways of interacting with the virtual world, giving rise to more and more novel interaction technologies. Human-computer interaction techniques fall into several categories: traditional interaction with keyboard and mouse input; interaction based on touch-screen devices, such as smartphones and tablet computers; and non-contact interaction based on machine vision and image processing, such as virtual keyboards and gesture interaction systems.
Hiroki Goto et al. studied a camera-projection interaction system based on a frame-difference method and hand skin-color extraction: the hands are first separated from the scene using the clustering of hand skin color in the HSV and YCbCr color spaces, and fingertip positions are then detected on the separated foreground image with template matching, enabling projection interaction between a user and a computer or home television. Fitriani et al. proposed a human-computer interaction system based on a deformable projection surface, which projects a virtual scene onto the surface of an easily deformed object, detects the deformation produced when the user touches the projection screen, and analyzes the interaction information through an image processing algorithm and a deformation model of the object.
However, the above solutions based on machine vision techniques and image processing algorithms share a common drawback: they cannot guarantee robustness across diverse projection scenes. For an interactive system based on hand skin color, the hand-foreground separation algorithm degrades greatly when the projected scene is similar in color to the skin. For an interactive system based on a deformable surface, the system runs stably only under the projection scene it was designed for; in a changing projection scene the deformation detection of the projected image becomes inaccurate, different schemes must be designed for different scenes, and the development cost of the system is high.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art by providing an embeddable non-contact elevator key interaction method that integrates face recognition. The method can be applied in changing environments, accurately identifies the elevator key selected by the elevator user, enables non-contact elevator riding, and protects the safety of residents through double verification of the floor and the residents' face information.
It is a second object of the present invention to provide a computing device.
A third object of the invention is to provide an elevator.
The first object of the invention is achieved by the following technical scheme: an embeddable non-contact elevator key interaction method integrating face recognition comprises the following steps:
s1, acquiring an original image shot by a camera in an elevator car, and performing edge detection on a shooting area of the original image through a Laplace filtering operator so as to obtain an edge image;
s2, filtering the edge image by utilizing a linear filtering operator in the horizontal direction and the vertical direction to strengthen the linear edges in the horizontal direction and the vertical direction, and removing noise while keeping the edges of the elevator key panel area;
s3, respectively carrying out straight line detection on the image filtered in the horizontal direction and the image filtered in the vertical direction by adopting a Hough straight line detection algorithm so as to position the area of the elevator key panel;
s4, solving a mapping relation under view angle transformation by utilizing a homography transformation matrix;
s5, detecting and positioning the finger of the elevator user in the original image by using an improved YOLOv3 algorithm, and obtaining a floor key pointed by the finger according to a homography transformation matrix;
S6, acquiring the face information of the residents of the floor pointed at by the finger and performing double verification: whether the elevator user is a resident and whether the floor pointed at by the finger is the floor where that resident lives; the floor key is selected only when both verifications pass, and the elevator car is finally controlled to run to that floor.
Preferably, the camera is arranged above the elevator key panel and shoots the elevator key panel downwards;
in step S1, the process of edge detection of the camera shooting area by the Laplace filter operator is as follows:
s11, carrying out graying treatment on an original image to obtain a gray image;
S12, detecting the edges of the gray image with a second-order-gradient Laplacian filter operator, chosen so that the boundary of the elevator key panel is not missed; the Laplacian filter operator computes the edge gradient with a second-order difference, as follows:
Consider the one-dimensional sequence {f(1), f(2), …, f(x-1), f(x), f(x+1)}. The second-order difference at position x is expressed as:
f''(x) = (f(x+1) - f(x)) - (f(x) - f(x-1))
which further simplifies to:
f''(x) = f(x-1) - 2·f(x) + f(x+1)
That is, the second-order difference of a one-dimensional discrete sequence can be expressed as the result of convolving the sequence with the one-dimensional convolution kernel [+1, -2, +1]. Generalizing this conclusion to the two-dimensional matrix of a gray image:
For the gray image I_gray, define a two-dimensional kernel K_L of size 3×3:

K_L = [[0, 1, 0],
       [1, -4, 1],
       [0, 1, 0]]

Since this two-dimensional kernel only considers the horizontal and vertical directions, the diagonal information is also taken into account and K_L is replaced by:

K_L = [[1, 1, 1],
       [1, -8, 1],
       [1, 1, 1]]

The second-order difference information of the gray image is obtained by convolving this kernel with the gray image, namely:

G = K_L * I_gray
The larger the convolution kernel, the more pronounced the detected edges.
The points where the convolution result is 0 are the edge points; the edge image is the set of points in the gray image whose gray level changes markedly.
Preferably, the procedure of step S2 is as follows:
S21, define a horizontal line filter operator K_horizontal of size 1×n and a vertical line filter operator K_vertical of size n×1:

K_horizontal = [1, 1, …, 1]          (1×n)
K_vertical = [1, 1, …, 1]^T          (n×1)

where T denotes the transpose of the vector and n denotes the size of the filter operator; K_horizontal is sensitive to horizontal straight edges and K_vertical to vertical straight edges;

S22, convolve the edge image I_Laplace obtained by the Laplacian filtering with the two operators to obtain the horizontally filtered image I_horizontal and the vertically filtered image I_vertical:

I_horizontal = K_horizontal * I_Laplace
I_vertical = K_vertical * I_Laplace
Preferably, the procedure of step S3 is as follows:
S31, since the non-horizontal and non-vertical straight edges of the edge image are suppressed after it is filtered in the horizontal and vertical directions, first segment the non-horizontal and non-vertical straight edges with a threshold and remove them;
s32, respectively carrying out straight line detection on the horizontal direction filtering image and the vertical direction filtering image which are subjected to threshold segmentation by using a Hough straight line detection algorithm, and finally obtaining four elevator key panel boundary straight lines;
S33, solving the intersection points of the four elevator key panel boundary lines to obtain the four vertex coordinates (x_lt, y_lt), (x_lb, y_lb), (x_rb, y_rb), (x_rt, y_rt).
Furthermore, the homography transformation reflects the process of mapping from one two-dimensional plane into three-dimensional space and then from the three-dimensional space onto another two-dimensional plane. Here X-Y-Z is the three-dimensional space coordinate system (which can be understood as the world coordinate system), x-y is the pixel-plane coordinate system, and x'-y' is the elevator key panel plane coordinate system. The homography transformation can be described as follows: a point (x, y) in the x-y coordinate system corresponds to the straight line l in the X-Y-Z coordinate system that passes through the origin and that point; this line intersects the x'-y' plane at a point (x', y'), and the mapping from the point (x, y) to the point (x', y') is called the homography transformation.
The process of solving the mapping relation under the view-angle transformation with the homography transformation matrix is as follows:
S41, let the x'-y' plane be perpendicular to the Z axis of the X-Y-Z space coordinate system and intersect it at the point (0, 0, 1); that is, a point (x', y') in x'-y' plane coordinates is the point (x', y', 1) in the X-Y-Z space coordinate system. The mapping between the x-y plane coordinate system and the X-Y-Z space coordinate system is described by the homography transformation matrix H:

H = [[h_1, h_2, h_3],
     [h_4, h_5, h_6],
     [h_7, h_8, h_9]]

[X, Y, Z]^T = H · [x, y, 1]^T

where h_1 to h_9 are the nine transformation parameters of the homography matrix;
the mapping relation from the x-y plane coordinate system to the x'-y' plane coordinate system then follows as:

x' = (h_1·x + h_2·y + h_3) / (h_7·x + h_8·y + h_9)
y' = (h_4·x + h_5·y + h_6) / (h_7·x + h_8·y + h_9)
The H matrix has 9 transformation parameters but in fact only 8 degrees of freedom, because the X-Y-Z space coordinate system is a homogeneous coordinate system in which coordinate scaling does not change the transformation; multiplying the H matrix by a scaling factor k gives:

kH = [[k·h_1, k·h_2, k·h_3],
      [k·h_4, k·h_5, k·h_6],
      [k·h_7, k·h_8, k·h_9]]

and kH represents the same mapping relation as H, so H has only 8 degrees of freedom;
S42, when solving H, one method is to set h_9 to 1; the equations to be solved for each point pair are then:

x'·(h_7·x + h_8·y + 1) = h_1·x + h_2·y + h_3
y'·(h_7·x + h_8·y + 1) = h_4·x + h_5·y + h_6

Another method is to add a constraint to the homography matrix H so that its modulus equals 1, namely:

h_1² + h_2² + … + h_9² = 1

and the equations to be solved are then:

x'·(h_7·x + h_8·y + h_9) = h_1·x + h_2·y + h_3
y'·(h_7·x + h_8·y + h_9) = h_4·x + h_5·y + h_6
S43, for each of the four vertices of the elevator key panel in the pixel coordinate system obtained in step S3, define its target coordinate point in the elevator key panel scene coordinate system:

(x_lt, y_lt) → (x_lt', y_lt')
(x_lb, y_lb) → (x_lb', y_lb')
(x_rb, y_rb) → (x_rb', y_rb')
(x_rt, y_rt) → (x_rt', y_rt')

Substitute these coordinate pairs into the equations to be solved in step S42 and solve them simultaneously for the H matrix.
Preferably, the improved YOLOv3 algorithm includes an improvement of its loss function based on the YOLOv3 object detection algorithm, and an adaptive pruning algorithm is employed to reduce the feature extraction portion of the YOLOv3 network.
Further, the loss function of the YOLOv3 network is designed as follows:
Loss = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj [(x_ij - x̂_ij)² + (y_ij - ŷ_ij)² + (w_ij - ŵ_ij)² + (h_ij - ĥ_ij)²]
     + Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj (C_ij - Ĉ_ij)²
     + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^noobj (C_ij - Ĉ_ij)²
     + Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj Σ_{c∈categories} (p_ij(c) - p̂_ij(c))²

where the first term is the coordinate error loss and λ_coord is the coefficient of the coordinate loss function; S denotes that the input image is divided into S×S grid cells; B denotes the number of boxes contained in a grid cell; 1_ij^obj indicates whether the j-th box of the i-th grid cell contains an object, taking the value 1 if it does and 0 otherwise; x and y denote the center coordinates of a box, and w and h its width and height; r_ij denotes the x, y, w, h of the j-th predicted box of the i-th grid cell, and r̂_ij denotes the x, y, w, h of the j-th ground-truth box of the i-th grid cell;

the second and third terms are the confidence losses; 1_ij^noobj indicates whether the j-th box of the i-th grid cell contains no object, taking the value 1 if it contains none and 0 otherwise; λ_noobj balances the loss weight of grid cells with and without objects, aiming to reduce the confidence loss of the boxes that contain no object; C_ij denotes the predicted confidence of the j-th box of the i-th grid cell, and Ĉ_ij denotes its true confidence;

the fourth term is the category loss; categories denotes the number of classes; p_ij(c) denotes the predicted probability that the j-th box of the i-th grid cell belongs to the c-th class, and p̂_ij(c) denotes the corresponding true probability;
the improvement of the loss function is specifically as follows:
(1) FocalLoss is referenced for the third term, i.e., confidence loss, to improve the model's ability to learn difficult samples, where Focaloss improves based on cross entropy as a function:
FL(y, ŷ) = -(1 - y)^α · ŷ · log(y) - y^α · (1 - ŷ) · log(1 - y)

where y and ŷ denote the predicted and true probability values respectively (for example p_ij(c) and p̂_ij(c)), and α is the FocalLoss hyperparameter;
the improved confidence loss function is as follows:
Loss_conf = Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj · FL(C_ij, Ĉ_ij) + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^noobj · FL(C_ij, Ĉ_ij)

where FL is the FocalLoss defined above.
(2) An adaptive scaling factor is added to the first term, the coordinate loss, as follows:
ρ_box = 2 - ŵ_ij · ĥ_ij

where ŵ_ij and ĥ_ij denote the width and height of the ground-truth box (normalized to the image size); ρ_box ranges from 1 to 2, and the smaller the ground-truth box, the larger its value;

the improved coordinate loss is as follows:

Loss_coord = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj · ρ_box · [(x_ij - x̂_ij)² + (y_ij - ŷ_ij)² + (w_ij - ŵ_ij)² + (h_ij - ĥ_ij)²]
Furthermore, the YOLOv3 network uses Darknet-53 as its feature extraction backbone. To address the complexity redundancy of Darknet-53, a Network Slimming algorithm from the structured pruning family is adopted to prune the network at the channel level and reduce its number of feature channels:

First, a BN layer is added after each convolution layer. When BN is used in a convolutional neural network, each input feature channel is assigned its own γ_ik and β_ik parameters, and the output of the BN layer is expressed as:

Ĉ_ik = γ_ik · (C_ik - μ_ik) / √σ_ik + β_ik

where Ĉ_ik is the output of the BN layer; C_ik denotes the k-th feature channel of the i-th convolution layer; μ_ik and σ_ik denote the mean and variance of the channel feature C_ik, obtained from the statistics of historical training data;
γ_ik is equivalent to a scaling factor; Network Slimming uses this scaling factor as the weight of the feature channel and sparsifies the scaling factors with the Lasso algorithm:

Loss_new = Loss_old + λ Σ_{i=1}^{Layers} Σ_{k=1}^{Channels} |γ_ik|

where Loss_new is the final loss function, Loss_old is the improved loss function described above, Layers is the number of network layers of the YOLOv3 network, and Channels is the number of channels of the YOLOv3 network;

finally, all γ parameters are sorted from largest to smallest, and a proportion of the lowest-ranked γ_ik, together with their corresponding feature channels and BN channels, is deleted.
The second object of the invention is achieved by the following technical scheme: the invention relates to a computing device, which comprises a processor and a memory for storing a program executable by the processor, wherein the processor realizes the embedded non-contact elevator key interaction method integrating face recognition according to the first object of the invention when executing the program stored by the memory.
The third object of the present invention is achieved by the following technical scheme: according to the elevator, the identification of floor keys and the running control of the elevator car are realized through the embedded non-contact elevator key interaction method integrating face recognition.
Compared with the prior art, the invention has the following advantages and effects:
(1) The method first locates the elevator key panel region in the image through edge detection, filtering, and line detection, and solves the homography transformation matrix; it then detects the elevator user's finger in the image with deep learning and obtains the floor key selected by the finger through the solved homography transformation matrix. This avoids the interference of environmental factors with target detection, improves the accuracy of identifying the selected floor key, and can be applied in changing environments, making the interaction scene more diverse.
(2) The method can be applied to realizing non-contact elevator keys during epidemic situations, and cross infection caused by multiple times of touching the elevator keys by multiple people is avoided.
(3) The invention recognizes the floor keys selected by the elevator user through the computer vision technology, and adds the face recognition technology to form double verification, thereby ensuring that the people entering and exiting the target floor are residents or are led by the residents, and greatly improving the interactivity of the elevator and the safety of the residents.
(4) The YOLOv3 algorithm has an advantage in speed; on this basis, improving the YOLOv3 network's ability to learn difficult samples and raising the loss weight of small objects can further increase the training speed of the YOLOv3 network, and reducing the number of feature channels further lowers the computational complexity, which greatly improves target detection efficiency and facilitates real-time detection.
(5) In the invention, the extracted edge image is a combined image containing both horizontal and vertical edges. Filtering it further with the horizontal and vertical line filtering operators splits it into a horizontally filtered image and a vertically filtered image before line detection, which avoids the redundant detection that would arise after merging the edges of the horizontal and vertical channels and effectively reduces the complexity of the line detection algorithm.
Drawings
Fig. 1 is a flow chart of an embeddable non-contact elevator key interaction method incorporating face recognition in accordance with the present invention.
Fig. 2 is a schematic diagram of a cartesian-coordinate hough straight-line detection algorithm.
Fig. 3 is a schematic diagram of a hough line detection algorithm in a polar coordinate system.
Fig. 4 is a schematic diagram of a homography transformation.
Fig. 5 is a schematic diagram of Network Slimming pruning.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but embodiments of the present invention are not limited thereto.
The embodiment discloses an embeddable non-contact elevator key interaction method integrating face recognition, which can be applied to an elevator, and the elevator can realize the recognition of floor keys and the operation control of an elevator car through the method. As shown in fig. 1, the method comprises the steps of:
s1, acquiring an original image shot by a camera in an elevator car, wherein the camera is arranged above an elevator key panel and shoots the elevator key panel downwards at a certain angle.
Then, edge detection is carried out on a shooting area of the original image through a Laplace filtering operator, so that an edge image is obtained:
s11, carrying out graying treatment on an original image to obtain a gray image;
S12, because edges are the set of points in an image whose brightness changes markedly, and the gradient reflects how fast values change, the edges of the gray image are detected with a second-order-gradient Laplacian filter operator, chosen so that the boundary of the elevator key panel is not missed. The Laplacian filter operator here uses a large-scale convolution kernel and computes the edge gradient with a second-order difference, as follows:
Consider the one-dimensional sequence {f(1), f(2), …, f(x-1), f(x), f(x+1)}. The second-order difference at position x is expressed as:
f''(x) = (f(x+1) - f(x)) - (f(x) - f(x-1))
which further simplifies to:
f''(x) = f(x-1) - 2·f(x) + f(x+1)
That is, the second-order difference of a one-dimensional discrete sequence can be expressed as the result of convolving the sequence with the one-dimensional convolution kernel [+1, -2, +1]. Generalizing this conclusion to the two-dimensional matrix of a gray image (a one-dimensional sequence can be understood as the pixel values along the horizontal or vertical direction):
For the gray image I_gray, define a two-dimensional kernel K_L of size 3×3:

K_L = [[0, 1, 0],
       [1, -4, 1],
       [0, 1, 0]]

Since this two-dimensional kernel only considers the horizontal and vertical directions, the diagonal information is also taken into account and K_L is replaced by:

K_L = [[1, 1, 1],
       [1, -8, 1],
       [1, 1, 1]]

The second-order difference information of the gray image is obtained by convolving this kernel with the image, namely:

G = K_L * I_gray
The convolution kernel K_L is the Laplacian filter operator; the larger the convolution kernel, the more pronounced the detected edges.
The points where the convolution result is 0 are the edge points; the edge image is the set of points in the gray image whose gray level changes markedly. The extracted edge image is a combined image containing both horizontal and vertical edges.
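As a concrete illustration (not part of the patent text), the following Python sketch implements steps S11-S12 with OpenCV and NumPy under the assumptions above: it grays the frame, convolves it with the 8-connected 3×3 Laplacian kernel, and keeps the zero-crossings of the response as edge points. The function name and the zero-crossing test are illustrative choices.

```python
import cv2
import numpy as np

def laplacian_edges(image_bgr: np.ndarray) -> np.ndarray:
    """Grayscale conversion, 8-connected Laplacian filtering, zero-crossing edge map."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    k_l = np.array([[1, 1, 1],
                    [1, -8, 1],
                    [1, 1, 1]], dtype=np.float32)        # K_L with diagonal terms
    response = cv2.filter2D(gray, cv2.CV_32F, k_l)        # G = K_L * I_gray
    # Edge points: sign changes (zero crossings) of the second-order response
    sign = np.sign(response)
    zc = np.zeros(gray.shape, dtype=np.uint8)
    zc[:-1, :] |= (sign[:-1, :] * sign[1:, :] < 0).astype(np.uint8)
    zc[:, :-1] |= (sign[:, :-1] * sign[:, 1:] < 0).astype(np.uint8)
    return zc * 255
```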
S2, filtering the edge image by using a linear filtering operator in the horizontal direction and the vertical direction.
Because the edge image obtained with the large-scale Laplacian convolution kernel contains many noise points, and the key to locating the elevator key panel region is the four straight lines of its boundary, which appear roughly horizontal or vertical in the image, line filtering operators in the horizontal and vertical directions are used to strengthen the straight edges in those two directions, removing noise while keeping the edges of the elevator key panel region. The filtering process is as follows:
S21, define a horizontal line filter operator K_horizontal of size 1×n and a vertical line filter operator K_vertical of size n×1:

K_horizontal = [1, 1, …, 1]          (1×n)
K_vertical = [1, 1, …, 1]^T          (n×1)

where T denotes the transpose of the vector and n denotes the size of the filter operator; K_horizontal is sensitive to horizontal straight edges and K_vertical to vertical straight edges, and together the two operators effectively reject isolated noise. In general, the larger n is, the stricter the length requirement on a straight line and the better the non-linear noise is removed; however, when n is too large, the sensitivity to the line angle also increases and slightly inclined straight lines may be filtered out. Since the boundary of the target region in the acquired image is generally not strictly horizontal or vertical, n cannot be set too large and must be chosen according to the actual situation.

S22, convolve the edge image I_Laplace obtained by the Laplacian filtering with the two operators to obtain the horizontally filtered image I_horizontal and the vertically filtered image I_vertical:

I_horizontal = K_horizontal * I_Laplace
I_vertical = K_vertical * I_Laplace
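A minimal sketch of this filtering step, assuming the edge image produced above; the all-ones 1×n and n×1 kernels and the default n = 15 are illustrative values chosen here, not taken from the patent.

```python
import cv2
import numpy as np

def directional_filter(edge_img: np.ndarray, n: int = 15):
    """Convolve the edge image with 1xn and nx1 all-ones line kernels."""
    k_h = np.ones((1, n), dtype=np.float32)   # sensitive to horizontal straight edges
    k_v = np.ones((n, 1), dtype=np.float32)   # sensitive to vertical straight edges
    src = edge_img.astype(np.float32)
    i_horizontal = cv2.filter2D(src, cv2.CV_32F, k_h)    # I_horizontal
    i_vertical = cv2.filter2D(src, cv2.CV_32F, k_v)      # I_vertical
    return i_horizontal, i_vertical
```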
S3, respectively carrying out straight line detection on the image filtered in the horizontal direction and the image filtered in the vertical direction by adopting a Hough straight line detection algorithm so as to position the area of the elevator key panel:
S31, since the non-horizontal and non-vertical straight edges of the edge image are suppressed after it is filtered in the horizontal and vertical directions, first segment the non-horizontal and non-vertical straight edges with a threshold and remove them;
and S32, respectively carrying out straight line detection on the horizontal direction filtering image and the vertical direction filtering image which are subjected to threshold segmentation by using a Hough straight line detection algorithm, and finally obtaining four elevator key panel boundary straight lines.
Because the edge image extracted in the step S1 is a combined image containing a horizontal edge and a vertical edge, the edge image can be further filtered by using a horizontal linear filtering operator and a vertical filtering operator in the step S2 to be divided into a filtered image only containing a horizontal direction and a filtered image only containing a vertical direction, and then linear detection is carried out in the step S3, so that redundant detection after the edges of a horizontal channel and a vertical channel are combined can be avoided, and the complexity of a linear detection algorithm is effectively reduced.
The Hough straight line detection algorithm is to map each point on the Cartesian coordinate system to a straight line in the Hough space by utilizing the principle of the duality of the Cartesian coordinate system and the dotted line of the Hough space, and then the straight line passing through a plurality of points in the Cartesian coordinate system corresponds to one intersection point of the straight lines passing through a plurality of points in the Hough space.
Specifically, for a straight line y = kx + b in the Cartesian coordinate system, (x, y) denotes a coordinate point, k the slope of the line, and b its intercept. Rewriting the line as b = y - xk and defining the abscissa of the Hough space as k and the ordinate as b, b = y - xk is a straight line in the Hough space with slope -x and intercept y. The points (x_1, y_1), (x_2, y_2), …, (x_n, y_n) on the same straight line correspond to several straight lines in the Hough space, and their common intersection point (k, b) gives the slope and intercept of that line in the Cartesian coordinate system; the schematic diagram is shown in fig. 2.
Since the slope of a vertical line in an image cannot be computed, the Hough transform is generally performed in polar form. Specifically, a straight line is represented by the polar equation ρ = x·cosθ + y·sinθ, where ρ is the polar distance, i.e., the distance from the origin to the line in the polar coordinate space, and θ is the polar angle, i.e., the angle between the x-axis and the segment through the origin perpendicular to the line. Defining the abscissa of the Hough space as θ and the ordinate as ρ, the points (x_1, y_1), (x_2, y_2), …, (x_n, y_n) on the same line correspond to several curves in the Hough space, and their common intersection point (θ, ρ) gives the polar angle and polar distance of that line in the polar coordinate system; the schematic diagram is shown in fig. 3.
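A possible implementation of step S3 with OpenCV's polar-form Hough transform, given the two filtered images from the previous sketch; the threshold ratio and the accumulator vote count are assumed tuning values, not figures from the patent.

```python
import cv2
import numpy as np

def detect_boundary_lines(i_horizontal, i_vertical, ratio=0.8, votes=100):
    """Threshold each directional response, then detect lines in (rho, theta) form."""
    h_bin = np.uint8(i_horizontal > ratio * i_horizontal.max()) * 255
    v_bin = np.uint8(i_vertical > ratio * i_vertical.max()) * 255
    # rho resolution: 1 pixel, theta resolution: 1 degree, accumulator threshold: votes
    h_lines = cv2.HoughLines(h_bin, 1, np.pi / 180, votes)   # near-horizontal boundaries
    v_lines = cv2.HoughLines(v_bin, 1, np.pi / 180, votes)   # near-vertical boundaries
    return h_lines, v_lines    # each entry holds (rho, theta)
```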
S33, solving the intersection points of the four elevator key panel boundary lines to obtain the four vertex coordinates (x_lt, y_lt), (x_lb, y_lb), (x_rb, y_rb), (x_rt, y_rt).
S4, solving a mapping relation under view angle transformation by utilizing a homography transformation matrix:
S41, the homography transformation maps from one two-dimensional plane into three-dimensional space and then from the three-dimensional space onto another two-dimensional plane. Here X-Y-Z is the three-dimensional space coordinate system (which can be understood as the world coordinate system), x-y is the pixel-plane coordinate system, and x'-y' is the elevator key panel plane coordinate system. The homography transformation can be described as follows: a point (x, y) in the x-y coordinate system corresponds to the straight line l in the X-Y-Z coordinate system that passes through the origin and that point; this line intersects the x'-y' plane at a point (x', y'), and the process from the point (x, y) to the point (x', y') is called the homography transformation.
Let the x'-y' plane be perpendicular to the Z axis of the X-Y-Z space coordinate system and intersect it at the point (0, 0, 1); that is, a point (x', y') in x'-y' plane coordinates is the point (x', y', 1) in the X-Y-Z space coordinate system. The mapping between the x-y plane coordinate system and the X-Y-Z space coordinate system is described by the homography transformation matrix H:

H = [[h_1, h_2, h_3],
     [h_4, h_5, h_6],
     [h_7, h_8, h_9]]

[X, Y, Z]^T = H · [x, y, 1]^T

where h_1 to h_9 are the nine transformation parameters of the homography matrix;
the mapping relation from the x-y plane coordinate system to the x'-y' plane coordinate system then follows as:

x' = (h_1·x + h_2·y + h_3) / (h_7·x + h_8·y + h_9)
y' = (h_4·x + h_5·y + h_6) / (h_7·x + h_8·y + h_9)
The H matrix has 9 transformation parameters but in fact only 8 degrees of freedom, because the X-Y-Z space coordinate system is a homogeneous coordinate system in which coordinate scaling does not change the transformation; multiplying the H matrix by a scaling factor k gives:

kH = [[k·h_1, k·h_2, k·h_3],
      [k·h_4, k·h_5, k·h_6],
      [k·h_7, k·h_8, k·h_9]]

and kH represents the same mapping relation as H, so H has only 8 degrees of freedom;
S42, when solving H, one method is to set h_9 to 1; the equations to be solved for each point pair are then:

x'·(h_7·x + h_8·y + 1) = h_1·x + h_2·y + h_3
y'·(h_7·x + h_8·y + 1) = h_4·x + h_5·y + h_6

Another method is to add a constraint to the homography matrix H so that its modulus equals 1, namely:

h_1² + h_2² + … + h_9² = 1

and the equations to be solved are then:

x'·(h_7·x + h_8·y + h_9) = h_1·x + h_2·y + h_3
y'·(h_7·x + h_8·y + h_9) = h_4·x + h_5·y + h_6
S43, for each of the four vertices of the elevator key panel in the pixel coordinate system obtained in step S3, define its target coordinate point in the elevator key panel scene coordinate system:

(x_lt, y_lt) → (x_lt', y_lt')
(x_lb, y_lb) → (x_lb', y_lb')
(x_rb, y_rb) → (x_rb', y_rb')
(x_rt, y_rt) → (x_rt', y_rt')

The four vertex coordinates in the pixel coordinate system are first obtained by the solving in step S3; substituting these coordinate pairs into the equations to be solved in step S42 and solving them simultaneously yields the H matrix.
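With exactly four corner correspondences, the H matrix can be obtained directly; the sketch below uses OpenCV's getPerspectiveTransform, which corresponds to the h_9 = 1 normalization discussed in step S42. Function and variable names are illustrative.

```python
import cv2
import numpy as np

def solve_panel_homography(corners_px, corners_panel):
    """Solve H from (x_lt, y_lt), ... in pixels to (x_lt', y_lt'), ... on the panel."""
    src = np.asarray(corners_px, dtype=np.float32)      # four panel corners in the image
    dst = np.asarray(corners_panel, dtype=np.float32)   # their target panel coordinates
    return cv2.getPerspectiveTransform(src, dst)         # 3x3 matrix with h_9 fixed to 1
```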
S5, detecting and positioning the finger of the elevator user in the original image with the improved YOLOv3 algorithm; after the position coordinates of the finger are obtained, they are mapped through the homography transformation matrix to the corresponding position coordinates on the elevator key panel, which determines which floor key that position falls on and therefore which floor key the finger points to.
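A sketch of the mapping just described, assuming the fingertip pixel returned by the detector and a dictionary of key centers in panel coordinates; the key layout is hypothetical and not specified by the patent.

```python
import numpy as np

def finger_to_floor_key(finger_xy, H, key_centers):
    """Map a fingertip pixel through H and return the nearest floor key."""
    x, y = finger_xy
    p = H @ np.array([x, y, 1.0])
    px, py = p[0] / p[2], p[1] / p[2]      # coordinates in the key panel plane
    return min(key_centers,
               key=lambda k: (key_centers[k][0] - px) ** 2 + (key_centers[k][1] - py) ** 2)

# Hypothetical layout: key_centers = {"1": (20, 150), "2": (20, 120), "3": (20, 90)}
```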
The input of the network is the original image captured by the camera in the elevator car, and the output is the position coordinates (x, y, w, h) and the confidence of the elevator user's finger in the original image. Original images with known finger position coordinates, confidence (1 or 0), and classification probability (i.e., the probability of being a finger) are used as training data, and the loss function of the network is designed before training.
Here, the modified YOLOv3 algorithm includes modifying its loss function based on the YOLOv3 object detection algorithm (i.e., YOLOv3 network), and employing an adaptive pruning algorithm to reduce the feature extraction portion of the YOLOv3 network.
Specifically, for YOLOv3 networks, the loss function is designed as follows:
Loss = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj [(x_ij - x̂_ij)² + (y_ij - ŷ_ij)² + (w_ij - ŵ_ij)² + (h_ij - ĥ_ij)²]
     + Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj (C_ij - Ĉ_ij)²
     + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^noobj (C_ij - Ĉ_ij)²
     + Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj Σ_{c∈categories} (p_ij(c) - p̂_ij(c))²

where the first term is the coordinate error loss and λ_coord is the coefficient of the coordinate loss function; S denotes that the input image is divided into S×S grid cells; B denotes the number of boxes contained in a grid cell; 1_ij^obj indicates whether the j-th box of the i-th grid cell contains an object, taking the value 1 if it does and 0 otherwise; x and y denote the center coordinates of a box, and w and h its width and height; r_ij denotes the x, y, w, h of the j-th predicted box of the i-th grid cell, and r̂_ij denotes the x, y, w, h of the j-th ground-truth box of the i-th grid cell;

the second and third terms are the confidence losses; 1_ij^noobj indicates whether the j-th box of the i-th grid cell contains no object, taking the value 1 if it contains none and 0 otherwise; λ_noobj balances the loss weight of grid cells with and without objects, aiming to reduce the confidence loss of the boxes that contain no object; C_ij denotes the predicted confidence of the j-th box of the i-th grid cell, and Ĉ_ij denotes its true confidence;

the fourth term is the category loss; categories denotes the number of classes; p_ij(c) denotes the predicted probability that the j-th box of the i-th grid cell belongs to the c-th class, and p̂_ij(c) denotes the corresponding true probability.
YOLOv3 uses the positive/negative sample balance factor λ_noobj to reduce the confidence loss caused by the majority of grid cells that are not responsible for predicting a target. This mitigates the imbalance between positive samples (the targets the network must detect) and negative samples (the background other than the targets) to a certain extent, but does not solve the problem of training on difficult samples. Therefore, the third term of the loss function, the confidence loss, adopts FocalLoss to improve the model's ability to learn difficult samples.
Wherein, focalloss is improved based on cross entropy, and the function form is as follows:
FL(y, ŷ) = -(1 - y)^α · ŷ · log(y) - y^α · (1 - ŷ) · log(1 - y)

where y and ŷ denote the predicted and true probability values respectively (for example p_ij(c) and p̂_ij(c)), and α is the FocalLoss hyperparameter, generally taking a value in [0, 5].
FocalLoss assigns the weights (1 - y)^α and y^α to positive and negative samples respectively. Taking negative samples as an example: when a sample is easy to learn, y is close to 0 and the weight y^α is small; when a sample is difficult to learn, y is close to 0.5 and the weight y^α is relatively large. Hard-to-classify samples therefore receive a larger weight than easy-to-classify samples, which improves the model's ability to learn them.
The improved confidence loss function is as follows:
Loss_conf = Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj · FL(C_ij, Ĉ_ij) + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^noobj · FL(C_ij, Ĉ_ij)

where FL is the FocalLoss defined above.
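A PyTorch sketch of this focal-style confidence loss, assuming per-box predicted confidences, true confidences, and a 0/1 object mask as tensors of the same shape; the default alpha and lambda_noobj values are illustrative, and the exact reduction may differ from the patent's formulation.

```python
import torch

def focal_confidence_loss(pred_conf, true_conf, obj_mask,
                          alpha=2.0, lambda_noobj=0.5, eps=1e-7):
    """Confidence loss with FocalLoss-style (1 - y)^alpha and y^alpha weights."""
    p = pred_conf.clamp(eps, 1.0 - eps)
    fl = -((1.0 - p) ** alpha) * true_conf * torch.log(p) \
         - (p ** alpha) * (1.0 - true_conf) * torch.log(1.0 - p)
    # object cells plus down-weighted no-object cells
    return (obj_mask * fl).sum() + lambda_noobj * ((1.0 - obj_mask) * fl).sum()
```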
in addition, in the elevator application scene, the finger of the elevator user occupies a very small area in the image, namely, the frame of the small object in the data set occupies a very large proportion, so the training speed of the network can be accelerated by improving the loss of the small object, and the embodiment also adds an adaptive scaling factor for the first item, namely, the coordinate loss, and the scaling factor is as follows:
Figure GDA0003112064630000155
in the method, in the process of the invention,
Figure GDA0003112064630000156
representing the width and height of the real frame; ρ box The range of (2) is 1-2, and the smaller the real frame is, the larger the numerical value is, so that the loss specific gravity of the small object can be improved.
The improved coordinate loss is as follows:
Figure GDA0003112064630000157
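A PyTorch sketch of the coordinate loss with the adaptive scaling factor, under the assumption stated above that ρ_box = 2 - w·h with box sizes normalized to [0, 1]; the names and the lambda_coord default are illustrative.

```python
import torch

def scaled_coordinate_loss(pred_box, true_box, obj_mask, lambda_coord=5.0):
    """Squared coordinate error weighted by rho_box = 2 - w*h of the ground-truth box."""
    # pred_box and true_box are (..., 4) tensors holding (x, y, w, h), w and h in [0, 1]
    rho_box = 2.0 - true_box[..., 2] * true_box[..., 3]       # ranges over [1, 2]
    sq_err = ((pred_box - true_box) ** 2).sum(dim=-1)
    return lambda_coord * (obj_mask * rho_box * sq_err).sum()
```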
in the convolutional neural network structure, one convolutional channel represents a certain characteristic of an image, and a model predicts by integrating characteristic information of all channels, so that the more complex the structure is, the more characteristics can be extracted by the network. The YOLOv3 network adopts a dark-53 as a feature extraction main body, the structure is provided with 53 convolution layers, the number of channels of each convolution layer sampled is doubled, the total number of channels reaches 17856, the target required to be detected by the elevator is a finger, the analysis is performed from an intuitive point of view, the structure of the dark-53 is provided with enough complexity to extract arrow features and a great amount of redundancy exists, and therefore the network structure or the size needs to be reduced.
Current pruning techniques for convolutional neural networks fall into the following categories: methods based on weight quantization (Weight Quantization), such as HashNet, which group weight variables by hashing so that variables in the same group share the same weight value; this effectively reduces the parameter size of the model but cannot improve the forward computation speed of the network. Methods based on weight sparsification, which train the weight variables sparsely so that the many weights close to 0 can be deleted, but which accelerate the forward computation only on special hardware. Methods based on structured pruning, which adaptively reduce the structure of the network through the training data and can effectively reduce the model size while improving the running speed.
Therefore, to address the complexity redundancy of Darknet-53, this embodiment adopts a Network Slimming algorithm based on structured pruning to prune the network at the channel level and reduce its number of feature channels.
The channel-level pruning of the network with the Lasso algorithm proceeds as follows:
First, a BN layer is added after each convolution layer. When BN is used in a convolutional neural network, the BN layer assigns each input feature channel its own γ_ik and β_ik parameters, and its output is expressed as:

Ĉ_ik = γ_ik · (C_ik - μ_ik) / √σ_ik + β_ik

where Ĉ_ik is the output of the BN layer; C_ik denotes the k-th feature channel of the i-th convolution layer; μ_ik and σ_ik denote the mean and variance of the channel feature C_ik, obtained from the statistics of historical training data;
γ_ik is equivalent to a scaling factor; Network Slimming uses this scaling factor as the weight of the feature channel and sparsifies the scaling factors with the Lasso algorithm:

Loss_new = Loss_old + λ Σ_{i=1}^{Layers} Σ_{k=1}^{Channels} |γ_ik|

where Loss_new is the final loss function, Loss_old is the improved loss function described above, Layers is the number of network layers of the YOLOv3 network, and Channels is the number of channels of the YOLOv3 network.

Finally, all γ parameters are sorted from largest to smallest, and a proportion of the lowest-ranked (smallest-valued) γ_ik, together with their corresponding feature channels and BN channels, is deleted. A schematic diagram of Network Slimming pruning is shown in fig. 4.
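A PyTorch sketch of the pruning step, assuming BatchNorm2d layers follow the convolutions: an L1 (Lasso) penalty on the BN scaling factors is added to the training loss, and after sparse training the channels with the smallest |γ| are marked for removal at a chosen pruning ratio. The helper names, the penalty coefficient, and the default ratio are illustrative; rebuilding the pruned network is not shown.

```python
import torch
import torch.nn as nn

def bn_l1_penalty(model: nn.Module, lam: float = 1e-4) -> torch.Tensor:
    """Lasso term on all BN scaling factors, added to the detection loss during training."""
    penalty = sum(bn.weight.abs().sum()
                  for bn in model.modules() if isinstance(bn, nn.BatchNorm2d))
    return lam * penalty

def channel_keep_masks(model: nn.Module, prune_ratio: float = 0.3):
    """Sort all |gamma| values and keep only channels above the prune_ratio quantile."""
    gammas = torch.cat([bn.weight.detach().abs().flatten()
                        for bn in model.modules() if isinstance(bn, nn.BatchNorm2d)])
    threshold = torch.quantile(gammas, prune_ratio)
    return [(bn, bn.weight.detach().abs() > threshold)
            for bn in model.modules() if isinstance(bn, nn.BatchNorm2d)]
```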
S6, acquire the face information of the residents of the floor pointed at by the finger; the residents' face information can be registered in advance in the elevator background system.
Then double verification is performed: whether the elevator user is a registered resident, and whether the floor pointed at by the finger is the floor where that resident lives. The floor key is selected only when both verifications pass, and the elevator car is then controlled to run to that floor, which ensures that the elevator user is a resident of that floor and greatly improves the interactivity of the elevator and the safety of the residents.
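A sketch of the double-verification logic, assuming a registry of face embeddings per floor and a generic embedding comparison by cosine similarity; the registry structure, the threshold, and the function names are hypothetical and do not refer to a specific face recognition library API.

```python
import numpy as np

def double_verify(face_embedding, selected_floor, floor_registry, thresh=0.6):
    """Accept the key press only if the face matches a resident registered on that floor."""
    for resident_embedding in floor_registry.get(selected_floor, []):
        cos_sim = float(np.dot(face_embedding, resident_embedding) /
                        (np.linalg.norm(face_embedding) *
                         np.linalg.norm(resident_embedding) + 1e-8))
        if cos_sim > thresh:
            return True      # elevator user is a resident of the selected floor
    return False             # otherwise the floor key is not activated
```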
The techniques described herein may be implemented by various means. For example, these techniques may be implemented in hardware, firmware, software, or a combination thereof. For a hardware implementation, the processing modules may be implemented within one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), processors, controllers, microcontrollers, electronic devices, other electronic units designed to perform the functions described herein, or a combination thereof.
For a firmware and/or software implementation, the techniques may be implemented with modules (e.g., procedures, steps, flow, and so on) that perform the functions described herein. The firmware and/or software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware associated with program instructions, where the foregoing program may be stored in a computer readable storage medium, and when executed, the program performs steps including the above method embodiments; and the aforementioned storage medium includes: various media that can store program code, such as ROM, RAM, magnetic or optical disks. For example, the hardware is a computing device including a processor and a memory for storing a program executable by the processor, where the processor implements the embeddable contactless elevator key interaction method described above when executing the program stored by the memory.
The embodiments described above are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to the embodiments described above, and any other changes, modifications, substitutions, combinations, and simplifications that do not depart from the spirit and principles of the present invention should be made in the equivalent manner, and are included in the scope of the present invention.

Claims (8)

1. An embeddable non-contact elevator key interaction method integrating face recognition is characterized by comprising the following steps of:
s1, acquiring an original image shot by a camera in an elevator car, and performing edge detection on a shooting area of the original image through a Laplace filtering operator so as to obtain an edge image;
s2, filtering the edge image by utilizing a linear filtering operator in the horizontal direction and the vertical direction to strengthen the linear edges in the horizontal direction and the vertical direction, and removing noise while keeping the edges of the elevator key panel area;
s3, respectively carrying out straight line detection on the image filtered in the horizontal direction and the image filtered in the vertical direction by adopting a Hough straight line detection algorithm so as to position the area of the elevator key panel;
s4, solving a mapping relation under view angle transformation by utilizing a homography transformation matrix;
s5, detecting and positioning the finger of the elevator user in the original image by using an improved YOLOv3 algorithm, and obtaining a floor key pointed by the finger according to a homography transformation matrix;
the improved YOLOv3 algorithm comprises the steps of improving the loss function of the YOLOv3 target detection algorithm based on the YOLOv3 target detection algorithm, and adopting an adaptive pruning algorithm to reduce the feature extraction part of the YOLOv3 network;
the loss function of the YOLOv3 network is designed as follows:
Loss = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj [(x_ij - x̂_ij)² + (y_ij - ŷ_ij)² + (w_ij - ŵ_ij)² + (h_ij - ĥ_ij)²]
     + Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj (C_ij - Ĉ_ij)²
     + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^noobj (C_ij - Ĉ_ij)²
     + Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj Σ_{c∈categories} (p_ij(c) - p̂_ij(c))²

wherein the first term is the coordinate error loss and λ_coord is the coefficient of the coordinate loss function; S denotes that the input image is divided into S×S grid cells; B denotes the number of boxes contained in a grid cell; 1_ij^obj indicates whether the j-th box of the i-th grid cell contains an object, taking the value 1 if it does and 0 otherwise; x and y denote the center coordinates of a box, and w and h its width and height; r_ij denotes the x, y, w, h of the j-th predicted box of the i-th grid cell, and r̂_ij denotes the x, y, w, h of the j-th ground-truth box of the i-th grid cell;

the second and third terms are the confidence losses; 1_ij^noobj indicates whether the j-th box of the i-th grid cell contains no object, taking the value 1 if it contains none and 0 otherwise; λ_noobj balances the loss weight of grid cells with and without objects, aiming to reduce the confidence loss of the boxes that contain no object; C_ij denotes the predicted confidence of the j-th box of the i-th grid cell, and Ĉ_ij denotes its true confidence;

the fourth term is the category loss; categories denotes the number of classes; p_ij(c) denotes the predicted probability that the j-th box of the i-th grid cell belongs to the c-th class, and p̂_ij(c) denotes the corresponding true probability;
the improvement of the loss function is specifically as follows:
(1) FocalLoss is referenced for the third term, i.e., confidence loss, to improve the model's ability to learn difficult samples, where Focaloss improves based on cross entropy as a function:
FL(y, ŷ) = -(1 - y)^α · ŷ · log(y) - y^α · (1 - ŷ) · log(1 - y)

where y and ŷ denote the predicted and true probability values respectively (for example p_ij(c) and p̂_ij(c)), and α is the FocalLoss hyperparameter;
the improved confidence loss function is as follows:
Loss_conf = Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj · FL(C_ij, Ĉ_ij) + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^noobj · FL(C_ij, Ĉ_ij)

where FL is the FocalLoss defined above;
(2) An adaptive scaling factor is added to the first term, the coordinate loss, as follows:
ρ_box = 2 - ŵ_ij · ĥ_ij

where ŵ_ij and ĥ_ij denote the width and height of the ground-truth box (normalized to the image size); ρ_box ranges from 1 to 2, and the smaller the ground-truth box, the larger its value;

the improved coordinate loss is as follows:

Loss_coord = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj · ρ_box · [(x_ij - x̂_ij)² + (y_ij - ŷ_ij)² + (w_ij - ŵ_ij)² + (h_ij - ĥ_ij)²]
S6, acquiring the face information of the residents of the floor pointed at by the finger and performing double verification: whether the elevator user is a resident and whether the floor pointed at by the finger is the floor where that resident lives; the floor key is selected only when both verifications pass, and the elevator car is finally controlled to run to that floor.
2. The embedded non-contact elevator key interaction method integrating face recognition according to claim 1, wherein the camera is installed above the elevator key panel and shoots the elevator key panel downwards;
in step S1, the process of edge detection of the camera shooting area by the Laplace filter operator is as follows:
s11, carrying out graying treatment on an original image to obtain a gray image;
S12, detecting the edges of the gray image with a second-order-gradient Laplacian filter operator, chosen so that the boundary of the elevator key panel is not missed; the Laplacian filter operator computes the edge gradient with a second-order difference, as follows:
considering the one-dimensional sequence {f(1), f(2), ..., f(x-1), f(x), f(x+1), ...}, the second-order difference at position x is expressed as:
f''(x) = (f(x+1) - f(x)) - (f(x) - f(x-1))
which further simplifies to:
f''(x) = f(x-1) - 2·f(x) + f(x+1)
that is, the second-order difference of a one-dimensional discrete sequence can be expressed as the convolution of the sequence with the one-dimensional convolution kernel [+1, -2, +1]; generalizing this conclusion to the two-dimensional matrix of a grayscale image:
for the grayscale image I_gray, define a two-dimensional kernel K_L of size 3×3:

      [ 0  1  0 ]
K_L = [ 1 -4  1 ]
      [ 0  1  0 ]
since this two-dimensional kernel only considers the horizontal and vertical directions, the diagonal directions are added into consideration and the convolution kernel K_L is replaced by:

      [ 1  1  1 ]
K_L = [ 1 -8  1 ]
      [ 1  1  1 ]
the second-order differential information of the gray level image is obtained by convolution of the convolution kernel and the gray level image, namely:
G = K_L * I_gray
the larger the convolution kernel, the more pronounced the detected edges;
the points where the convolution result is 0 are taken as edge points; the edge image is the set of points in the gray image where the gray level changes markedly.
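A minimal OpenCV/NumPy sketch of S11-S12, assuming those libraries are used; the kernel is the 3×3 Laplace operator with diagonal terms given above, and returning the raw filter response instead of explicitly extracting the zero-valued points is a simplification.

```python
import cv2
import numpy as np

def laplace_edge_image(image_bgr):
    """S11: grayscale conversion; S12: second-order difference via the Laplace kernel."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    k_l = np.array([[1,  1, 1],
                    [1, -8, 1],
                    [1,  1, 1]], dtype=np.float32)    # Laplace kernel with diagonals
    # signed second-order difference of the gray image (CV_32F keeps negative values)
    return cv2.filter2D(gray, cv2.CV_32F, k_l)
```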
3. The face-recognition-fused embeddable non-contact elevator key interaction method according to claim 1, wherein the process of step S2 is as follows:
S21, define a horizontal linear filter operator K_horizontal of size 1×n and a vertical linear filter operator K_vertical of size n×1:

K_horizontal = [ 1  1  ...  1 ]   (1×n)
K_vertical = K_horizontal^T       (n×1)

where T denotes the transpose and n denotes the size of the filter operator; K_horizontal is sensitive to horizontal straight edges and K_vertical to vertical straight edges;
S22, convolve the Laplace-filtered edge image I_Laplace with the two operators to obtain the horizontal-direction filtered image I_horizontal and the vertical-direction filtered image I_vertical:
I_horizontal = K_horizontal * I_Laplace
I_vertical = K_vertical * I_Laplace
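A minimal sketch of S21-S22 under the assumption that K_horizontal and K_vertical are all-ones (box) kernels; the size n = 15 is an illustrative value.

```python
import cv2
import numpy as np

def directional_filter(edge_image, n=15):
    """Convolve the Laplace edge image with 1xn and nx1 operators; accumulating the
    response along a row reinforces long horizontal edges, along a column long
    vertical edges."""
    edge = np.asarray(edge_image, dtype=np.float32)
    k_horizontal = np.ones((1, n), dtype=np.float32)
    k_vertical = np.ones((n, 1), dtype=np.float32)
    i_horizontal = cv2.filter2D(edge, cv2.CV_32F, k_horizontal)
    i_vertical = cv2.filter2D(edge, cv2.CV_32F, k_vertical)
    return i_horizontal, i_vertical
```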
4. The face-recognition-fused embeddable non-contact elevator key interaction method according to claim 1, wherein the process of step S3 is as follows:
S31, since edges that are not horizontal or vertical straight lines are suppressed after the horizontal-direction and vertical-direction filtering, first apply threshold segmentation to remove the non-horizontal and non-vertical straight-line edges;
S32, apply the Hough line detection algorithm to the threshold-segmented horizontal-direction and vertical-direction filtered images respectively, finally obtaining the four boundary lines of the elevator key panel;
S33, solve for the intersection points of the four elevator key panel boundary lines to obtain the four vertex coordinates (x_lt, y_lt), (x_lb, y_lb), (x_rb, y_rb), (x_rt, y_rt).
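A minimal sketch of S31-S33 using OpenCV's Hough transform; the segmentation threshold and vote count are illustrative, and keeping only the two strongest lines per direction is a simplification of the boundary selection.

```python
import cv2
import numpy as np

def panel_corners(i_horizontal, i_vertical, seg_thresh=50.0, hough_votes=200):
    def boundary_lines(response):
        # S31: threshold segmentation removes weak, non-straight responses
        mask = (np.abs(response) > seg_thresh).astype(np.uint8) * 255
        # S32: Hough line detection; each line is returned as (rho, theta)
        lines = cv2.HoughLines(mask, 1, np.pi / 180, hough_votes)
        return [] if lines is None else [tuple(l[0]) for l in lines]

    def intersect(line_a, line_b):
        # S33: a Hough line satisfies x*cos(theta) + y*sin(theta) = rho
        (r1, t1), (r2, t2) = line_a, line_b
        a = np.array([[np.cos(t1), np.sin(t1)],
                      [np.cos(t2), np.sin(t2)]])
        return tuple(np.linalg.solve(a, np.array([r1, r2])))

    h_lines = boundary_lines(i_horizontal)[:2]   # ideally the top/bottom borders
    v_lines = boundary_lines(i_vertical)[:2]     # ideally the left/right borders
    return [intersect(h, v) for h in h_lines for v in v_lines]
```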
5. The face-recognition-fused embeddable non-contact elevator key interaction method according to claim 4, wherein the homography transformation reflects the process of mapping from one two-dimensional plane into three-dimensional space and then from that three-dimensional space onto another two-dimensional plane, where X-Y-Z is the three-dimensional space coordinate system (which can be understood as the world coordinate system), x-y is the pixel-plane coordinate system, and x'-y' is the elevator key panel plane coordinate system; the homography transformation can be described as follows: a point (x, y) in the x-y coordinate system corresponds to a straight line l in the X-Y-Z coordinate system passing through the origin and the spatial point (X, Y, Z) associated with (x, y):
l = { k·(X, Y, Z) | k ∈ ℝ }
the straight line intersects the x'-y' coordinate plane at a point (x', y'); the process from the point (x, y) to the point (x', y') is called the homography transformation;
the process of solving the mapping relation under the view angle transformation by utilizing the homography transformation matrix is as follows:
S41, set the x'-y' plane perpendicular to the Z axis of the X-Y-Z space coordinate system and intersecting the Z axis at the point (0, 0, 1); that is, a point (x', y') in the x'-y' plane coordinates is the point (x', y', 1) in the X-Y-Z space coordinate system; the mapping relation between the x-y plane coordinate system and the X-Y-Z space coordinate system is described by the homography transformation matrix H:

[ X ]   [ h1 h2 h3 ] [ x ]
[ Y ] = [ h4 h5 h6 ]·[ y ]
[ Z ]   [ h7 h8 h9 ] [ 1 ]

where h1 to h9 are the nine transformation parameters of the homography matrix H;
the mapping relation from the x-y plane coordinate system to the x'-y' plane coordinate system is then obtained as:

x' = X/Z = (h1·x + h2·y + h3) / (h7·x + h8·y + h9)
y' = Y/Z = (h4·x + h5·y + h6) / (h7·x + h8·y + h9)
the H matrix has 9 transformation parameters but in fact only 8 degrees of freedom, because the X-Y-Z space coordinate system is a homogeneous coordinate system in which coordinates are defined up to scale; when the H matrix is multiplied by a scaling factor k:

      [ k·h1  k·h2  k·h3 ]
k·H = [ k·h4  k·h5  k·h6 ]
      [ k·h7  k·h8  k·h9 ]

k·H represents the same mapping relation as H, so H has only 8 degrees of freedom;
S42, when solving for H, one method is to set h9 to 1; the equations to be solved are then:

x'·(h7·x + h8·y + 1) = h1·x + h2·y + h3
y'·(h7·x + h8·y + 1) = h4·x + h5·y + h6
another method is to add a constraint to the homography matrix H so that its modulus equals 1, as follows:

h1² + h2² + h3² + h4² + h5² + h6² + h7² + h8² + h9² = 1
the equations to be solved are then:

x'·(h7·x + h8·y + h9) = h1·x + h2·y + h3
y'·(h7·x + h8·y + h9) = h4·x + h5·y + h6
subject to h1² + h2² + ... + h9² = 1
S43, for each of the four vertices of the elevator key panel obtained in step S3 in the pixel coordinate system, define its target coordinate point in the elevator key panel scene coordinate system:
(x_lt, y_lt) → (x_lt', y_lt')
(x_lb, y_lb) → (x_lb', y_lb')
(x_rb, y_rb) → (x_rb', y_rb')
(x_rt, y_rt) → (x_rt', y_rt')
substitute these coordinate correspondences into the equations to be solved in step S42 and solve them simultaneously for the H matrix.
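A minimal sketch of S41-S43: the homography is solved with the direct linear transform under the ||H|| = 1 constraint (the second method in S42); the four target panel coordinates in the usage comment are illustrative, and cv2.getPerspectiveTransform is shown only as a cross-check, since it yields the same mapping up to scale.

```python
import cv2
import numpy as np

def solve_homography(src_pts, dst_pts):
    """src_pts: the four panel vertices in pixel coordinates (from S3);
    dst_pts: their target coordinates on the key panel plane."""
    rows = []
    for (x, y), (xp, yp) in zip(src_pts, dst_pts):
        rows.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp])
        rows.append([0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp])
    a = np.asarray(rows, dtype=np.float64)
    _, _, vt = np.linalg.svd(a)
    h = vt[-1]                                    # right singular vector of the smallest singular value
    return (h / np.linalg.norm(h)).reshape(3, 3)  # enforces ||H|| = 1

def map_point(h_mat, x, y):
    """Map a pixel point (e.g. a fingertip position) into panel-plane coordinates."""
    p = h_mat @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]

# corners = [(x_lt, y_lt), (x_lb, y_lb), (x_rb, y_rb), (x_rt, y_rt)]   # from S3
# targets = [(0, 0), (0, 300), (200, 300), (200, 0)]                   # illustrative panel size
# H = solve_homography(corners, targets)
# H_cv = cv2.getPerspectiveTransform(np.float32(corners), np.float32(targets))
```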
6. The face-recognition-fused embeddable non-contact elevator key interaction method according to claim 1, wherein the YOLOv3 network adopts Darknet-53 as the feature extraction backbone and, to address the complexity redundancy of Darknet-53, performs channel-level pruning on the network with the Network Slimming algorithm, a structure-based pruning method, so as to reduce the number of feature channels of the network:
first, a BN layer is added after each convolutional layer; when the BN operation is used in a convolutional neural network, each input feature channel is assigned its own γ_ik and β_ik parameters, and the output of the BN layer is expressed as:
Ĉ_ik = γ_ik·(C_ik - μ_ik)/√(σ_ik) + β_ik
where Ĉ_ik is the output of the BN layer; C_ik denotes the kth feature channel of the ith convolutional layer; μ_ik and σ_ik denote the mean and variance of the channel feature C_ik, obtained statistically from the historical training data;
γ_ik acts as a scaling factor; Network Slimming uses this scaling factor as the weight of the feature channel and sparsifies the scaling factors with the Lasso algorithm:

Loss_new = Loss_old + Σ_{i=1}^{Layers} Σ_{k=1}^{Channels} |γ_ik|
where Loss_new is the final loss function, Loss_old is the improved loss function described above, Layers is the number of network layers of the YOLOv3 network, and Channels is the number of feature channels of the YOLOv3 network;
finally, all γ parameters are sorted in descending order, and a fixed proportion of the lowest-ranked γ_ik, together with their corresponding feature channels and BN channels, are deleted.
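A minimal PyTorch sketch of the channel-level pruning described above; the framework and the lam and prune_ratio values are assumptions rather than values from the patent. An L1 (Lasso) penalty on the BN scaling factors γ is added to the detection loss during training, and afterwards the channels with the smallest |γ| are selected for removal.

```python
import torch
import torch.nn as nn

def bn_l1_penalty(model, lam=1e-4):
    """Sparsity term to add to the detection loss: lam * sum(|gamma|) over all BN layers."""
    penalty = torch.tensor(0.0)
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            penalty = penalty + m.weight.abs().sum()   # gamma, one value per channel
    return lam * penalty

def channels_to_prune(model, prune_ratio=0.3):
    """Sort all |gamma| values and mark the lowest prune_ratio fraction for removal."""
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules() if isinstance(m, nn.BatchNorm2d)])
    k = int(gammas.numel() * prune_ratio)
    threshold = torch.sort(gammas).values[k] if k > 0 else gammas.min() - 1
    prune_plan = {}
    for name, m in model.named_modules():
        if isinstance(m, nn.BatchNorm2d):
            idx = (m.weight.detach().abs() < threshold).nonzero().flatten()
            prune_plan[name] = idx.tolist()   # channel indices to delete in this BN layer
    return prune_plan
```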
7. A computing device comprising a processor and a memory for storing a program executable by the processor, wherein the processor, when executing the program stored in the memory, implements the face-recognition-fused embeddable non-contact elevator key interaction method of any one of claims 1 to 6.
8. An elevator, characterized in that the elevator realizes floor key identification and car operation control by the face-recognition-fused embeddable non-contact elevator key interaction method according to any one of claims 1 to 6.
CN202110086981.6A 2021-01-22 2021-01-22 Face recognition-fused embeddable non-contact elevator key interaction method Active CN113220114B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110086981.6A CN113220114B (en) 2021-01-22 2021-01-22 Face recognition-fused embeddable non-contact elevator key interaction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110086981.6A CN113220114B (en) 2021-01-22 2021-01-22 Face recognition-fused embeddable non-contact elevator key interaction method

Publications (2)

Publication Number Publication Date
CN113220114A CN113220114A (en) 2021-08-06
CN113220114B true CN113220114B (en) 2023-06-20

Family

ID=77084468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110086981.6A Active CN113220114B (en) 2021-01-22 2021-01-22 Face recognition-fused embeddable non-contact elevator key interaction method

Country Status (1)

Country Link
CN (1) CN113220114B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113658088B (en) * 2021-08-27 2022-12-02 诺华视创电影科技(江苏)有限公司 Face synthesis method and device based on multiple discriminators

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1887317A1 (en) * 2006-08-04 2008-02-13 Fasep 2000 S.r.l. Method and device for non-contact measurement of the alignment of motor vehicle wheels
CN102701033A (en) * 2012-05-08 2012-10-03 华南理工大学 Elevator key and method based on image recognition technology
CN106598221A (en) * 2016-11-17 2017-04-26 电子科技大学 Eye key point detection-based 3D sight line direction estimation method
JP2019177973A (en) * 2018-03-30 2019-10-17 三菱電機株式会社 Input apparatus and input method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8180114B2 (en) * 2006-07-13 2012-05-15 Northrop Grumman Systems Corporation Gesture recognition interface system with vertical display
US8768492B2 (en) * 2012-05-21 2014-07-01 Tait Towers Manufacturing Llc Automation and motion control system
US9292103B2 (en) * 2013-03-13 2016-03-22 Intel Corporation Gesture pre-processing of video stream using skintone detection

Also Published As

Publication number Publication date
CN113220114A (en) 2021-08-06

Similar Documents

Publication Publication Date Title
CN109325454B (en) Static gesture real-time recognition method based on YOLOv3
CN110147743B (en) Real-time online pedestrian analysis and counting system and method under complex scene
CN111709310B (en) Gesture tracking and recognition method based on deep learning
CN106682598B (en) Multi-pose face feature point detection method based on cascade regression
CN106845487B (en) End-to-end license plate identification method
CN105069413B (en) A kind of human posture's recognition methods based on depth convolutional neural networks
Gurav et al. Real time finger tracking and contour detection for gesture recognition using OpenCV
CN103098076B (en) Gesture recognition system for TV control
CN110688965B (en) IPT simulation training gesture recognition method based on binocular vision
CN109948453B (en) Multi-person attitude estimation method based on convolutional neural network
CN102426480A (en) Man-machine interactive system and real-time gesture tracking processing method for same
CN110569817B (en) System and method for realizing gesture recognition based on vision
CN110795990B (en) Gesture recognition method for underwater equipment
CN110163111A (en) Method, apparatus of calling out the numbers, electronic equipment and storage medium based on recognition of face
CN106502390B (en) A kind of visual human's interactive system and method based on dynamic 3D Handwritten Digit Recognition
CN111444764A (en) Gesture recognition method based on depth residual error network
CN113449573A (en) Dynamic gesture recognition method and device
CN109033978A (en) A kind of CNN-SVM mixed model gesture identification method based on error correction strategies
CN112507918A (en) Gesture recognition method
CN103793056A (en) Mid-air gesture roaming control method based on distance vector
WO2016070800A1 (en) Virtual ball simulation and control method of mobile device
CN114792443A (en) Intelligent device gesture recognition control method based on image recognition
CN113220114B (en) Face recognition-fused embeddable non-contact elevator key interaction method
CN115147488A (en) Workpiece pose estimation method based on intensive prediction and grasping system
Ling et al. Research on gesture recognition based on YOLOv5

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant