CN113220114B - Face recognition-fused embeddable non-contact elevator key interaction method - Google Patents

Face recognition-fused embeddable non-contact elevator key interaction method

Info

Publication number
CN113220114B
Authority
CN
China
Prior art keywords
elevator
image
coordinate system
elevator key
horizontal
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110086981.6A
Other languages
Chinese (zh)
Other versions
CN113220114A (zh)
Inventor
谢巍
许练濠
卢永辉
吴伟林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202110086981.6A priority Critical patent/CN113220114B/en
Publication of CN113220114A publication Critical patent/CN113220114A/en
Application granted granted Critical
Publication of CN113220114B publication Critical patent/CN113220114B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017Gesture based interaction, e.g. based on a set of recognized hand gestures
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/12Edge-based segmentation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T7/00Image analysis
    • G06T7/10Segmentation; Edge detection
    • G06T7/13Edge detection
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/40Extraction of image or video features
    • G06V10/44Local feature extraction by analysis of parts of the pattern, e.g. by detecting edges, contours, loops, corners, strokes or intersections; Connectivity analysis, e.g. of connected components
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V10/00Arrangements for image or video recognition or understanding
    • G06V10/70Arrangements for image or video recognition or understanding using pattern recognition or machine learning
    • G06V10/74Image or video pattern matching; Proximity measures in feature spaces
    • G06V10/75Organisation of the matching processes, e.g. simultaneous or sequential comparisons of image or video features; Coarse-fine approaches, e.g. multi-scale approaches; using context analysis; Selection of dictionaries
    • G06V10/753Transform-based matching, e.g. Hough transform
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/10Human or animal bodies, e.g. vehicle occupants or pedestrians; Body parts, e.g. hands
    • G06V40/16Human faces, e.g. facial parts, sketches or expressions
    • G06V40/161Detection; Localisation; Normalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06VIMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20Movements or behaviour, e.g. gesture recognition
    • G06V40/28Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06TIMAGE DATA PROCESSING OR GENERATION, IN GENERAL
    • G06T2207/00Indexing scheme for image analysis or image enhancement
    • G06T2207/20Special algorithmic details
    • G06T2207/20081Training; Learning
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02BCLIMATE CHANGE MITIGATION TECHNOLOGIES RELATED TO BUILDINGS, e.g. HOUSING, HOUSE APPLIANCES OR RELATED END-USER APPLICATIONS
    • Y02B50/00Energy efficient technologies in elevators, escalators and moving walkways, e.g. energy saving or recuperation technologies

Abstract

The invention discloses an embeddable non-contact elevator key interaction method that integrates face recognition. The method first performs edge detection on the camera's shooting area in the original image with a Laplacian filter operator to obtain an edge image, and filters the edge image with horizontal and vertical line filtering operators. A Hough line detection algorithm is then applied separately to the horizontally and vertically filtered images to locate the elevator key panel region and solve a homography transformation matrix. An improved YOLOv3 algorithm then detects and locates the elevator user's finger, the floor key pointed at by the finger is obtained through the homography transformation matrix, and the face information of the residents of that floor is acquired for double verification. The invention accurately identifies the elevator key selected by the elevator user, enables non-contact elevator riding, and protects the safety of residents through double verification of the floor and the residents' face information.

Description

Face recognition-fused embeddable non-contact elevator key interaction method
Technical Field
The invention relates to the technical field of computer vision and man-machine interaction, in particular to an embeddable non-contact elevator key interaction method integrating face recognition.
Background
Elevators are now widely used in urban high-rise buildings and have become an indispensable tool for people who live and work on high floors. Conventional elevator buttons are contact-based: passengers must touch the buttons to select the destination floor and to open or close the elevator door. Because different people press the same buttons every day, the buttons can carry many kinds of bacteria and viruses, easily causing cross-infection and increasing the probability of transmission.
With the development of science and technology, human-computer interaction has become diversified: people are no longer satisfied with simply viewing virtual scenes and have begun to explore ways of interacting with the virtual world, giving rise to more and more novel interaction technologies. Human-computer interaction techniques fall into several categories: traditional interaction with keyboard and mouse input; interaction based on touch-screen devices, such as smartphones and tablet computers; and non-contact interaction based on machine vision and image processing, such as virtual keyboards and gesture interaction systems.
Hiroki Goto et al. studied a camera-projection interaction system based on a frame-difference method and hand skin-color extraction: the hands are first separated from the scene using the clustering of hand skin color in the HSV and YCbCr color spaces, and fingertip positions are then detected on the separated foreground image with template matching, enabling projection interaction between a user and a computer or home television. Fitriani et al. proposed a human-computer interaction system based on a deformable projection surface, which projects a virtual scene onto the surface of an easily deformed object, detects the deformation produced when the user touches the projection screen, and analyzes the interaction information through an image processing algorithm and a deformation model of the object.
However, the above solutions based on machine vision techniques and image processing algorithms share a common drawback: they cannot guarantee robustness across diverse projection scenes. For an interactive system based on hand skin color, the hand-foreground separation algorithm degrades greatly when the projected scene is similar in color to the skin. For an interactive system based on a deformable surface, the system runs stably only under the projection scene it was designed for; in a changing projection scene the deformation detection of the projected image becomes inaccurate, different schemes must be designed for different scenes, and the development cost of the system is high.
Disclosure of Invention
The invention aims to overcome the defects and shortcomings of the prior art by providing an embeddable non-contact elevator key interaction method that integrates face recognition. The method can be applied in changing environments, accurately identifies the elevator key selected by the elevator user, enables non-contact elevator riding, and protects the safety of residents through double verification of the floor and the residents' face information.
It is a second object of the present invention to provide a computing device.
A third object of the invention is to provide an elevator.
The first object of the invention is achieved by the following technical scheme: an embeddable non-contact elevator key interaction method integrating face recognition comprises the following steps:
s1, acquiring an original image shot by a camera in an elevator car, and performing edge detection on a shooting area of the original image through a Laplace filtering operator so as to obtain an edge image;
s2, filtering the edge image by utilizing a linear filtering operator in the horizontal direction and the vertical direction to strengthen the linear edges in the horizontal direction and the vertical direction, and removing noise while keeping the edges of the elevator key panel area;
s3, respectively carrying out straight line detection on the image filtered in the horizontal direction and the image filtered in the vertical direction by adopting a Hough straight line detection algorithm so as to position the area of the elevator key panel;
s4, solving a mapping relation under view angle transformation by utilizing a homography transformation matrix;
s5, detecting and positioning the finger of the elevator user in the original image by using an improved YOLOv3 algorithm, and obtaining a floor key pointed by the finger according to a homography transformation matrix;
S6, acquiring the face information of the residents of the floor pointed at by the finger and performing double verification: whether the elevator user is a resident and whether the floor pointed at by the finger is the floor where that resident lives; the floor key is selected only when both verifications pass, and the elevator car is finally controlled to run to that floor.
Preferably, the camera is arranged above the elevator key panel and shoots the elevator key panel downwards;
in step S1, the process of edge detection of the camera shooting area by the Laplace filter operator is as follows:
s11, carrying out graying treatment on an original image to obtain a gray image;
S12, detecting the edges of the gray image with a second-order-gradient Laplacian filter operator, chosen so that the boundary of the elevator key panel is not missed; the Laplacian filter operator computes the edge gradient with a second-order difference, as follows:
Consider the one-dimensional sequence {f(1), f(2), …, f(x-1), f(x), f(x+1)}. The second-order difference at position x is expressed as:
f''(x) = (f(x+1) - f(x)) - (f(x) - f(x-1))
which further simplifies to:
f''(x) = f(x-1) - 2·f(x) + f(x+1)
That is, the second-order difference of a one-dimensional discrete sequence can be expressed as the result of convolving the sequence with the one-dimensional convolution kernel [+1, -2, +1]. Generalizing this conclusion to the two-dimensional matrix of a gray image:
For the gray image I_gray, define a two-dimensional kernel K_L of size 3×3:

K_L = [[0, 1, 0],
       [1, -4, 1],
       [0, 1, 0]]

Since this two-dimensional kernel only considers the horizontal and vertical directions, the diagonal information is also taken into account and K_L is replaced by:

K_L = [[1, 1, 1],
       [1, -8, 1],
       [1, 1, 1]]

The second-order difference information of the gray image is obtained by convolving this kernel with the gray image, namely:

G = K_L * I_gray
The larger the convolution kernel, the more pronounced the detected edges.
The points where the convolution result is 0 are the edge points; the edge image is the set of points in the gray image whose gray level changes markedly.
Preferably, the procedure of step S2 is as follows:
S21, define a horizontal line filter operator K_horizontal of size 1×n and a vertical line filter operator K_vertical of size n×1:

K_horizontal = [1, 1, …, 1]          (1×n)
K_vertical = [1, 1, …, 1]^T          (n×1)

where T denotes the transpose of the vector and n denotes the size of the filter operator; K_horizontal is sensitive to horizontal straight edges and K_vertical to vertical straight edges;

S22, convolve the edge image I_Laplace obtained by the Laplacian filtering with the two operators to obtain the horizontally filtered image I_horizontal and the vertically filtered image I_vertical:

I_horizontal = K_horizontal * I_Laplace
I_vertical = K_vertical * I_Laplace
Preferably, the procedure of step S3 is as follows:
S31, since the non-horizontal and non-vertical straight edges of the edge image are suppressed after it is filtered in the horizontal and vertical directions, first segment the non-horizontal and non-vertical straight edges with a threshold and remove them;
s32, respectively carrying out straight line detection on the horizontal direction filtering image and the vertical direction filtering image which are subjected to threshold segmentation by using a Hough straight line detection algorithm, and finally obtaining four elevator key panel boundary straight lines;
S33, solving the intersection points of the four elevator key panel boundary lines to obtain the four vertex coordinates (x_lt, y_lt), (x_lb, y_lb), (x_rb, y_rb), (x_rt, y_rt).
Furthermore, the homography transformation reflects the process of mapping from one two-dimensional plane into three-dimensional space and then from the three-dimensional space onto another two-dimensional plane. Here X-Y-Z is the three-dimensional space coordinate system (which can be understood as the world coordinate system), x-y is the pixel-plane coordinate system, and x'-y' is the elevator key panel plane coordinate system. The homography transformation can be described as follows: a point (x, y) in the x-y coordinate system corresponds to the straight line l in the X-Y-Z coordinate system that passes through the origin and that point; this line intersects the x'-y' plane at a point (x', y'), and the mapping from the point (x, y) to the point (x', y') is called the homography transformation.
The process of solving the mapping relation under the view-angle transformation with the homography transformation matrix is as follows:
S41, let the x'-y' plane be perpendicular to the Z axis of the X-Y-Z space coordinate system and intersect it at the point (0, 0, 1); that is, a point (x', y') in x'-y' plane coordinates is the point (x', y', 1) in the X-Y-Z space coordinate system. The mapping between the x-y plane coordinate system and the X-Y-Z space coordinate system is described by the homography transformation matrix H:

H = [[h_1, h_2, h_3],
     [h_4, h_5, h_6],
     [h_7, h_8, h_9]]

[X, Y, Z]^T = H · [x, y, 1]^T

where h_1 to h_9 are the nine transformation parameters of the homography matrix;
the mapping relation from the x-y plane coordinate system to the x'-y' plane coordinate system then follows as:

x' = (h_1·x + h_2·y + h_3) / (h_7·x + h_8·y + h_9)
y' = (h_4·x + h_5·y + h_6) / (h_7·x + h_8·y + h_9)
The H matrix has 9 transformation parameters but in fact only 8 degrees of freedom, because the X-Y-Z space coordinate system is a homogeneous coordinate system in which coordinate scaling does not change the transformation; multiplying the H matrix by a scaling factor k gives:

kH = [[k·h_1, k·h_2, k·h_3],
      [k·h_4, k·h_5, k·h_6],
      [k·h_7, k·h_8, k·h_9]]

and kH represents the same mapping relation as H, so H has only 8 degrees of freedom;
S42, when solving H, one method is to set h_9 to 1; the equations to be solved for each point pair are then:

x'·(h_7·x + h_8·y + 1) = h_1·x + h_2·y + h_3
y'·(h_7·x + h_8·y + 1) = h_4·x + h_5·y + h_6

Another method is to add a constraint to the homography matrix H so that its modulus equals 1, namely:

h_1² + h_2² + … + h_9² = 1

and the equations to be solved are then:

x'·(h_7·x + h_8·y + h_9) = h_1·x + h_2·y + h_3
y'·(h_7·x + h_8·y + h_9) = h_4·x + h_5·y + h_6
S43, for each of the four vertices of the elevator key panel in the pixel coordinate system obtained in step S3, define its target coordinate point in the elevator key panel scene coordinate system:

(x_lt, y_lt) → (x_lt', y_lt')
(x_lb, y_lb) → (x_lb', y_lb')
(x_rb, y_rb) → (x_rb', y_rb')
(x_rt, y_rt) → (x_rt', y_rt')

Substitute these coordinate pairs into the equations to be solved in step S42 and solve them simultaneously for the H matrix.
Preferably, the improved YOLOv3 algorithm includes an improvement of its loss function based on the YOLOv3 object detection algorithm, and an adaptive pruning algorithm is employed to reduce the feature extraction portion of the YOLOv3 network.
Further, the loss function of the YOLOv3 network is designed as follows:
Loss = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj [(x_ij - x̂_ij)² + (y_ij - ŷ_ij)² + (w_ij - ŵ_ij)² + (h_ij - ĥ_ij)²]
     + Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj (C_ij - Ĉ_ij)²
     + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^noobj (C_ij - Ĉ_ij)²
     + Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj Σ_{c∈categories} (p_ij(c) - p̂_ij(c))²

where the first term is the coordinate error loss and λ_coord is the coefficient of the coordinate loss function; S denotes that the input image is divided into S×S grid cells; B denotes the number of boxes contained in a grid cell; 1_ij^obj indicates whether the j-th box of the i-th grid cell contains an object, taking the value 1 if it does and 0 otherwise; x and y denote the center coordinates of a box, and w and h its width and height; r_ij denotes the x, y, w, h of the j-th predicted box of the i-th grid cell, and r̂_ij denotes the x, y, w, h of the j-th ground-truth box of the i-th grid cell;

the second and third terms are the confidence losses; 1_ij^noobj indicates whether the j-th box of the i-th grid cell contains no object, taking the value 1 if it contains none and 0 otherwise; λ_noobj balances the loss weight of grid cells with and without objects, aiming to reduce the confidence loss of the boxes that contain no object; C_ij denotes the predicted confidence of the j-th box of the i-th grid cell, and Ĉ_ij denotes its true confidence;

the fourth term is the category loss; categories denotes the number of classes; p_ij(c) denotes the predicted probability that the j-th box of the i-th grid cell belongs to the c-th class, and p̂_ij(c) denotes the corresponding true probability;
the improvement of the loss function is specifically as follows:
(1) FocalLoss is referenced for the third term, i.e., confidence loss, to improve the model's ability to learn difficult samples, where Focaloss improves based on cross entropy as a function:
FL(y, ŷ) = -(1 - y)^α · ŷ · log(y) - y^α · (1 - ŷ) · log(1 - y)

where y and ŷ denote the predicted and true probability values respectively (for example p_ij(c) and p̂_ij(c)), and α is the FocalLoss hyperparameter;
the improved confidence loss function is as follows:
Loss_conf = Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj · FL(C_ij, Ĉ_ij) + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^noobj · FL(C_ij, Ĉ_ij)

where FL is the FocalLoss defined above.
(2) An adaptive scaling factor is added to the first term, the coordinate loss, as follows:
ρ_box = 2 - ŵ_ij · ĥ_ij

where ŵ_ij and ĥ_ij denote the width and height of the ground-truth box (normalized to the image size); ρ_box ranges from 1 to 2, and the smaller the ground-truth box, the larger its value;

the improved coordinate loss is as follows:

Loss_coord = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj · ρ_box · [(x_ij - x̂_ij)² + (y_ij - ŷ_ij)² + (w_ij - ŵ_ij)² + (h_ij - ĥ_ij)²]
Furthermore, the YOLOv3 network uses Darknet-53 as its feature extraction backbone. To address the complexity redundancy of Darknet-53, a Network Slimming algorithm from the structured pruning family is adopted to prune the network at the channel level and reduce its number of feature channels:

First, a BN layer is added after each convolution layer. When BN is used in a convolutional neural network, each input feature channel is assigned its own γ_ik and β_ik parameters, and the output of the BN layer is expressed as:

Ĉ_ik = γ_ik · (C_ik - μ_ik) / √σ_ik + β_ik

where Ĉ_ik is the output of the BN layer; C_ik denotes the k-th feature channel of the i-th convolution layer; μ_ik and σ_ik denote the mean and variance of the channel feature C_ik, obtained from the statistics of historical training data;
γ_ik is equivalent to a scaling factor; Network Slimming uses this scaling factor as the weight of the feature channel and sparsifies the scaling factors with the Lasso algorithm:

Loss_new = Loss_old + λ Σ_{i=1}^{Layers} Σ_{k=1}^{Channels} |γ_ik|

where Loss_new is the final loss function, Loss_old is the improved loss function described above, Layers is the number of network layers of the YOLOv3 network, and Channels is the number of channels of the YOLOv3 network;

finally, all γ parameters are sorted from largest to smallest, and a proportion of the lowest-ranked γ_ik, together with their corresponding feature channels and BN channels, is deleted.
The second object of the invention is achieved by the following technical scheme: the invention relates to a computing device, which comprises a processor and a memory for storing a program executable by the processor, wherein the processor realizes the embedded non-contact elevator key interaction method integrating face recognition according to the first object of the invention when executing the program stored by the memory.
The third object of the present invention is achieved by the following technical scheme: according to the elevator, the identification of floor keys and the running control of the elevator car are realized through the embedded non-contact elevator key interaction method integrating face recognition.
Compared with the prior art, the invention has the following advantages and effects:
(1) The method first locates the elevator key panel region in the image through edge detection, filtering, and line detection, and solves the homography transformation matrix; it then detects the elevator user's finger in the image with deep learning and obtains the floor key selected by the finger through the solved homography transformation matrix. This avoids the interference of environmental factors with target detection, improves the accuracy of identifying the selected floor key, and can be applied in changing environments, making the interaction scene more diverse.
(2) The method can be applied to realizing non-contact elevator keys during epidemic situations, and cross infection caused by multiple times of touching the elevator keys by multiple people is avoided.
(3) The invention recognizes the floor keys selected by the elevator user through the computer vision technology, and adds the face recognition technology to form double verification, thereby ensuring that the people entering and exiting the target floor are residents or are led by the residents, and greatly improving the interactivity of the elevator and the safety of the residents.
(4) The YOLOv3 algorithm has an advantage in speed; on this basis, improving the YOLOv3 network's ability to learn difficult samples and raising the loss weight of small objects can further increase the training speed of the YOLOv3 network, and reducing the number of feature channels further lowers the computational complexity, which greatly improves target detection efficiency and facilitates real-time detection.
(5) In the invention, the extracted edge image is a combined image containing both horizontal and vertical edges. Filtering it further with the horizontal and vertical line filtering operators splits it into a horizontally filtered image and a vertically filtered image before line detection, which avoids the redundant detection that would arise after merging the edges of the horizontal and vertical channels and effectively reduces the complexity of the line detection algorithm.
Drawings
Fig. 1 is a flow chart of an embeddable non-contact elevator key interaction method incorporating face recognition in accordance with the present invention.
Fig. 2 is a schematic diagram of a cartesian-coordinate hough straight-line detection algorithm.
Fig. 3 is a schematic diagram of a hough line detection algorithm in a polar coordinate system.
Fig. 4 is a schematic diagram of a homography transformation.
Fig. 5 is a schematic diagram of Network Slimming pruning.
Detailed Description
The present invention will be described in further detail with reference to examples and drawings, but embodiments of the present invention are not limited thereto.
The embodiment discloses an embeddable non-contact elevator key interaction method integrating face recognition, which can be applied to an elevator, and the elevator can realize the recognition of floor keys and the operation control of an elevator car through the method. As shown in fig. 1, the method comprises the steps of:
s1, acquiring an original image shot by a camera in an elevator car, wherein the camera is arranged above an elevator key panel and shoots the elevator key panel downwards at a certain angle.
Then, edge detection is carried out on a shooting area of the original image through a Laplace filtering operator, so that an edge image is obtained:
s11, carrying out graying treatment on an original image to obtain a gray image;
S12, because edges are the set of points in an image whose brightness changes markedly, and the gradient reflects how fast values change, the edges of the gray image are detected with a second-order-gradient Laplacian filter operator, chosen so that the boundary of the elevator key panel is not missed. The Laplacian filter operator here uses a large-scale convolution kernel and computes the edge gradient with a second-order difference, as follows:
Consider the one-dimensional sequence {f(1), f(2), …, f(x-1), f(x), f(x+1)}. The second-order difference at position x is expressed as:
f''(x) = (f(x+1) - f(x)) - (f(x) - f(x-1))
which further simplifies to:
f''(x) = f(x-1) - 2·f(x) + f(x+1)
That is, the second-order difference of a one-dimensional discrete sequence can be expressed as the result of convolving the sequence with the one-dimensional convolution kernel [+1, -2, +1]. Generalizing this conclusion to the two-dimensional matrix of a gray image (a one-dimensional sequence can be understood as the pixel values along the horizontal or vertical direction):
For the gray image I_gray, define a two-dimensional kernel K_L of size 3×3:

K_L = [[0, 1, 0],
       [1, -4, 1],
       [0, 1, 0]]

Since this two-dimensional kernel only considers the horizontal and vertical directions, the diagonal information is also taken into account and K_L is replaced by:

K_L = [[1, 1, 1],
       [1, -8, 1],
       [1, 1, 1]]

The second-order difference information of the gray image is obtained by convolving this kernel with the image, namely:

G = K_L * I_gray
The convolution kernel K_L is the Laplacian filter operator; the larger the convolution kernel, the more pronounced the detected edges.
The points where the convolution result is 0 are the edge points; the edge image is the set of points in the gray image whose gray level changes markedly. The extracted edge image is a combined image containing both horizontal and vertical edges.
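As a concrete illustration (not part of the patent text), the following Python sketch implements steps S11-S12 with OpenCV and NumPy under the assumptions above: it grays the frame, convolves it with the 8-connected 3×3 Laplacian kernel, and keeps the zero-crossings of the response as edge points. The function name and the zero-crossing test are illustrative choices.

```python
import cv2
import numpy as np

def laplacian_edges(image_bgr: np.ndarray) -> np.ndarray:
    """Grayscale conversion, 8-connected Laplacian filtering, zero-crossing edge map."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    k_l = np.array([[1, 1, 1],
                    [1, -8, 1],
                    [1, 1, 1]], dtype=np.float32)        # K_L with diagonal terms
    response = cv2.filter2D(gray, cv2.CV_32F, k_l)        # G = K_L * I_gray
    # Edge points: sign changes (zero crossings) of the second-order response
    sign = np.sign(response)
    zc = np.zeros(gray.shape, dtype=np.uint8)
    zc[:-1, :] |= (sign[:-1, :] * sign[1:, :] < 0).astype(np.uint8)
    zc[:, :-1] |= (sign[:, :-1] * sign[:, 1:] < 0).astype(np.uint8)
    return zc * 255
```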
S2, filtering the edge image by using a linear filtering operator in the horizontal direction and the vertical direction.
Because the edge image obtained with the large-scale Laplacian convolution kernel contains many noise points, and the key to locating the elevator key panel region is the four straight lines of its boundary, which appear roughly horizontal or vertical in the image, line filtering operators in the horizontal and vertical directions are used to strengthen the straight edges in those two directions, removing noise while keeping the edges of the elevator key panel region. The filtering process is as follows:
S21, define a horizontal line filter operator K_horizontal of size 1×n and a vertical line filter operator K_vertical of size n×1:

K_horizontal = [1, 1, …, 1]          (1×n)
K_vertical = [1, 1, …, 1]^T          (n×1)

where T denotes the transpose of the vector and n denotes the size of the filter operator; K_horizontal is sensitive to horizontal straight edges and K_vertical to vertical straight edges, and together the two operators effectively reject isolated noise. In general, the larger n is, the stricter the length requirement on a straight line and the better the non-linear noise is removed; however, when n is too large, the sensitivity to the line angle also increases and slightly inclined straight lines may be filtered out. Since the boundary of the target region in the acquired image is generally not strictly horizontal or vertical, n cannot be set too large and must be chosen according to the actual situation.

S22, convolve the edge image I_Laplace obtained by the Laplacian filtering with the two operators to obtain the horizontally filtered image I_horizontal and the vertically filtered image I_vertical:

I_horizontal = K_horizontal * I_Laplace
I_vertical = K_vertical * I_Laplace
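A minimal sketch of this filtering step, assuming the edge image produced above; the all-ones 1×n and n×1 kernels and the default n = 15 are illustrative values chosen here, not taken from the patent.

```python
import cv2
import numpy as np

def directional_filter(edge_img: np.ndarray, n: int = 15):
    """Convolve the edge image with 1xn and nx1 all-ones line kernels."""
    k_h = np.ones((1, n), dtype=np.float32)   # sensitive to horizontal straight edges
    k_v = np.ones((n, 1), dtype=np.float32)   # sensitive to vertical straight edges
    src = edge_img.astype(np.float32)
    i_horizontal = cv2.filter2D(src, cv2.CV_32F, k_h)    # I_horizontal
    i_vertical = cv2.filter2D(src, cv2.CV_32F, k_v)      # I_vertical
    return i_horizontal, i_vertical
```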
S3, respectively carrying out straight line detection on the image filtered in the horizontal direction and the image filtered in the vertical direction by adopting a Hough straight line detection algorithm so as to position the area of the elevator key panel:
S31, since the non-horizontal and non-vertical straight edges of the edge image are suppressed after it is filtered in the horizontal and vertical directions, first segment the non-horizontal and non-vertical straight edges with a threshold and remove them;
and S32, respectively carrying out straight line detection on the horizontal direction filtering image and the vertical direction filtering image which are subjected to threshold segmentation by using a Hough straight line detection algorithm, and finally obtaining four elevator key panel boundary straight lines.
Because the edge image extracted in the step S1 is a combined image containing a horizontal edge and a vertical edge, the edge image can be further filtered by using a horizontal linear filtering operator and a vertical filtering operator in the step S2 to be divided into a filtered image only containing a horizontal direction and a filtered image only containing a vertical direction, and then linear detection is carried out in the step S3, so that redundant detection after the edges of a horizontal channel and a vertical channel are combined can be avoided, and the complexity of a linear detection algorithm is effectively reduced.
The Hough straight line detection algorithm is to map each point on the Cartesian coordinate system to a straight line in the Hough space by utilizing the principle of the duality of the Cartesian coordinate system and the dotted line of the Hough space, and then the straight line passing through a plurality of points in the Cartesian coordinate system corresponds to one intersection point of the straight lines passing through a plurality of points in the Hough space.
Specifically, for a straight line y = kx + b in the Cartesian coordinate system, (x, y) denotes a coordinate point, k the slope of the line, and b its intercept. Rewriting the line as b = y - xk and defining the abscissa of the Hough space as k and the ordinate as b, b = y - xk is a straight line in the Hough space with slope -x and intercept y. The points (x_1, y_1), (x_2, y_2), …, (x_n, y_n) on the same straight line correspond to several straight lines in the Hough space, and their common intersection point (k, b) gives the slope and intercept of that line in the Cartesian coordinate system; the schematic diagram is shown in fig. 2.
Since the slope of a vertical line in an image cannot be computed, the Hough transform is generally performed in polar form. Specifically, a straight line is represented by the polar equation ρ = x·cosθ + y·sinθ, where ρ is the polar distance, i.e., the distance from the origin to the line in the polar coordinate space, and θ is the polar angle, i.e., the angle between the x-axis and the segment through the origin perpendicular to the line. Defining the abscissa of the Hough space as θ and the ordinate as ρ, the points (x_1, y_1), (x_2, y_2), …, (x_n, y_n) on the same line correspond to several curves in the Hough space, and their common intersection point (θ, ρ) gives the polar angle and polar distance of that line in the polar coordinate system; the schematic diagram is shown in fig. 3.
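A possible implementation of step S3 with OpenCV's polar-form Hough transform, given the two filtered images from the previous sketch; the threshold ratio and the accumulator vote count are assumed tuning values, not figures from the patent.

```python
import cv2
import numpy as np

def detect_boundary_lines(i_horizontal, i_vertical, ratio=0.8, votes=100):
    """Threshold each directional response, then detect lines in (rho, theta) form."""
    h_bin = np.uint8(i_horizontal > ratio * i_horizontal.max()) * 255
    v_bin = np.uint8(i_vertical > ratio * i_vertical.max()) * 255
    # rho resolution: 1 pixel, theta resolution: 1 degree, accumulator threshold: votes
    h_lines = cv2.HoughLines(h_bin, 1, np.pi / 180, votes)   # near-horizontal boundaries
    v_lines = cv2.HoughLines(v_bin, 1, np.pi / 180, votes)   # near-vertical boundaries
    return h_lines, v_lines    # each entry holds (rho, theta)
```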
S33, solving the intersection points of the four elevator key panel boundary lines to obtain the four vertex coordinates (x_lt, y_lt), (x_lb, y_lb), (x_rb, y_rb), (x_rt, y_rt).
S4, solving a mapping relation under view angle transformation by utilizing a homography transformation matrix:
S41, the homography transformation maps from one two-dimensional plane into three-dimensional space and then from the three-dimensional space onto another two-dimensional plane. Here X-Y-Z is the three-dimensional space coordinate system (which can be understood as the world coordinate system), x-y is the pixel-plane coordinate system, and x'-y' is the elevator key panel plane coordinate system. The homography transformation can be described as follows: a point (x, y) in the x-y coordinate system corresponds to the straight line l in the X-Y-Z coordinate system that passes through the origin and that point; this line intersects the x'-y' plane at a point (x', y'), and the process from the point (x, y) to the point (x', y') is called the homography transformation.
Let the x'-y' plane be perpendicular to the Z axis of the X-Y-Z space coordinate system and intersect it at the point (0, 0, 1); that is, a point (x', y') in x'-y' plane coordinates is the point (x', y', 1) in the X-Y-Z space coordinate system. The mapping between the x-y plane coordinate system and the X-Y-Z space coordinate system is described by the homography transformation matrix H:

H = [[h_1, h_2, h_3],
     [h_4, h_5, h_6],
     [h_7, h_8, h_9]]

[X, Y, Z]^T = H · [x, y, 1]^T

where h_1 to h_9 are the nine transformation parameters of the homography matrix;
the mapping relation from the x-y plane coordinate system to the x'-y' plane coordinate system then follows as:

x' = (h_1·x + h_2·y + h_3) / (h_7·x + h_8·y + h_9)
y' = (h_4·x + h_5·y + h_6) / (h_7·x + h_8·y + h_9)
The H matrix has 9 transformation parameters but in fact only 8 degrees of freedom, because the X-Y-Z space coordinate system is a homogeneous coordinate system in which coordinate scaling does not change the transformation; multiplying the H matrix by a scaling factor k gives:

kH = [[k·h_1, k·h_2, k·h_3],
      [k·h_4, k·h_5, k·h_6],
      [k·h_7, k·h_8, k·h_9]]

and kH represents the same mapping relation as H, so H has only 8 degrees of freedom;
S42, when solving H, one method is to set h_9 to 1; the equations to be solved for each point pair are then:

x'·(h_7·x + h_8·y + 1) = h_1·x + h_2·y + h_3
y'·(h_7·x + h_8·y + 1) = h_4·x + h_5·y + h_6

Another method is to add a constraint to the homography matrix H so that its modulus equals 1, namely:

h_1² + h_2² + … + h_9² = 1

and the equations to be solved are then:

x'·(h_7·x + h_8·y + h_9) = h_1·x + h_2·y + h_3
y'·(h_7·x + h_8·y + h_9) = h_4·x + h_5·y + h_6
S43, for each of the four vertices of the elevator key panel in the pixel coordinate system obtained in step S3, define its target coordinate point in the elevator key panel scene coordinate system:

(x_lt, y_lt) → (x_lt', y_lt')
(x_lb, y_lb) → (x_lb', y_lb')
(x_rb, y_rb) → (x_rb', y_rb')
(x_rt, y_rt) → (x_rt', y_rt')

The four vertex coordinates in the pixel coordinate system are first obtained by the solving in step S3; substituting these coordinate pairs into the equations to be solved in step S42 and solving them simultaneously yields the H matrix.
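With exactly four corner correspondences, the H matrix can be obtained directly; the sketch below uses OpenCV's getPerspectiveTransform, which corresponds to the h_9 = 1 normalization discussed in step S42. Function and variable names are illustrative.

```python
import cv2
import numpy as np

def solve_panel_homography(corners_px, corners_panel):
    """Solve H from (x_lt, y_lt), ... in pixels to (x_lt', y_lt'), ... on the panel."""
    src = np.asarray(corners_px, dtype=np.float32)      # four panel corners in the image
    dst = np.asarray(corners_panel, dtype=np.float32)   # their target panel coordinates
    return cv2.getPerspectiveTransform(src, dst)         # 3x3 matrix with h_9 fixed to 1
```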
S5, detecting and positioning the finger of the elevator user in the original image with the improved YOLOv3 algorithm; after the position coordinates of the finger are obtained, they are mapped through the homography transformation matrix to the corresponding position coordinates on the elevator key panel, which determines which floor key that position falls on and therefore which floor key the finger points to.
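A sketch of the mapping just described, assuming the fingertip pixel returned by the detector and a dictionary of key centers in panel coordinates; the key layout is hypothetical and not specified by the patent.

```python
import numpy as np

def finger_to_floor_key(finger_xy, H, key_centers):
    """Map a fingertip pixel through H and return the nearest floor key."""
    x, y = finger_xy
    p = H @ np.array([x, y, 1.0])
    px, py = p[0] / p[2], p[1] / p[2]      # coordinates in the key panel plane
    return min(key_centers,
               key=lambda k: (key_centers[k][0] - px) ** 2 + (key_centers[k][1] - py) ** 2)

# Hypothetical layout: key_centers = {"1": (20, 150), "2": (20, 120), "3": (20, 90)}
```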
The input of the network is the original image captured by the camera in the elevator car, and the output is the position coordinates (x, y, w, h) and the confidence of the elevator user's finger in the original image. Original images with known finger position coordinates, confidence (1 or 0), and classification probability (i.e., the probability of being a finger) are used as training data, and the loss function of the network is designed before training.
Here, the modified YOLOv3 algorithm includes modifying its loss function based on the YOLOv3 object detection algorithm (i.e., YOLOv3 network), and employing an adaptive pruning algorithm to reduce the feature extraction portion of the YOLOv3 network.
Specifically, for YOLOv3 networks, the loss function is designed as follows:
Loss = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj [(x_ij - x̂_ij)² + (y_ij - ŷ_ij)² + (w_ij - ŵ_ij)² + (h_ij - ĥ_ij)²]
     + Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj (C_ij - Ĉ_ij)²
     + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^noobj (C_ij - Ĉ_ij)²
     + Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj Σ_{c∈categories} (p_ij(c) - p̂_ij(c))²

where the first term is the coordinate error loss and λ_coord is the coefficient of the coordinate loss function; S denotes that the input image is divided into S×S grid cells; B denotes the number of boxes contained in a grid cell; 1_ij^obj indicates whether the j-th box of the i-th grid cell contains an object, taking the value 1 if it does and 0 otherwise; x and y denote the center coordinates of a box, and w and h its width and height; r_ij denotes the x, y, w, h of the j-th predicted box of the i-th grid cell, and r̂_ij denotes the x, y, w, h of the j-th ground-truth box of the i-th grid cell;

the second and third terms are the confidence losses; 1_ij^noobj indicates whether the j-th box of the i-th grid cell contains no object, taking the value 1 if it contains none and 0 otherwise; λ_noobj balances the loss weight of grid cells with and without objects, aiming to reduce the confidence loss of the boxes that contain no object; C_ij denotes the predicted confidence of the j-th box of the i-th grid cell, and Ĉ_ij denotes its true confidence;

the fourth term is the category loss; categories denotes the number of classes; p_ij(c) denotes the predicted probability that the j-th box of the i-th grid cell belongs to the c-th class, and p̂_ij(c) denotes the corresponding true probability.
YOLOv3 uses the positive/negative sample balance factor λ_noobj to reduce the confidence loss caused by the majority of grid cells that are not responsible for predicting a target. This mitigates the imbalance between positive samples (the targets the network must detect) and negative samples (the background other than the targets) to a certain extent, but does not solve the problem of training on difficult samples. Therefore, the third term of the loss function, the confidence loss, adopts FocalLoss to improve the model's ability to learn difficult samples.
Wherein, focalloss is improved based on cross entropy, and the function form is as follows:
FL(y, ŷ) = -(1 - y)^α · ŷ · log(y) - y^α · (1 - ŷ) · log(1 - y)

where y and ŷ denote the predicted and true probability values respectively (for example p_ij(c) and p̂_ij(c)), and α is the FocalLoss hyperparameter, generally taking a value in [0, 5].
FocalLoss assigns the weights (1 - y)^α and y^α to positive and negative samples respectively. Taking negative samples as an example: when a sample is easy to learn, y is close to 0 and the weight y^α is small; when a sample is difficult to learn, y is close to 0.5 and the weight y^α is relatively large. Hard-to-classify samples therefore receive a larger weight than easy-to-classify samples, which improves the model's ability to learn them.
The improved confidence loss function is as follows:
Loss_conf = Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj · FL(C_ij, Ĉ_ij) + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^noobj · FL(C_ij, Ĉ_ij)

where FL is the FocalLoss defined above.
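A PyTorch sketch of this focal-style confidence loss, assuming per-box predicted confidences, true confidences, and a 0/1 object mask as tensors of the same shape; the default alpha and lambda_noobj values are illustrative, and the exact reduction may differ from the patent's formulation.

```python
import torch

def focal_confidence_loss(pred_conf, true_conf, obj_mask,
                          alpha=2.0, lambda_noobj=0.5, eps=1e-7):
    """Confidence loss with FocalLoss-style (1 - y)^alpha and y^alpha weights."""
    p = pred_conf.clamp(eps, 1.0 - eps)
    fl = -((1.0 - p) ** alpha) * true_conf * torch.log(p) \
         - (p ** alpha) * (1.0 - true_conf) * torch.log(1.0 - p)
    # object cells plus down-weighted no-object cells
    return (obj_mask * fl).sum() + lambda_noobj * ((1.0 - obj_mask) * fl).sum()
```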
in addition, in the elevator application scene, the finger of the elevator user occupies a very small area in the image, namely, the frame of the small object in the data set occupies a very large proportion, so the training speed of the network can be accelerated by improving the loss of the small object, and the embodiment also adds an adaptive scaling factor for the first item, namely, the coordinate loss, and the scaling factor is as follows:
Figure GDA0003112064630000155
in the method, in the process of the invention,
Figure GDA0003112064630000156
representing the width and height of the real frame; ρ box The range of (2) is 1-2, and the smaller the real frame is, the larger the numerical value is, so that the loss specific gravity of the small object can be improved.
The improved coordinate loss is as follows:
Figure GDA0003112064630000157
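A PyTorch sketch of the coordinate loss with the adaptive scaling factor, under the assumption stated above that ρ_box = 2 - w·h with box sizes normalized to [0, 1]; the names and the lambda_coord default are illustrative.

```python
import torch

def scaled_coordinate_loss(pred_box, true_box, obj_mask, lambda_coord=5.0):
    """Squared coordinate error weighted by rho_box = 2 - w*h of the ground-truth box."""
    # pred_box and true_box are (..., 4) tensors holding (x, y, w, h), w and h in [0, 1]
    rho_box = 2.0 - true_box[..., 2] * true_box[..., 3]       # ranges over [1, 2]
    sq_err = ((pred_box - true_box) ** 2).sum(dim=-1)
    return lambda_coord * (obj_mask * rho_box * sq_err).sum()
```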
in the convolutional neural network structure, one convolutional channel represents a certain characteristic of an image, and a model predicts by integrating characteristic information of all channels, so that the more complex the structure is, the more characteristics can be extracted by the network. The YOLOv3 network adopts a dark-53 as a feature extraction main body, the structure is provided with 53 convolution layers, the number of channels of each convolution layer sampled is doubled, the total number of channels reaches 17856, the target required to be detected by the elevator is a finger, the analysis is performed from an intuitive point of view, the structure of the dark-53 is provided with enough complexity to extract arrow features and a great amount of redundancy exists, and therefore the network structure or the size needs to be reduced.
Current pruning techniques for convolutional neural networks fall into the following categories: methods based on weight quantization (Weight Quantization), such as HashNet, which group weight variables by hashing so that variables in the same group share the same weight value; this effectively reduces the parameter size of the model but cannot improve the forward computation speed of the network. Methods based on weight sparsification, which train the weight variables sparsely so that the many weights close to 0 can be deleted, but which accelerate the forward computation only on special hardware. Methods based on structured pruning, which adaptively reduce the structure of the network through the training data and can effectively reduce the model size while improving the running speed.
Therefore, to address the complexity redundancy of Darknet-53, this embodiment adopts a Network Slimming algorithm based on structured pruning to prune the network at the channel level and reduce its number of feature channels.
The channel-level pruning of the network with the Lasso algorithm proceeds as follows:
First, a BN layer is added after each convolution layer. When BN is used in a convolutional neural network, the BN layer assigns each input feature channel its own γ_ik and β_ik parameters, and its output is expressed as:

Ĉ_ik = γ_ik · (C_ik - μ_ik) / √σ_ik + β_ik

where Ĉ_ik is the output of the BN layer; C_ik denotes the k-th feature channel of the i-th convolution layer; μ_ik and σ_ik denote the mean and variance of the channel feature C_ik, obtained from the statistics of historical training data;
γ_ik is equivalent to a scaling factor; Network Slimming uses this scaling factor as the weight of the feature channel and sparsifies the scaling factors with the Lasso algorithm:

Loss_new = Loss_old + λ Σ_{i=1}^{Layers} Σ_{k=1}^{Channels} |γ_ik|

where Loss_new is the final loss function, Loss_old is the improved loss function described above, Layers is the number of network layers of the YOLOv3 network, and Channels is the number of channels of the YOLOv3 network.

Finally, all γ parameters are sorted from largest to smallest, and a proportion of the lowest-ranked (smallest-valued) γ_ik, together with their corresponding feature channels and BN channels, is deleted. A schematic diagram of Network Slimming pruning is shown in fig. 4.
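A PyTorch sketch of the pruning step, assuming BatchNorm2d layers follow the convolutions: an L1 (Lasso) penalty on the BN scaling factors is added to the training loss, and after sparse training the channels with the smallest |γ| are marked for removal at a chosen pruning ratio. The helper names, the penalty coefficient, and the default ratio are illustrative; rebuilding the pruned network is not shown.

```python
import torch
import torch.nn as nn

def bn_l1_penalty(model: nn.Module, lam: float = 1e-4) -> torch.Tensor:
    """Lasso term on all BN scaling factors, added to the detection loss during training."""
    penalty = sum(bn.weight.abs().sum()
                  for bn in model.modules() if isinstance(bn, nn.BatchNorm2d))
    return lam * penalty

def channel_keep_masks(model: nn.Module, prune_ratio: float = 0.3):
    """Sort all |gamma| values and keep only channels above the prune_ratio quantile."""
    gammas = torch.cat([bn.weight.detach().abs().flatten()
                        for bn in model.modules() if isinstance(bn, nn.BatchNorm2d)])
    threshold = torch.quantile(gammas, prune_ratio)
    return [(bn, bn.weight.detach().abs() > threshold)
            for bn in model.modules() if isinstance(bn, nn.BatchNorm2d)]
```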
S6, acquire the face information of the residents of the floor pointed at by the finger; the residents' face information can be registered in advance in the elevator background system.
Then double verification is performed: whether the elevator user is a registered resident, and whether the floor pointed at by the finger is the floor where that resident lives. The floor key is selected only when both verifications pass, and the elevator car is then controlled to run to that floor, which ensures that the elevator user is a resident of that floor and greatly improves the interactivity of the elevator and the safety of the residents.
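A sketch of the double-verification logic, assuming a registry of face embeddings per floor and a generic embedding comparison by cosine similarity; the registry structure, the threshold, and the function names are hypothetical and do not refer to a specific face recognition library API.

```python
import numpy as np

def double_verify(face_embedding, selected_floor, floor_registry, thresh=0.6):
    """Accept the key press only if the face matches a resident registered on that floor."""
    for resident_embedding in floor_registry.get(selected_floor, []):
        cos_sim = float(np.dot(face_embedding, resident_embedding) /
                        (np.linalg.norm(face_embedding) *
                         np.linalg.norm(resident_embedding) + 1e-8))
        if cos_sim > thresh:
            return True      # elevator user is a resident of the selected floor
    return False             # otherwise the floor key is not activated
```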
The techniques described herein may be implemented by various means. For example, these techniques may be implemented in hardware, firmware, software, or a combination thereof. For a hardware implementation, the processing modules may be implemented within one or more Application Specific Integrated Circuits (ASICs), digital Signal Processors (DSPs), programmable Logic Devices (PLDs), field Programmable Gate Arrays (FPGAs), processors, controllers, microcontrollers, electronic devices, other electronic units designed to perform the functions described herein, or a combination thereof.
For a firmware and/or software implementation, the techniques may be implemented with modules (e.g., procedures, steps, flow, and so on) that perform the functions described herein. The firmware and/or software codes may be stored in a memory and executed by a processor. The memory may be implemented within the processor or external to the processor.
Those of ordinary skill in the art will appreciate that: all or part of the steps for implementing the above method embodiments may be implemented by hardware associated with program instructions, where the foregoing program may be stored in a computer readable storage medium, and when executed, the program performs steps including the above method embodiments; and the aforementioned storage medium includes: various media that can store program code, such as ROM, RAM, magnetic or optical disks. For example, the hardware is a computing device including a processor and a memory for storing a program executable by the processor, where the processor implements the embeddable contactless elevator key interaction method described above when executing the program stored by the memory.
The embodiments described above are preferred embodiments of the present invention, but the embodiments of the present invention are not limited to the embodiments described above, and any other changes, modifications, substitutions, combinations, and simplifications that do not depart from the spirit and principles of the present invention should be made in the equivalent manner, and are included in the scope of the present invention.

Claims (8)

1. An embeddable non-contact elevator key interaction method integrating face recognition is characterized by comprising the following steps of:
s1, acquiring an original image shot by a camera in an elevator car, and performing edge detection on a shooting area of the original image through a Laplace filtering operator so as to obtain an edge image;
s2, filtering the edge image by utilizing a linear filtering operator in the horizontal direction and the vertical direction to strengthen the linear edges in the horizontal direction and the vertical direction, and removing noise while keeping the edges of the elevator key panel area;
s3, respectively carrying out straight line detection on the image filtered in the horizontal direction and the image filtered in the vertical direction by adopting a Hough straight line detection algorithm so as to position the area of the elevator key panel;
s4, solving a mapping relation under view angle transformation by utilizing a homography transformation matrix;
s5, detecting and positioning the finger of the elevator user in the original image by using an improved YOLOv3 algorithm, and obtaining a floor key pointed by the finger according to a homography transformation matrix;
the improved YOLOv3 algorithm comprises the steps of improving the loss function of the YOLOv3 target detection algorithm based on the YOLOv3 target detection algorithm, and adopting an adaptive pruning algorithm to reduce the feature extraction part of the YOLOv3 network;
the loss function of the YOLOv3 network is designed as follows:
Loss = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj [(x_ij - x̂_ij)² + (y_ij - ŷ_ij)² + (w_ij - ŵ_ij)² + (h_ij - ĥ_ij)²]
     + Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj (C_ij - Ĉ_ij)²
     + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^noobj (C_ij - Ĉ_ij)²
     + Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj Σ_{c∈categories} (p_ij(c) - p̂_ij(c))²

wherein the first term is the coordinate error loss and λ_coord is the coefficient of the coordinate loss function; S denotes that the input image is divided into S×S grid cells; B denotes the number of boxes contained in a grid cell; 1_ij^obj indicates whether the j-th box of the i-th grid cell contains an object, taking the value 1 if it does and 0 otherwise; x and y denote the center coordinates of a box, and w and h its width and height; r_ij denotes the x, y, w, h of the j-th predicted box of the i-th grid cell, and r̂_ij denotes the x, y, w, h of the j-th ground-truth box of the i-th grid cell;

the second and third terms are the confidence losses; 1_ij^noobj indicates whether the j-th box of the i-th grid cell contains no object, taking the value 1 if it contains none and 0 otherwise; λ_noobj balances the loss weight of grid cells with and without objects, aiming to reduce the confidence loss of the boxes that contain no object; C_ij denotes the predicted confidence of the j-th box of the i-th grid cell, and Ĉ_ij denotes its true confidence;

the fourth term is the category loss; categories denotes the number of classes; p_ij(c) denotes the predicted probability that the j-th box of the i-th grid cell belongs to the c-th class, and p̂_ij(c) denotes the corresponding true probability;
the improvement of the loss function is specifically as follows:
(1) FocalLoss is referenced for the third term, i.e., confidence loss, to improve the model's ability to learn difficult samples, where Focaloss improves based on cross entropy as a function:
FL(y, ŷ) = -(1 - y)^α · ŷ · log(y) - y^α · (1 - ŷ) · log(1 - y)

where y and ŷ denote the predicted and true probability values respectively (for example p_ij(c) and p̂_ij(c)), and α is the FocalLoss hyperparameter;
the improved confidence loss function is as follows:
Loss_conf = Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj · FL(C_ij, Ĉ_ij) + λ_noobj Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^noobj · FL(C_ij, Ĉ_ij)

where FL is the FocalLoss defined above;
(2) An adaptive scaling factor is added to the first term, the coordinate loss, as follows:
ρ_box = 2 - ŵ_ij · ĥ_ij

where ŵ_ij and ĥ_ij denote the width and height of the ground-truth box (normalized to the image size); ρ_box ranges from 1 to 2, and the smaller the ground-truth box, the larger its value;

the improved coordinate loss is as follows:

Loss_coord = λ_coord Σ_{i=0}^{S²} Σ_{j=0}^{B} 1_ij^obj · ρ_box · [(x_ij - x̂_ij)² + (y_ij - ŷ_ij)² + (w_ij - ŵ_ij)² + (h_ij - ĥ_ij)²]
S6, acquiring the face information of the residents of the floor pointed at by the finger and performing double verification: whether the elevator user is a resident and whether the floor pointed at by the finger is the floor where that resident lives; the floor key is selected only when both verifications pass, and the elevator car is finally controlled to run to that floor.
2. The embedded non-contact elevator key interaction method integrating face recognition according to claim 1, wherein the camera is installed above the elevator key panel and shoots the elevator key panel downwards;
in step S1, the process of edge detection of the camera shooting area by the Laplace filter operator is as follows:
s11, carrying out graying treatment on an original image to obtain a gray image;
S12, detecting the edges of the gray image with a second-order-gradient Laplacian filter operator, chosen so that the boundary of the elevator key panel is not missed; the Laplacian filter operator computes the edge gradient with a second-order difference, as follows:
considering the one-dimensional sequence {f(1), f(2), ..., f(x-1), f(x), f(x+1), ...}, the second-order difference at position x is expressed as:
f''(x) = (f(x+1) - f(x)) - (f(x) - f(x-1))
which further simplifies to:
f''(x) = f(x-1) - 2·f(x) + f(x+1)
that is, the second-order difference of a one-dimensional discrete sequence can be expressed as the convolution of the sequence with the one-dimensional convolution kernel [+1, -2, +1]; generalizing this conclusion to the two-dimensional matrix of a grayscale image:
for the grayscale image I_gray, define a two-dimensional kernel K_L of size 3×3:

      [ 0  1  0 ]
K_L = [ 1 -4  1 ]
      [ 0  1  0 ]
since this two-dimensional kernel only considers the horizontal and vertical directions, the diagonal directions are added into consideration and the convolution kernel K_L is replaced by:

      [ 1  1  1 ]
K_L = [ 1 -8  1 ]
      [ 1  1  1 ]
the second-order differential information of the gray level image is obtained by convolution of the convolution kernel and the gray level image, namely:
G = K_L * I_gray
the larger the convolution kernel, the more pronounced the detected edges;
the points where the convolution result is 0 are taken as edge points; the edge image is the set of points in the gray image where the gray level changes markedly.
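A minimal OpenCV/NumPy sketch of S11-S12, assuming those libraries are used; the kernel is the 3×3 Laplace operator with diagonal terms given above, and returning the raw filter response instead of explicitly extracting the zero-valued points is a simplification.

```python
import cv2
import numpy as np

def laplace_edge_image(image_bgr):
    """S11: grayscale conversion; S12: second-order difference via the Laplace kernel."""
    gray = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2GRAY).astype(np.float32)
    k_l = np.array([[1,  1, 1],
                    [1, -8, 1],
                    [1,  1, 1]], dtype=np.float32)    # Laplace kernel with diagonals
    # signed second-order difference of the gray image (CV_32F keeps negative values)
    return cv2.filter2D(gray, cv2.CV_32F, k_l)
```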
3. The face-recognition-fused embeddable non-contact elevator key interaction method according to claim 1, wherein the process of step S2 is as follows:
S21, define a horizontal linear filter operator K_horizontal of size 1×n and a vertical linear filter operator K_vertical of size n×1:

K_horizontal = [ 1  1  ...  1 ]   (1×n)
K_vertical = K_horizontal^T       (n×1)

where T denotes the transpose and n denotes the size of the filter operator; K_horizontal is sensitive to horizontal straight edges and K_vertical to vertical straight edges;
S22, convolve the Laplace-filtered edge image I_Laplace with the two operators to obtain the horizontal-direction filtered image I_horizontal and the vertical-direction filtered image I_vertical:
I_horizontal = K_horizontal * I_Laplace
I_vertical = K_vertical * I_Laplace
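A minimal sketch of S21-S22 under the assumption that K_horizontal and K_vertical are all-ones (box) kernels; the size n = 15 is an illustrative value.

```python
import cv2
import numpy as np

def directional_filter(edge_image, n=15):
    """Convolve the Laplace edge image with 1xn and nx1 operators; accumulating the
    response along a row reinforces long horizontal edges, along a column long
    vertical edges."""
    edge = np.asarray(edge_image, dtype=np.float32)
    k_horizontal = np.ones((1, n), dtype=np.float32)
    k_vertical = np.ones((n, 1), dtype=np.float32)
    i_horizontal = cv2.filter2D(edge, cv2.CV_32F, k_horizontal)
    i_vertical = cv2.filter2D(edge, cv2.CV_32F, k_vertical)
    return i_horizontal, i_vertical
```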
4. The face-recognition-fused embeddable non-contact elevator key interaction method according to claim 1, wherein the process of step S3 is as follows:
S31, since edges that are not horizontal or vertical straight lines are suppressed after the horizontal-direction and vertical-direction filtering, first apply threshold segmentation to remove the non-horizontal and non-vertical straight-line edges;
S32, apply the Hough line detection algorithm to the threshold-segmented horizontal-direction and vertical-direction filtered images respectively, finally obtaining the four boundary lines of the elevator key panel;
S33, solve for the intersection points of the four elevator key panel boundary lines to obtain the four vertex coordinates (x_lt, y_lt), (x_lb, y_lb), (x_rb, y_rb), (x_rt, y_rt).
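A minimal sketch of S31-S33 using OpenCV's Hough transform; the segmentation threshold and vote count are illustrative, and keeping only the two strongest lines per direction is a simplification of the boundary selection.

```python
import cv2
import numpy as np

def panel_corners(i_horizontal, i_vertical, seg_thresh=50.0, hough_votes=200):
    def boundary_lines(response):
        # S31: threshold segmentation removes weak, non-straight responses
        mask = (np.abs(response) > seg_thresh).astype(np.uint8) * 255
        # S32: Hough line detection; each line is returned as (rho, theta)
        lines = cv2.HoughLines(mask, 1, np.pi / 180, hough_votes)
        return [] if lines is None else [tuple(l[0]) for l in lines]

    def intersect(line_a, line_b):
        # S33: a Hough line satisfies x*cos(theta) + y*sin(theta) = rho
        (r1, t1), (r2, t2) = line_a, line_b
        a = np.array([[np.cos(t1), np.sin(t1)],
                      [np.cos(t2), np.sin(t2)]])
        return tuple(np.linalg.solve(a, np.array([r1, r2])))

    h_lines = boundary_lines(i_horizontal)[:2]   # ideally the top/bottom borders
    v_lines = boundary_lines(i_vertical)[:2]     # ideally the left/right borders
    return [intersect(h, v) for h in h_lines for v in v_lines]
```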
5. The face-recognition-fused embeddable non-contact elevator key interaction method according to claim 4, wherein the homography transformation reflects the process of mapping from one two-dimensional plane into three-dimensional space and then from that three-dimensional space onto another two-dimensional plane, where X-Y-Z is the three-dimensional space coordinate system (which can be understood as the world coordinate system), x-y is the pixel-plane coordinate system, and x'-y' is the elevator key panel plane coordinate system; the homography transformation can be described as follows: a point (x, y) in the x-y coordinate system corresponds to a straight line l in the X-Y-Z coordinate system passing through the origin and the spatial point (X, Y, Z) associated with (x, y):
l = { k·(X, Y, Z) | k ∈ ℝ }
the straight line intersects the x'-y' coordinate plane at a point (x', y'); the process from the point (x, y) to the point (x', y') is called the homography transformation;
the process of solving the mapping relation under the view angle transformation by utilizing the homography transformation matrix is as follows:
S41, set the x'-y' plane perpendicular to the Z axis of the X-Y-Z space coordinate system and intersecting the Z axis at the point (0, 0, 1); that is, a point (x', y') in the x'-y' plane coordinates is the point (x', y', 1) in the X-Y-Z space coordinate system; the mapping relation between the x-y plane coordinate system and the X-Y-Z space coordinate system is described by the homography transformation matrix H:

[ X ]   [ h1 h2 h3 ] [ x ]
[ Y ] = [ h4 h5 h6 ]·[ y ]
[ Z ]   [ h7 h8 h9 ] [ 1 ]

where h1 to h9 are the nine transformation parameters of the homography matrix H;
the mapping relation from the x-y plane coordinate system to the x'-y' plane coordinate system is then obtained as:

x' = X/Z = (h1·x + h2·y + h3) / (h7·x + h8·y + h9)
y' = Y/Z = (h4·x + h5·y + h6) / (h7·x + h8·y + h9)
the H matrix has 9 transformation parameters but in fact only 8 degrees of freedom, because the X-Y-Z space coordinate system is a homogeneous coordinate system in which coordinates are defined up to scale; when the H matrix is multiplied by a scaling factor k:

      [ k·h1  k·h2  k·h3 ]
k·H = [ k·h4  k·h5  k·h6 ]
      [ k·h7  k·h8  k·h9 ]

k·H represents the same mapping relation as H, so H has only 8 degrees of freedom;
S42, when solving for H, one method is to set h9 to 1; the equations to be solved are then:

x'·(h7·x + h8·y + 1) = h1·x + h2·y + h3
y'·(h7·x + h8·y + 1) = h4·x + h5·y + h6
another method is to add a constraint to the homography matrix H so that its modulus equals 1, as follows:

h1² + h2² + h3² + h4² + h5² + h6² + h7² + h8² + h9² = 1
the equations to be solved are then:

x'·(h7·x + h8·y + h9) = h1·x + h2·y + h3
y'·(h7·x + h8·y + h9) = h4·x + h5·y + h6
subject to h1² + h2² + ... + h9² = 1
S43, for each of the four vertices of the elevator key panel obtained in step S3 in the pixel coordinate system, define its target coordinate point in the elevator key panel scene coordinate system:
(x_lt, y_lt) → (x_lt', y_lt')
(x_lb, y_lb) → (x_lb', y_lb')
(x_rb, y_rb) → (x_rb', y_rb')
(x_rt, y_rt) → (x_rt', y_rt')
substitute these coordinate correspondences into the equations to be solved in step S42 and solve them simultaneously for the H matrix.
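A minimal sketch of S41-S43: the homography is solved with the direct linear transform under the ||H|| = 1 constraint (the second method in S42); the four target panel coordinates in the usage comment are illustrative, and cv2.getPerspectiveTransform is shown only as a cross-check, since it yields the same mapping up to scale.

```python
import cv2
import numpy as np

def solve_homography(src_pts, dst_pts):
    """src_pts: the four panel vertices in pixel coordinates (from S3);
    dst_pts: their target coordinates on the key panel plane."""
    rows = []
    for (x, y), (xp, yp) in zip(src_pts, dst_pts):
        rows.append([x, y, 1, 0, 0, 0, -xp * x, -xp * y, -xp])
        rows.append([0, 0, 0, x, y, 1, -yp * x, -yp * y, -yp])
    a = np.asarray(rows, dtype=np.float64)
    _, _, vt = np.linalg.svd(a)
    h = vt[-1]                                    # right singular vector of the smallest singular value
    return (h / np.linalg.norm(h)).reshape(3, 3)  # enforces ||H|| = 1

def map_point(h_mat, x, y):
    """Map a pixel point (e.g. a fingertip position) into panel-plane coordinates."""
    p = h_mat @ np.array([x, y, 1.0])
    return p[0] / p[2], p[1] / p[2]

# corners = [(x_lt, y_lt), (x_lb, y_lb), (x_rb, y_rb), (x_rt, y_rt)]   # from S3
# targets = [(0, 0), (0, 300), (200, 300), (200, 0)]                   # illustrative panel size
# H = solve_homography(corners, targets)
# H_cv = cv2.getPerspectiveTransform(np.float32(corners), np.float32(targets))
```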
6. The face-recognition-fused embeddable non-contact elevator key interaction method according to claim 1, wherein the YOLOv3 network adopts Darknet-53 as the feature extraction backbone and, to address the complexity redundancy of Darknet-53, performs channel-level pruning on the network with the Network Slimming algorithm, a structure-based pruning method, so as to reduce the number of feature channels of the network:
first, a BN layer is added after each convolutional layer; when the BN operation is used in a convolutional neural network, each input feature channel is assigned its own γ_ik and β_ik parameters, and the output of the BN layer is expressed as:
Ĉ_ik = γ_ik·(C_ik - μ_ik)/√(σ_ik) + β_ik
where Ĉ_ik is the output of the BN layer; C_ik denotes the kth feature channel of the ith convolutional layer; μ_ik and σ_ik denote the mean and variance of the channel feature C_ik, obtained statistically from the historical training data;
γ_ik acts as a scaling factor; Network Slimming uses this scaling factor as the weight of the feature channel and sparsifies the scaling factors with the Lasso algorithm:

Loss_new = Loss_old + Σ_{i=1}^{Layers} Σ_{k=1}^{Channels} |γ_ik|
where Loss_new is the final loss function, Loss_old is the improved loss function described above, Layers is the number of network layers of the YOLOv3 network, and Channels is the number of feature channels of the YOLOv3 network;
finally, all γ parameters are sorted in descending order, and a fixed proportion of the lowest-ranked γ_ik, together with their corresponding feature channels and BN channels, are deleted.
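A minimal PyTorch sketch of the channel-level pruning described above; the framework and the lam and prune_ratio values are assumptions rather than values from the patent. An L1 (Lasso) penalty on the BN scaling factors γ is added to the detection loss during training, and afterwards the channels with the smallest |γ| are selected for removal.

```python
import torch
import torch.nn as nn

def bn_l1_penalty(model, lam=1e-4):
    """Sparsity term to add to the detection loss: lam * sum(|gamma|) over all BN layers."""
    penalty = torch.tensor(0.0)
    for m in model.modules():
        if isinstance(m, nn.BatchNorm2d):
            penalty = penalty + m.weight.abs().sum()   # gamma, one value per channel
    return lam * penalty

def channels_to_prune(model, prune_ratio=0.3):
    """Sort all |gamma| values and mark the lowest prune_ratio fraction for removal."""
    gammas = torch.cat([m.weight.detach().abs().flatten()
                        for m in model.modules() if isinstance(m, nn.BatchNorm2d)])
    k = int(gammas.numel() * prune_ratio)
    threshold = torch.sort(gammas).values[k] if k > 0 else gammas.min() - 1
    prune_plan = {}
    for name, m in model.named_modules():
        if isinstance(m, nn.BatchNorm2d):
            idx = (m.weight.detach().abs() < threshold).nonzero().flatten()
            prune_plan[name] = idx.tolist()   # channel indices to delete in this BN layer
    return prune_plan
```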
7. A computing device comprising a processor and a memory for storing a program executable by the processor, wherein the processor, when executing the program stored in the memory, implements the face-recognition-fused embeddable non-contact elevator key interaction method of any one of claims 1 to 6.
8. An elevator, characterized in that the elevator realizes floor key identification and car operation control by the face-recognition-fused embeddable non-contact elevator key interaction method according to any one of claims 1 to 6.
CN202110086981.6A 2021-01-22 2021-01-22 Face recognition-fused embeddable non-contact elevator key interaction method Active CN113220114B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110086981.6A CN113220114B (en) 2021-01-22 2021-01-22 Face recognition-fused embeddable non-contact elevator key interaction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110086981.6A CN113220114B (en) 2021-01-22 2021-01-22 Face recognition-fused embeddable non-contact elevator key interaction method

Publications (2)

Publication Number Publication Date
CN113220114A CN113220114A (en) 2021-08-06
CN113220114B true CN113220114B (en) 2023-06-20

Family

ID=77084468

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110086981.6A Active CN113220114B (en) 2021-01-22 2021-01-22 Face recognition-fused embeddable non-contact elevator key interaction method

Country Status (1)

Country Link
CN (1) CN113220114B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113658088B (en) * 2021-08-27 2022-12-02 诺华视创电影科技(江苏)有限公司 Face synthesis method and device based on multiple discriminators

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1887317A1 (en) * 2006-08-04 2008-02-13 Fasep 2000 S.r.l. Method and device for non-contact measurement of the alignment of motor vehicle wheels
CN102701033A (en) * 2012-05-08 2012-10-03 华南理工大学 Elevator key and method based on image recognition technology
CN106598221A (en) * 2016-11-17 2017-04-26 电子科技大学 Eye key point detection-based 3D sight line direction estimation method
JP2019177973A (en) * 2018-03-30 2019-10-17 三菱電機株式会社 Input apparatus and input method

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8180114B2 (en) * 2006-07-13 2012-05-15 Northrop Grumman Systems Corporation Gesture recognition interface system with vertical display
US8768492B2 (en) * 2012-05-21 2014-07-01 Tait Towers Manufacturing Llc Automation and motion control system
US9292103B2 (en) * 2013-03-13 2016-03-22 Intel Corporation Gesture pre-processing of video stream using skintone detection

Also Published As

Publication number Publication date
CN113220114A (en) 2021-08-06

Similar Documents

Publication Publication Date Title
CN109325454B (en) Static gesture real-time recognition method based on YOLOv3
CN110147743B (en) Real-time online pedestrian analysis and counting system and method under complex scene
CN111709310B (en) Gesture tracking and recognition method based on deep learning
CN106682598B (en) Multi-pose face feature point detection method based on cascade regression
CN106845487B (en) End-to-end license plate identification method
CN105069413B (en) A kind of human posture's recognition methods based on depth convolutional neural networks
Gurav et al. Real time finger tracking and contour detection for gesture recognition using OpenCV
CN103098076B (en) Gesture recognition system for TV control
CN110688965B (en) IPT simulation training gesture recognition method based on binocular vision
CN109948453B (en) Multi-person attitude estimation method based on convolutional neural network
CN102426480A (en) Man-machine interactive system and real-time gesture tracking processing method for same
CN110569817B (en) System and method for realizing gesture recognition based on vision
CN110795990B (en) Gesture recognition method for underwater equipment
CN110163111A (en) Method, apparatus of calling out the numbers, electronic equipment and storage medium based on recognition of face
CN106502390B (en) A kind of visual human's interactive system and method based on dynamic 3D Handwritten Digit Recognition
CN111444764A (en) Gesture recognition method based on depth residual error network
CN113449573A (en) Dynamic gesture recognition method and device
CN109033978A (en) A kind of CNN-SVM mixed model gesture identification method based on error correction strategies
CN112507918A (en) Gesture recognition method
CN103793056A (en) Mid-air gesture roaming control method based on distance vector
WO2016070800A1 (en) Virtual ball simulation and control method of mobile device
CN114792443A (en) Intelligent device gesture recognition control method based on image recognition
CN113220114B (en) Face recognition-fused embeddable non-contact elevator key interaction method
CN115147488A (en) Workpiece pose estimation method based on intensive prediction and grasping system
Ling et al. Research on gesture recognition based on YOLOv5

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant