Computer vision automatic door opening method and system
Technical Field
The invention relates to the technical field of deep learning, in particular to a computer vision automatic vehicle door opening method and a computer vision automatic vehicle door opening system.
Background
In recent years, few intelligent systems on the market open a door in response to gestures. One example is an intelligent door opening system produced by mcdona that senses gestures by infrared. That electric opening system consists of three parts: the first is a sensing system, implemented with infrared sensors; the second is a matched intelligent lock that can be opened automatically; the third is an actuator. In operation, the user makes a stroking gesture and the electric back door opens automatically with a gap of about 30-50 mm, after which it can be pulled fully open by hand. This system merely detects by infrared whether a human body or a hand is nearby; it cannot be customized with a user-specific gesture, and it may misjudge and open the door when a person simply passes by.
Disclosure of Invention
This section is for the purpose of summarizing some aspects of embodiments of the invention and to briefly introduce some preferred embodiments. In this section, as well as in the abstract and the title of the invention of this application, simplifications or omissions may be made to avoid obscuring the purpose of the section, the abstract and the title, and such simplifications or omissions are not intended to limit the scope of the invention.
The present invention has been made in view of the above-mentioned conventional problems.
Therefore, one technical problem solved by the present invention is: providing a computer vision automatic door opening method that recognizes a specific gesture or posture more accurately and reduces mistaken door opening caused by false detection.
In order to solve the above technical problems, the invention provides the following technical scheme: a computer vision automatic door opening method comprising the steps of collecting target gestures to obtain gesture picture data; marking the gesture picture data; training a deep neural network with the marked gesture picture data; inputting images shot by a camera in real time into the deep neural network for frame-by-frame prediction and outputting prediction result data; judging, according to the prediction result data and algorithm logic, whether the gesture or posture meets the door opening intention; and, if the door opening intention is met, having the vehicle controller control the vehicle to automatically open the door.
As a preferable aspect of the computer vision automatic door opening method according to the present invention, wherein: the target gesture acquisition comprises: adopting a 1920 × 1080 high-definition monocular camera; mounting the camera at the same angle as the vehicle body camera, under various lighting conditions indoors and outdoors, in daytime and at night; and continuously shooting, with the monocular camera, picture data of the hand gestures and body postures to be acquired.
As a preferable aspect of the computer vision automatic door opening method according to the present invention, wherein: the marking of the gesture picture data comprises framing out the different gestures with 2D boxes; classifying them into the corresponding meaning types; and positioning and labeling the fingers, palm key points and body key points, marking the pixel coordinates of the key points in the image.
As a preferable aspect of the computer vision automatic door opening method according to the present invention, wherein: the method comprises the steps of constructing a deep neural network by using an MXNET deep learning framework;
defining the loss function. The softmax function is as follows:

S_j = e^{a_j} / Σ_{k=1}^{T} e^{a_k}

The cross-entropy loss function is:

L = -Σ_{j=1}^{T} y_j · log(S_j)

where L is the loss; S_j is the j-th value of the softmax output vector S, representing the probability that the sample belongs to the j-th class; and y_j is the j-th element of the label, with the summation index j ranging from 1 to the number of categories T. The label y is therefore a 1 × T one-hot vector: of its T values only one is 1, namely the value at the position corresponding to the true label, and the other T-1 values are all 0.
As a preferable aspect of the computer vision automatic door opening method according to the present invention, wherein: reading the marked gesture picture data with the MXNET deep learning framework; training the built deep neural network; computing, with the loss function, the error between the predicted result and the true value, and updating the parameters of the deep neural network with a gradient optimizer according to the error magnitude; until the training index reaches 98% or more.
As a preferable aspect of the computer vision automatic door opening method according to the present invention, wherein: the frame-by-frame prediction comprises writing code to invoke the trained deep neural network; reading real-time camera frames and inputting them into the deep neural network for real-time processing of each read frame; compiling the processed result of each frame into an array; and passing the array to the logic judgment.
As a preferable aspect of the computer vision automatic door opening method according to the present invention, wherein: judging whether the gesture or posture meets the door opening intention comprises judging whether the obtained array contains the desired result; for example, if the target result label is 2 and the obtained array is [1, 2, 3, 4], the array contains the target label, the detection result is regarded as meeting the requirement, and the judgment result is set to 1; otherwise, the detection result is regarded as not containing the target result, and the judgment result is set to -1.
As a preferable aspect of the computer vision automatic door opening method according to the present invention, wherein: controlling the vehicle door according to the judgment result comprises: if the judgment result is 1, the door opening intention is considered met, and a door opening instruction is sent to the automatic door opening motor; otherwise, if the judgment result is -1, the door opening intention is not met, and no instruction is sent to the automatic door opening motor.
As a preferable aspect of the computer vision automatic door opening method according to the present invention, wherein: if the automatic door opening motor receives an instruction meeting the door opening intention, the motor is powered on to open the door; otherwise, the motor remains in a powered-off standby state.
Another technical problem solved by the present invention is: providing a computer vision automatic door opening system that recognizes a specific gesture or posture more accurately and reduces mistaken door opening caused by false detection.
In order to solve the above technical problems, the invention provides the following technical scheme: a computer vision automatic door opening system comprising an acquisition module, a marking module, a deep neural network module, a logic judgment module and a control module; the acquisition module is arranged on the vehicle body and used for acquiring gesture picture data of a target gesture; the marking module is used for marking the gesture picture data; the deep neural network module is used for predicting, frame by frame, the images shot in real time and outputting prediction result data; the logic judgment module judges, according to the prediction result data and algorithm logic, whether the gesture or posture meets the door opening intention and generates a control instruction; and if the door opening intention is met, the control module receives the control instruction so that the vehicle controller controls the vehicle to automatically open the vehicle door.
The invention has the beneficial effects that specific gestures or postures can be identified more accurately, mistaken door openings caused by false detection are reduced, characteristic gestures or postures can be recognized by the camera at a distance from the door, specific gestures or postures can be customized for a customer, and opening the door automatically in response to a specific gesture or posture enhances the security and privacy of the system.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings needed to be used in the description of the embodiments will be briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without inventive exercise. Wherein:
FIG. 1 is a schematic overall flow chart of a computer vision automatic door opening method according to a first embodiment of the invention;
fig. 2 is a schematic structural diagram of an overall principle of a computer vision automatic door opening system according to a first embodiment of the present invention;
FIG. 3 is a diagram illustrating a palm edge point according to a third embodiment of the present invention;
FIG. 4 is a schematic diagram of a palm edge profile according to a third embodiment of the present invention;
fig. 5 is a diagram illustrating simulation results according to a third embodiment of the present invention.
Detailed Description
In order to make the aforementioned objects, features and advantages of the present invention comprehensible, specific embodiments accompanied with figures are described in detail below, and it is apparent that the described embodiments are a part of the embodiments of the present invention, not all of the embodiments. All other embodiments, which can be obtained by a person skilled in the art without making creative efforts based on the embodiments of the present invention, shall fall within the protection scope of the present invention.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, but the present invention may be practiced in other ways than those specifically described and will be readily apparent to those of ordinary skill in the art without departing from the spirit of the present invention, and therefore the present invention is not limited to the specific embodiments disclosed below.
Furthermore, reference herein to "one embodiment" or "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment is included in at least one implementation of the invention. The appearances of the phrase "in one embodiment" in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments.
The present invention will be described in detail with reference to the drawings, wherein the cross-sectional views illustrating the structure of the device are not enlarged partially in general scale for convenience of illustration, and the drawings are only exemplary and should not be construed as limiting the scope of the present invention. In addition, the three-dimensional dimensions of length, width and depth should be included in the actual fabrication.
Meanwhile, in the description of the present invention, it should be noted that the terms "upper, lower, inner and outer" and the like indicate orientations or positional relationships based on the orientations or positional relationships shown in the drawings, and are only for convenience of describing the present invention and simplifying the description, but do not indicate or imply that the referred device or element must have a specific orientation, be constructed in a specific orientation and operate, and thus, cannot be construed as limiting the present invention. Furthermore, the terms first, second, or third are used for descriptive purposes only and are not to be construed as indicating or implying relative importance.
The terms "mounted," "connected," and "coupled" in the present invention are to be understood broadly unless otherwise explicitly specified or limited; for example, a connection may be fixed, detachable or integral; it may be mechanical, electrical or direct; it may be indirect through an intervening medium, or an internal communication between two elements. The specific meanings of the above terms in the present invention can be understood by those skilled in the art on a case-by-case basis.
Example 1
Referring to fig. 1, which is an overall flow chart of the computer-vision-based automatic door opening method: the method recognizes a specific gesture or posture more accurately, reduces mistaken door opening caused by false detection, and is superior to an infrared detection system in recognizable distance. The camera can recognize the characteristic gesture or posture at a distance from the door, a specific gesture or posture can be customized for a customer, and having the door open automatically in response to a specific gesture or posture enhances the security and privacy of the system. Specifically, the computer vision automatic door opening method comprises the following steps:
s1: acquiring gesture picture data of a target gesture; the target gesture acquisition in this step includes,
a 1920 x 1080 high-definition monocular camera is adopted;
the angle of the camera is the same as that of a vehicle body camera in the environment of various different lights indoors and outdoors in the daytime and at night;
and continuously shooting picture data of hand gestures and body postures to be acquired by using the monocular camera.
S2: marking the gesture picture data, wherein marking the gesture picture data comprises,
framing out different gestures and postures through the 2D frame;
classifying into corresponding meaning types;
and positioning and labeling the hand fingers, the palm key points and the body key points, and marking out the pixel coordinates of the key points in the graph.
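For illustration only, one marked sample from this step could be represented as a record like the following; all field names, file names and coordinate values are invented for the sketch and are not part of the disclosure:

```python
# Hypothetical annotation record for one training image (all names and
# values are illustrative, not from the disclosure).
annotation = {
    "image": "frame_000123.jpg",
    "boxes": [  # 2D frames enclosing each gesture, with a meaning class
        {"label": "wave_open_door", "xyxy": [412, 180, 698, 520]},
    ],
    "keypoints": {  # pixel coordinates of hand and body key points
        "palm_center": [555, 350],
        "index_tip": [602, 210],
    },
}
```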
S3: training a deep neural network with the marked gesture picture data; the specific steps include:
building a deep neural network by using an MXNET deep learning framework;
defining the loss function. The softmax function is as follows:

S_j = e^{a_j} / Σ_{k=1}^{T} e^{a_k}

The cross-entropy loss function is:

L = -Σ_{j=1}^{T} y_j · log(S_j)

where L is the loss; S_j is the j-th value of the softmax output vector S, representing the probability that the sample belongs to the j-th class; and y_j is the j-th element of the label, with the summation index j ranging from 1 to the number of categories T. The label y is therefore a 1 × T one-hot vector: of its T values only one is 1, namely the value at the position corresponding to the true label, and the other T-1 values are all 0.
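The two functions above can be checked numerically with a small numpy sketch (an illustration of the formulas, not the MXNET implementation used in the method):

```python
import numpy as np

def softmax(a):
    """S_j = e^{a_j} / sum_k e^{a_k}, computed in a numerically stable way."""
    e = np.exp(a - np.max(a))
    return e / e.sum()

def cross_entropy(s, y):
    """L = -sum_j y_j * log(S_j) for a one-hot label vector y."""
    eps = 1e-12  # guard against log(0)
    return -np.sum(y * np.log(s + eps))

logits = np.array([2.0, 1.0, 0.1])
y = np.array([1.0, 0.0, 0.0])  # one-hot: the true class is class 0
s = softmax(logits)
loss = cross_entropy(s, y)
```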
Further, reading the marked gesture picture data with the MXNET deep learning framework;
training the built deep neural network;
computing, with the loss function, the error between the predicted result and the true value, and updating the parameters of the deep neural network with a gradient optimizer according to the error magnitude;
until the training index reaches 98% or more.
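The train-until-the-index-reaches-98% loop can be illustrated with a tiny numpy softmax classifier; this is a sketch of the loss-gradient-optimizer update cycle on invented toy data, not the MXNET training code:

```python
import numpy as np

rng = np.random.default_rng(0)
# Invented toy data: two well-separated "gesture" classes in a 2-D feature space.
X = np.vstack([rng.normal(-2.0, 0.5, (50, 2)), rng.normal(2.0, 0.5, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
onehot = np.eye(2)[y]

W, b = np.zeros((2, 2)), np.zeros(2)
acc = 0.0
for _ in range(200):                                # training loop
    logits = X @ W + b
    e = np.exp(logits - logits.max(axis=1, keepdims=True))
    p = e / e.sum(axis=1, keepdims=True)            # softmax probabilities
    grad = (p - onehot) / len(X)                    # gradient of cross-entropy loss
    W -= 0.5 * X.T @ grad                           # gradient-optimizer update
    b -= 0.5 * grad.sum(axis=0)
    acc = (p.argmax(axis=1) == y).mean()
    if acc >= 0.98:                                 # stop once the index reaches 98%
        break
```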
S4: inputting images shot by a camera in real time into the deep neural network for frame-by-frame prediction, and outputting prediction result data; the frame-by-frame prediction in this step includes,
writing code to invoke the trained deep neural network;
reading real-time camera frames and inputting them into the deep neural network for real-time processing of each read frame;
compiling the processed result of each frame into an array;
and passing the array to the logic judgment.
S5: judging, according to the prediction result data and algorithm logic, whether the gesture or posture meets the door opening intention; the judgment includes,
judging whether the obtained array contains a desired result;
for example, if the target result label is 2 and the obtained array is [1, 2, 3, 4], the array contains the target label, the detection result is regarded as meeting the requirement, and the judgment result is set to 1;
otherwise, the detection result is regarded as not containing the target result, and the judgment result is set to -1.
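The judgment logic described in S5 amounts to a membership test; a minimal sketch (the target label 2 follows the example above):

```python
def door_open_decision(predictions, target_label=2):
    """Return 1 when the target gesture label appears in the per-frame
    prediction array, otherwise -1 (sketch of the disclosed logic)."""
    return 1 if target_label in predictions else -1

result_hit = door_open_decision([1, 2, 3, 4])   # contains label 2
result_miss = door_open_decision([1, 3, 4])     # target absent
```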
S6: if the door opening intention is met, the vehicle controller controls the vehicle to automatically open the vehicle door: if the automatic door opening motor receives an instruction meeting the door opening intention, the motor is powered on to open the door; otherwise, the motor remains in a powered-off standby state.
Compared with the current approach of detecting by infrared sensing whether a hand is nearby and whether the door needs to be opened, this embodiment accurately judges the meaning of gestures and postures by computer vision. By accurately recognizing different gestures and postures and making the corresponding judgment actions, it overcomes the current shortcomings of frequent intent misrecognition, low security, poor personalization and a small set of recognizable intents, while increasing the recognition distance, enriching the recognizable intents, improving recognition security, reducing the misrecognition probability, and enhancing privacy and personalization.
Example 2
Referring to fig. 2, a schematic structural diagram of the overall principle of the computer vision automatic door opening system of this embodiment: the system includes an acquisition module 100, a marking module 200, a deep neural network module 300, a logic judgment module 400, and a control module 500. Specifically, the acquisition module 100 is disposed on the vehicle body and configured to acquire gesture picture data of a target gesture; the marking module 200 is used for marking the gesture picture data; the deep neural network module 300 is configured to predict, frame by frame, the images captured in real time and to output prediction result data; the logic judgment module 400 judges, according to the prediction result data and the algorithm logic, whether the gesture or posture meets the door opening intention and generates a control instruction; and if the door opening intention is met, the control module 500 receives the control instruction so that the vehicle controller controls the vehicle to automatically open the door.
Example 3
The door opening operation is performed in response to the opening gesture of the above embodiments, but in actual operation unrelated people may be nearby, and if a non-owner's gesture is misrecognized the door may be opened, creating danger. For this problem, existing technology uses user identity binding or face recognition to verify whether the person is the vehicle owner, but additional auxiliary technologies are then needed to screen the gesture, which increases development cost and difficulty. Based on this, the present embodiment provides a gesture-authorized vehicle door opening control method, which specifically comprises the following steps:
a user registers and logs in a vehicle-mounted control system;
inputting palm information of a user and correspondingly obtaining gesture authorization;
extracting palm print characteristics and size characteristics of the palm information as authorization characteristics;
when the system receives a gesture request, acquiring a current palm image, extracting features of the current palm image and matching the features with authorized features;
and judging whether to give authorization to the current gesture according to the matching result.
According to the gesture authorization method provided by this embodiment, without additionally recognizing other body parts, the gesture recognition process and the palm authorization process are combined: only the palm image is obtained, and no face or other images are required. The method can screen out gestures from palms that have not been enrolled, and its recognition accuracy is higher than that of traditional face recognition.
Referring to the illustration of fig. 3, further, entering palm information of the user includes the following steps:
the vehicle-mounted camera shoots images of the front side and the back side of a user to be recorded in a palm opening state;
performing gray-level processing on the image, defining a 3 × 3 pixel neighborhood in the image, taking the pixel at the center of the neighborhood as the threshold, and marking pixel value information by comparing the gray levels of the 8 pixels adjacent to it;
if a pixel value is greater than the threshold, it is marked as 1, otherwise it is marked as 0;
after all pixel values are marked, an 8-bit binary number is obtained, namely the feature value of the neighborhood;
the texture information of the image in the neighborhood is reflected in this feature value, completing the feature entry of the palm information.
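The neighborhood coding described above is the classic local binary pattern; a sketch for a single 3 × 3 neighborhood (the clockwise bit ordering is an assumption, since the disclosure does not fix one):

```python
import numpy as np

def lbp_value(patch):
    """8-bit local binary pattern of a 3x3 neighborhood: compare the 8
    neighbours with the centre pixel (the threshold), mark 1 when the
    neighbour is greater, else 0, and pack the marks into one byte."""
    center = patch[1, 1]
    # Clockwise neighbour order starting at the top-left (an assumed
    # convention; the disclosure does not fix the bit ordering).
    order = [(0, 0), (0, 1), (0, 2), (1, 2), (2, 2), (2, 1), (2, 0), (1, 0)]
    bits = [1 if patch[r, c] > center else 0 for r, c in order]
    return sum(bit << i for i, bit in enumerate(bits))

patch = np.array([[10, 20, 30],
                  [40, 25, 60],
                  [70, 80, 90]])
code = lbp_value(patch)  # neighborhood texture feature value
```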
Since the above features reflect only the texture of the palm, in order to improve recognition accuracy, this embodiment combines them with contour feature extraction, using an edge detection algorithm to extract the palm contour. The specific steps are as follows:
and converting the color image into a gray image, carrying out Gaussian blur on the image, carrying out convolution processing on the gray image and a Gaussian kernel, and inhibiting high-frequency noise in the image.
The 8 edge points are schematically shown in fig. 3. The image edge amplitude and angle are then calculated, using a differential edge detection operator to obtain the amplitude and angle of the image edge. From the convolution operator, the gradient amplitudes of the image in the two coordinate axis directions and the gradient direction can be derived as:

P(x, y) = f(x+1, y) - f(x, y)
Q(x, y) = f(x, y+1) - f(x, y)
M(x, y) = sqrt(P(x, y)^2 + Q(x, y)^2)
θ(x, y) = arctan(Q(x, y) / P(x, y))

where f is the image gray value, P is the X-direction gradient amplitude, Q is the Y-direction gradient amplitude, M is the amplitude at the point, and θ is the gradient direction, i.e. the angle.
Non-maximum suppression (edge thinning): to exclude pixels that do not belong to the edge, signal suppression is performed on them, i.e. the local maximum points are found among the detected points and the non-local-maximum points are removed, yielding the thinned edge and the precise edge position.
Double-threshold edge connection. Because a threshold set too high excludes some true edges, the edges of the obtained image may not be closed. Therefore, a lower threshold is additionally determined, and the low-threshold edges are used to connect the image edges and re-close them.
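The amplitude-and-angle step of this edge detection pipeline can be sketched with simple forward differences (an illustrative choice of differential operator; the disclosure does not specify the exact kernel):

```python
import numpy as np

def gradient_amplitude_angle(f):
    """Gradient amplitudes P (X direction) and Q (Y direction) by forward
    differences, point amplitude M = sqrt(P^2 + Q^2) and gradient
    direction theta = arctan(Q / P)."""
    P = np.zeros_like(f, dtype=float)
    Q = np.zeros_like(f, dtype=float)
    P[:, :-1] = np.diff(f, axis=1)   # X-direction difference
    Q[:-1, :] = np.diff(f, axis=0)   # Y-direction difference
    M = np.hypot(P, Q)
    theta = np.arctan2(Q, P)
    return M, theta

img = np.tile(np.arange(5.0), (5, 1))  # gray ramp increasing along X
M, theta = gradient_amplitude_angle(img)
```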
Meanwhile, considering that the hand feature profile is a smooth curve, the profile curve function is defined as S = G(x, y); obviously S is nonlinear. Defining the line segment from point 1 to point 2 as S1, its function model is:

y = k11·x^2 + k12·x + c11

where k11, k12 and c11 are undetermined coefficients. By the same interpolation, the S2 segment is defined by the function:

y = k21·x + c21

where k21 and c21 are undetermined coefficients. S3 is further divided into S31, S32 and S33, with the function model:

f(x, y) = S31·f1(x, y) + S32·f2(x, y) + S33·f3(x, y).
Let f2(x, y) be a high-order polynomial function as follows:

y = a0 + a1·x + a2·x^2 + … + an·x^n, 6 ≤ n ≤ 9

where, from the number of inflection points of the figure, the order n can be judged to be not less than 6, while for fitting complexity n is taken to be not more than 9; f1(x, y) and f3(x, y) are polynomial functions of order 3 or less. The model function of the entire profile curve S is as follows:

S = S1·G1(x, y) + S2·G2(x, y) + S3·G3(x, y)
S is a smooth curve; to satisfy the continuity and smoothness conditions, constraint conditions are established here for the functions S1, S2 and S3 at the division points:

α(x_d, y_d) = β(x_d, y_d), α'(x_d, y_d) = β'(x_d, y_d)

where (x_d, y_d) is a division point and α(x, y), β(x, y) are the curve functions on its two sides.
In summary, to obtain the expression of the function S, S1, S2 and S3 need to be determined. Further analysis shows that since the S1 and S2 function models are of low order with few undetermined coefficients, if S3 is known, the corresponding S1 and S2 can be obtained from the continuity constraints of the smooth curve together with 1-2 characteristic points. Taking S3 as an example, if enough contour reference points are obtained, the curve can be fitted piecewise by the least squares method (refer to the schematic of fig. 4); the problem thus becomes the extraction of contour reference points.
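The piecewise least-squares fitting of contour reference points can be sketched with numpy's polynomial fit; the sample points below are invented for illustration:

```python
import numpy as np

# Invented contour reference points lying on a quadratic segment like S1:
# y = 3 x^2 - 2 x + 0.5.
x = np.linspace(0.0, 1.0, 20)
y = 3.0 * x**2 - 2.0 * x + 0.5

coeffs = np.polyfit(x, y, deg=2)       # least-squares estimate of k11, k12, c11
fitted = np.polyval(coeffs, x)
residual = float(np.max(np.abs(fitted - y)))
```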
Finally, the Euclidean distance between the features of the two images is calculated with a nearest neighbor algorithm to judge image similarity: if the distance is greater than a set threshold, the person is determined not to be the vehicle owner; otherwise, the person is determined to be the vehicle owner.
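This final matching step reduces to a Euclidean distance threshold; a minimal sketch (the feature vectors and threshold value are invented for illustration):

```python
import numpy as np

def is_owner(feature, enrolled, threshold=1.0):
    """Nearest-neighbour check: Euclidean distance between the current
    palm feature and the enrolled authorization feature; within the
    threshold -> treated as the vehicle owner."""
    dist = np.linalg.norm(np.asarray(feature) - np.asarray(enrolled))
    return bool(dist <= threshold)

match = is_owner([0.9, 1.1], [1.0, 1.0])      # close features
mismatch = is_owner([5.0, 5.0], [1.0, 1.0])   # distant features
```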
In order to verify that the method of this embodiment has a high recognition rate, traditional face recognition combined with gesture recognition is used as comparison method 1, and traditional image feature recognition is used as comparison method 2. 10 different users are selected, the palm information of the 10 users is entered into the vehicle-mounted control system, and each user requests the vehicle to open, performing 1-10 MATLAB software programming simulation tests with each method; a door opening is regarded as a successful recognition. The final simulation results are shown in fig. 5, and the method of this embodiment achieves a high recognition accuracy.
It should be recognized that embodiments of the present invention can be realized and implemented by computer hardware, a combination of hardware and software, or by computer instructions stored in a non-transitory computer readable memory. The methods may be implemented in a computer program using standard programming techniques, including a non-transitory computer-readable storage medium configured with the computer program, where the storage medium so configured causes a computer to operate in a specific and predefined manner, according to the methods and figures described in the detailed description. Each program may be implemented in a high level procedural or object oriented programming language to communicate with a computer system. However, the program(s) can be implemented in assembly or machine language, if desired. In any case, the language may be a compiled or interpreted language. Furthermore, the program can be run on a programmed application specific integrated circuit for this purpose.
Further, the operations of processes described herein can be performed in any suitable order unless otherwise indicated herein or otherwise clearly contradicted by context. The processes described herein (or variations and/or combinations thereof) may be performed under the control of one or more computer systems configured with executable instructions, and may be implemented as code (e.g., executable instructions, one or more computer programs, or one or more applications) collectively executed on one or more processors, by hardware, or combinations thereof. The computer program includes a plurality of instructions executable by one or more processors.
Further, the method may be implemented in any type of computing platform operatively connected to a suitable interface, including but not limited to a personal computer, mini computer, mainframe, workstation, networked or distributed computing environment, separate or integrated computer platform, or in communication with a charged particle tool or other imaging device, and the like. Aspects of the invention may be embodied in machine-readable code stored on a non-transitory storage medium or device, whether removable or integrated into a computing platform, such as a hard disk, optically read and/or write storage medium, RAM, ROM, or the like, such that it may be read by a programmable computer, which when read by the storage medium or device, is operative to configure and operate the computer to perform the procedures described herein. Further, the machine-readable code, or portions thereof, may be transmitted over a wired or wireless network. The invention described herein includes these and other different types of non-transitory computer-readable storage media when such media include instructions or programs that implement the steps described above in conjunction with a microprocessor or other data processor. The invention also includes the computer itself when programmed according to the methods and techniques described herein. A computer program can be applied to input data to perform the functions described herein to transform the input data to generate output data that is stored to non-volatile memory. The output information may also be applied to one or more output devices, such as a display. In a preferred embodiment of the invention, the transformed data represents physical and tangible objects, including particular visual depictions of physical and tangible objects produced on a display.
As used in this application, the terms "component," "module," "system," and the like are intended to refer to a computer-related entity, either hardware, firmware, a combination of hardware and software, or software in execution. For example, a component may be, but is not limited to being: a process running on a processor, an object, an executable, a thread of execution, a program, and/or a computer. By way of example, both an application running on a computing device and the computing device can be a component. One or more components can reside within a process and/or thread of execution and a component can be localized on one computer and/or distributed between two or more computers. In addition, these components can execute from various computer readable media having various data structures thereon. The components may communicate by way of local and/or remote processes such as in accordance with a signal having one or more data packets (e.g., data from one component interacting with another component in a local system, distributed system, and/or across a network such as the internet with other systems by way of the signal).
It should be noted that the above-mentioned embodiments are only for illustrating the technical solutions of the present invention and not for limiting, and although the present invention has been described in detail with reference to the preferred embodiments, it should be understood by those skilled in the art that modifications or equivalent substitutions may be made on the technical solutions of the present invention without departing from the spirit and scope of the technical solutions of the present invention, which should be covered by the claims of the present invention.