CN112241204A - Gesture interaction method and system of vehicle-mounted AR-HUD

Gesture interaction method and system of vehicle-mounted AR-HUD

Info

Publication number
CN112241204A
Authority
CN
China
Prior art keywords
gesture
coordinates
point cloud
hud
determining
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202011495637.4A
Other languages
Chinese (zh)
Other versions
CN112241204B (en)
Inventor
胡爽
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Ningbo Joynext Technology Corp
Original Assignee
Ningbo Joynext Technology Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Ningbo Joynext Technology Corp filed Critical Ningbo Joynext Technology Corp
Priority to CN202011495637.4A
Publication of CN112241204A
Application granted
Publication of CN112241204B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 3/00: Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F 3/01: Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F 3/011: Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
    • G06F 3/017: Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/045: Combinations of networks
    • G06N 3/08: Learning methods

Abstract

The application relates to a gesture interaction method and system for a vehicle-mounted AR-HUD. The method comprises the following steps: acquiring continuous multi-frame images in the vehicle cabin; processing the images through a gesture recognition model and outputting hand position information and a gesture category; acquiring point cloud information in the vehicle cabin; determining gesture coordinates according to the hand position information and the point cloud information; determining the AR picture corresponding to the gesture interaction operation according to the gesture coordinates and the coordinates of the AR-HUD projection picture; and executing the corresponding interactive operation according to the gesture category and the AR picture corresponding to the gesture interaction operation. In this scheme, the driver's gesture is recognized and located through the fused perception of image and point cloud information, realizing a gesture interaction mode for the AR-HUD, so that the driver can interact with the AR-HUD system directly and efficiently and the interaction experience of the AR-HUD is enhanced.

Description

Gesture interaction method and system of vehicle-mounted AR-HUD
Technical Field
The application relates to the technical field of automotive electronics, in particular to a gesture interaction method and system of a vehicle-mounted AR-HUD.
Background
An AR-HUD is an optical system with a customized internal design that fuses virtual image information with the actual traffic conditions ahead of the vehicle. It can project vehicle state information such as speed and engine RPM onto the front windshield, and can augment and display information such as POIs ahead of the vehicle, road names and navigation graphics, so that the driver can check vehicle and road information while driving without lowering their head.
In the related art, known vehicle-mounted AR-HUD systems mainly project virtual content for presentation to the driver and offer little interaction with the driver. What interaction exists is limited to adjusting basic AR-HUD settings, such as brightness and displayed content, through physical keys.
Disclosure of Invention
In order to overcome the problems in the related art at least to a certain extent, the application provides a gesture interaction method and system of an on-vehicle AR-HUD.
According to a first aspect of the embodiments of the application, a gesture interaction method for a vehicle-mounted AR-HUD is provided, which includes:
acquiring continuous multi-frame images in a vehicle cabin;
processing the image through a gesture recognition model, and outputting hand position information and a gesture category;
acquiring point cloud information in a vehicle cabin;
determining gesture coordinates according to the hand position information and the point cloud information;
determining an AR picture corresponding to gesture interactive operation according to the gesture coordinates and the coordinates of the AR-HUD projection picture, wherein the AR-HUD projection picture comprises a plurality of AR pictures;
and executing corresponding interactive operation according to the gesture type and the AR picture corresponding to the gesture interactive operation, and outputting the AR-HUD projection picture after the interactive operation is executed.
Further, the hand position information includes: position information of the key points, hand boundary information; the key points include: a point of the finger tip, a point at the junction of the first phalanx and the second phalanx;
accordingly, the determining gesture coordinates from the hand position information and the point cloud information includes:
matching the hand boundary information with the point cloud information to determine a point cloud corresponding to the hand;
screening point cloud information of the key points from the point cloud corresponding to the hand part by combining the position information of the key points;
and converting the point cloud information of the key points into coordinates under an automobile body coordinate system, namely the gesture coordinates.
Further, the point cloud information is obtained by scanning a laser radar installed in a vehicle cabin; the point cloud information is coordinates under a laser radar coordinate system;
correspondingly, the converting the point cloud information of the key points into coordinates under a vehicle body coordinate system comprises the following steps:
converting the coordinates of the key points from a laser radar coordinate system to a vehicle body coordinate system through a coordinate conversion matrix;
wherein the coordinate conversion matrix is predetermined according to an installation position of the laser radar with respect to the vehicle body.
Further, the coordinate transformation matrix comprises a rotation transformation matrix and a translation transformation matrix;
the conversion formula for converting the laser radar coordinate system into the vehicle body coordinate system is as follows:

$$P_{car} = R \cdot P_{lidar} + T$$

where $P_{lidar}$ is the coordinate in the laser radar coordinate system, $P_{car}$ is the coordinate in the vehicle body coordinate system, and $R$ and $T$ are the rotation transformation matrix and the translation transformation matrix, respectively.
Further, the gesture coordinates include coordinates of a plurality of keypoints;
correspondingly, the determining the AR picture corresponding to the gesture interactive operation includes:
determining a three-dimensional space straight line according to the coordinates of the key points;
calculating the distance between each AR picture of the AR-HUD projection pictures and the three-dimensional space straight line;
and determining an AR picture corresponding to the gesture interactive operation according to the distance.
Further, the determining a three-dimensional space straight line according to the coordinates of the plurality of key points includes:
determining an equation of a three-dimensional space straight line according to the coordinates of the two key points; wherein, the two key points are respectively the point of the finger tip and the point of the joint of the first phalanx and the second phalanx.
Further, the determining, according to the distance, the AR picture corresponding to the gesture interaction operation includes:
screening out an AR picture closest to the three-dimensional space straight line as an AR picture corresponding to the gesture interactive operation;
and if the difference value of the distances between the plurality of AR pictures and the straight line is smaller than a preset error threshold value, popping up a dialog box, and determining the corresponding AR pictures according to the operation instruction of the user.
Further, the gesture recognition model is a pre-trained space-time diagram convolutional neural network model.
According to a second aspect of the embodiments of the present application, there is provided a gesture interaction system of an in-vehicle AR-HUD, including:
the gesture recognition module is used for acquiring continuous multi-frame images in the vehicle cabin; processing the image through a gesture recognition model, and outputting hand position information and a gesture category;
the gesture positioning module is used for acquiring point cloud information in a vehicle cabin; determining gesture coordinates according to the hand position information and the point cloud information; determining an AR picture corresponding to gesture interactive operation according to the gesture coordinates and the coordinates of the AR-HUD projection picture, wherein the AR-HUD projection picture comprises a plurality of AR pictures;
and the fusion analysis module is used for executing corresponding interactive operation according to the gesture type and the AR picture corresponding to the gesture interactive operation, and outputting the AR-HUD projection picture after the interactive operation is executed.
According to a third aspect of embodiments of the present application, there is provided a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements the operational steps of the method according to any one of the above embodiments.
The technical scheme provided by the embodiment of the application has the following beneficial effects:
according to the scheme, the gesture of the driver is recognized and positioned through the fusion perception of the image and the point cloud information, and the gesture interaction mode of the AR-HUD is achieved, so that the driver can interact with the AR-HUD system directly and efficiently, and the interaction experience of the AR-HUD is enhanced.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
FIG. 1 is a flow diagram illustrating a method of gesture interaction for an on-board AR-HUD according to an exemplary embodiment.
FIG. 2 is an AR-HUD system architecture diagram illustrating a fused driver gesture interaction in accordance with an exemplary embodiment.
FIG. 3 is a functional block diagram illustrating an AR Core processing a driver gesture interaction according to an exemplary embodiment.
FIG. 4 is a flowchart illustrating an AR-HUD gesture interaction algorithm, according to an exemplary embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of methods and systems consistent with certain aspects of the present application, as detailed in the appended claims.
FIG. 1 is a flow diagram illustrating a method of gesture interaction for an on-board AR-HUD according to an exemplary embodiment. The method may comprise the steps of:
step S1: acquiring continuous multi-frame images in a vehicle cabin;
step S2: processing the image through a gesture recognition model, and outputting hand position information and a gesture category;
step S3: acquiring point cloud information in a vehicle cabin;
step S4: determining gesture coordinates according to the hand position information and the point cloud information;
step S5: determining an AR picture corresponding to gesture interactive operation according to the gesture coordinates and the coordinates of the AR-HUD projection picture, wherein the AR-HUD projection picture comprises a plurality of AR pictures;
step S6: and executing corresponding interactive operation according to the gesture type and the AR picture corresponding to the gesture interactive operation, and outputting the AR-HUD projection picture after the interactive operation is executed.
In this scheme, the driver's gesture is recognized and located through the fused perception of image and point cloud information, realizing a gesture interaction mode for the AR-HUD, so that the driver can interact with the AR-HUD system directly and efficiently and the interaction experience of the AR-HUD is enhanced.
It should be understood that, although the steps in the flowchart of FIG. 1 are shown in the order indicated by the arrows, they are not necessarily performed in that order. Unless explicitly stated otherwise herein, the steps are not strictly limited in order and may be performed in other orders. Moreover, at least some of the steps in FIG. 1 may include multiple sub-steps or stages that are not necessarily performed at the same moment but may be performed at different moments, and these sub-steps or stages are not necessarily performed sequentially; they may be performed in turn or alternately with other steps or with at least some of the sub-steps or stages of other steps.
As shown in FIG. 2, the method of the present application can be applied to the AR-HUD system of a vehicle; the figure shows a block diagram of an AR-HUD system with a gesture interaction function. Compared with the configuration of an ordinary existing AR-HUD system, a laser radar and a camera additionally need to be installed at suitable positions in the vehicle cabin, and the two should be mounted as close to each other as possible. The A-pillar is the suggested installation position, but the installation is not limited to the A-pillar, as long as the camera and the laser radar can respectively acquire sufficient gesture image and point cloud information.
As shown in FIG. 2 and FIG. 3, the method of the present application is mainly implemented by an AR Core running at the vehicle end, which may include three functional modules: gesture recognition, gesture positioning and fusion analysis. These three modules form the core of the AR-HUD gesture interaction algorithm: gesture images and point cloud information are obtained through the camera and the laser radar in the cockpit, and the driver's gestures are perceived by fusing the two.
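For orientation only, the following Python sketch shows one way the three modules could be wired together; the class and method names (ARCore, recognizer, locator and so on) are hypothetical and are not defined by this application.

```python
# Minimal pipeline sketch of the AR Core described above (hypothetical names).

class ARCore:
    def __init__(self, recognizer, locator, analyzer):
        self.recognizer = recognizer   # gesture recognition module
        self.locator = locator         # gesture positioning module
        self.analyzer = analyzer       # fusion analysis module

    def process(self, frames, point_cloud, hud_layout):
        """frames: consecutive cabin images; point_cloud: one lidar scan;
        hud_layout: coordinates of each AR picture in the projection."""
        # Steps S1-S2: recognize hand position information and gesture category
        keypoints, hand_bbox, category = self.recognizer.run(frames)
        # Steps S3-S5: locate the gesture in 3D and find the targeted AR picture
        gesture_coords = self.locator.locate(point_cloud, hand_bbox, keypoints)
        target = self.locator.pick_ar_picture(gesture_coords, hud_layout)
        # Step S6: fuse category and target into the interaction to execute
        return self.analyzer.apply(category, target, hud_layout)
```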
The scheme of the present application addresses the lack of deep interaction between conventional AR-HUDs and the driver: a laser radar and a camera are arranged inside the cabin, and the driver's gesture is recognized and located through the fused perception of the two, realizing a gesture interaction method for the vehicle-mounted AR-HUD that can greatly improve the AR-HUD interaction experience. Through gestures, the driver can interact with the AR-HUD system, and the interaction experience of the AR-HUD is enhanced.
The following describes the scheme of the present application in an expanded manner with reference to a specific application scenario.
First, the gesture recognition module will be explained. In some embodiments, the gesture recognition model is a pre-trained space-time graph convolutional neural network model.
As shown in FIG. 4, the camera captures images of the driver in real time and transmits them to the gesture recognition module of the AR Core. The gesture recognition module mainly consists of a pre-trained space-time graph convolutional neural network model and is used to recognize gesture actions, including static gestures and dynamic gestures. The continuous multi-frame images captured by the camera from the start time to the end time of a gesture action are taken as the input of the space-time graph convolutional neural network model.
The neural network model can output three types of results. First, the coordinate positions, in the image frames at the start and the end of the gesture action, of two key points of the driver's finger: the fingertip and the joint between the first and second phalanges. For gesture actions such as clicking and swiping, the finger described here generally means only the index finger; for gesture actions such as grab-and-drag, it generally means all fingers. Second, the bounding box of the driver's whole hand in the image frames from the start time to the end time of the gesture action. Third, the judged category of the driver's gesture action.
The first and second output results are the hand position information in step S2, and the third output result is the gesture type in step S2. In some embodiments, the hand position information comprises: position information of the key points, hand boundary information; the key points include: a point of the tip of the finger, a point where the first phalanx connects with the second phalanx.
The driver's gesture action categories can include static gestures and dynamic gestures: static gestures such as click-to-select, and dynamic gestures such as left-right swipes, up-down swipes, grab-and-drag, and the like. It is easy to understand that in practical applications the gesture action categories are not limited to the above and can be adjusted according to practical requirements.
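Purely as an illustration of the interface such a model could expose, the sketch below feeds the frame sequence to a pre-trained model and unpacks the three result types; the model call and the output layout are assumptions for illustration, not part of this application.

```python
import numpy as np

def recognize_gesture(model, frames):
    """frames: consecutive cabin images covering one gesture action.
    Returns the three result types described above; the layout is an
    assumed convention, not the patent's specification."""
    batch = np.stack(frames)             # (T, H, W, 3) image sequence
    out = model(batch)                   # assumed pre-trained ST-GCN-style model
    # 1) fingertip and first/second-phalanx joint in the start and end frames
    keypoints = out["keypoints"]         # e.g. shape (2 frames, 2 points, 2) pixels
    # 2) whole-hand bounding box for every frame of the action
    hand_bboxes = out["hand_bboxes"]     # e.g. shape (T, 4) as (x1, y1, x2, y2)
    # 3) gesture category: click, swipe left/right, swipe up/down, grab-drag, ...
    category = out["category"]
    return keypoints, hand_bboxes, category
```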
The gesture positioning module is described next. Its function is as follows: the laser radar scans the cabin space, with the driver as the main subject, in real time and transmits the point cloud information to the gesture positioning module of the AR Core. Besides the point cloud information obtained from the laser radar scan, the gesture positioning module also takes as input the results output by the gesture recognition module: the bounding box of the driver's whole hand and the position information of key points such as the fingertip and the joint between the first and second phalanges.
Specifically, the determining the gesture coordinates according to the hand position information and the point cloud information includes:
matching the hand boundary information with the point cloud information to determine a point cloud corresponding to the hand;
screening point cloud information of the key points from the point cloud corresponding to the hand part by combining the position information of the key points;
and converting the point cloud information of the key points into coordinates under an automobile body coordinate system, namely the gesture coordinates.
Referring to FIG. 4, the gesture positioning module first screens the point cloud information of the key points out of the laser radar point cloud through two screening passes, and then applies a coordinate transformation matrix to transform the key point coordinates from the laser radar coordinate system to the vehicle body coordinate system.
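A minimal sketch of the two screening passes is given below. It assumes the lidar points have already been projected into the camera image so that they can be compared with the pixel-space bounding box and key points; the projection step, the function names and the pixel tolerance are assumptions, not part of this application.

```python
import numpy as np

def screen_keypoint_points(points_lidar, points_pixel, hand_bbox, keypoints_px, tol=4.0):
    """points_lidar: (N, 3) coordinates in the lidar frame; points_pixel: (N, 2)
    the same points projected into the camera image (projection done elsewhere).
    hand_bbox: (x1, y1, x2, y2); keypoints_px: (K, 2) fingertip / joint pixels."""
    # Pass 1: keep only points whose image projection falls inside the hand bounding box
    x1, y1, x2, y2 = hand_bbox
    in_box = ((points_pixel[:, 0] >= x1) & (points_pixel[:, 0] <= x2) &
              (points_pixel[:, 1] >= y1) & (points_pixel[:, 1] <= y2))
    hand_points, hand_pixels = points_lidar[in_box], points_pixel[in_box]

    # Pass 2: for each key point, take the hand point whose projection lies nearest to it
    keypoint_xyz = []
    for kp in keypoints_px:
        d = np.linalg.norm(hand_pixels - kp, axis=1)
        if d.size and d.min() <= tol:        # pixel tolerance (assumed value)
            keypoint_xyz.append(hand_points[np.argmin(d)])
    return np.asarray(keypoint_xyz)          # still expressed in the lidar frame
```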
In some embodiments, the point cloud information is obtained by scanning a laser radar mounted in a vehicle cabin; and the point cloud information is coordinates under a laser radar coordinate system.
Correspondingly, the converting the point cloud information of the key points into coordinates under a vehicle body coordinate system comprises the following steps:
converting the coordinates of the key points from a laser radar coordinate system to a vehicle body coordinate system through a coordinate conversion matrix;
wherein the coordinate conversion matrix is predetermined according to an installation position of the laser radar with respect to the vehicle body.
In some embodiments, the coordinate transformation matrix includes a rotation transformation matrix and a translation transformation matrix.
The conversion formula for converting the laser radar coordinate system into the vehicle body coordinate system is:

$$P_{car} = R \cdot P_{lidar} + T \qquad (1)$$

where $P_{lidar}$ is the coordinate in the laser radar coordinate system, $P_{car}$ is the coordinate in the vehicle body coordinate system, and $R$ and $T$ are the rotation transformation matrix and the translation transformation matrix, respectively. The rotation transformation matrix and the translation transformation matrix are calibrated according to the installation position of the laser radar relative to the vehicle body.
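A short sketch of applying formula (1) is shown below; the rotation matrix and translation vector values are placeholders standing in for the extrinsic calibration of the lidar mounting position, not real calibration data.

```python
import numpy as np

# Placeholder extrinsics: in practice R and T come from calibrating the lidar
# mounting position (e.g. on the A-pillar) relative to the vehicle body.
R = np.eye(3)                       # 3x3 rotation transformation matrix (identity here)
T = np.array([1.2, 0.4, 0.9])       # translation in metres (made-up example values)

def lidar_to_body(points_lidar):
    """Apply P_car = R * P_lidar + T to an (N, 3) array of lidar-frame points."""
    return points_lidar @ R.T + T
```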
In some embodiments, the gesture coordinates include coordinates of a plurality of keypoints;
correspondingly, the determining the AR picture corresponding to the gesture interactive operation includes:
determining a three-dimensional space straight line according to the coordinates of the key points;
calculating the distance between each AR picture of the AR-HUD projection pictures and the three-dimensional space straight line;
and determining an AR picture corresponding to the gesture interactive operation according to the distance.
In some embodiments, the determining a three-dimensional spatial straight line according to the coordinates of the plurality of key points includes:
determining an equation of a three-dimensional space straight line according to the coordinates of the two key points; wherein, the two key points are respectively the point of the finger tip and the point of the joint of the first phalanx and the second phalanx.
In some embodiments, the determining, according to the distance, an AR screen corresponding to the gesture interaction operation includes:
screening out an AR picture closest to the three-dimensional space straight line as an AR picture corresponding to the gesture interactive operation;
and if the difference value of the distances between the plurality of AR pictures and the straight line is smaller than a preset error threshold value, popping up a dialog box, and determining the corresponding AR pictures according to the operation instruction of the user.
Referring to FIG. 4, the specific processing steps of the gesture positioning module are summarized as follows.
a) First, the laser radar point cloud is matched with the hand bounding box to determine the portion of the point cloud that represents the hand area.
b) The point cloud information corresponding to the finger key points is then screened out of the determined hand point cloud according to the key point coordinate information obtained by the gesture recognition module.
c) The finger key point coordinates are converted from the laser radar coordinate system to the vehicle body coordinate system; the conversion is performed through a rotation and a translation of the coordinate system.
d) A straight line in three-dimensional space can be determined from the three-dimensional coordinates of the two finger key points, the fingertip and the joint between the first and second phalanges. For example, if the coordinates of the two points are $P_1 = (x_1, y_1, z_1)$ and $P_2 = (x_2, y_2, z_2)$, the spatial straight line determined by these two points can be expressed as:

$$\frac{x - x_1}{x_2 - x_1} = \frac{y - y_1}{y_2 - y_1} = \frac{z - z_1}{z_2 - z_1} \qquad (2)$$
e) The distance from each part of the picture currently projected by the AR-HUD to the determined spatial straight line is calculated. The projection position of each part of the AR picture is determined by the AR-HUD system. Assuming a point of an AR picture has coordinates $P_0 = (x_0, y_0, z_0)$, the distance $d$ from this point to the above spatial straight line is:

$$d = \frac{\lvert (P_0 - P_1) \times (P_2 - P_1) \rvert}{\lvert P_2 - P_1 \rvert} \qquad (3)$$
It should be noted that the various parts of the picture projected by the AR-HUD may lie on different planes, and different icons, images and so on may be projected at different positions. Therefore, this scheme uses the distance from each AR picture to the spatial straight line as the basis for determining which part of the AR picture is the object the user's gesture intends to operate.
f) From all the calculated distances, the AR picture part closest to the determined spatial straight line is selected. If several AR pictures lie at similar distances from the straight line, a dialog box pops up and the driver selects and confirms the intended AR picture. (A code sketch of steps d) to f) is given after this list.)
g) Finally, the AR-HUD projection content and position corresponding to the start time and the end time of the gesture action are output. The gesture positioning module outputs the positions at both the start time and the end time because gesture types such as sliding and dragging need both positions to determine how the complete sliding or dragging operation is executed.
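As referenced in step f), the following sketch covers steps d) to f): it builds the pointing line from the two key points, evaluates the point-to-line distance of formula (3) with a cross product, and picks the closest AR picture, with a simple ambiguity check standing in for the pop-up dialog. The error threshold value and the data layout are assumptions.

```python
import numpy as np

def point_line_distance(p0, p1, p2):
    """Distance from point p0 to the straight line through p1 and p2 (formula (3))."""
    return np.linalg.norm(np.cross(p0 - p1, p2 - p1)) / np.linalg.norm(p2 - p1)

def pick_ar_picture(fingertip, joint, ar_positions, err_threshold=0.05):
    """fingertip, joint: key point coordinates in the vehicle body frame (step d).
    ar_positions: dict mapping AR picture id -> representative 3D coordinate (step e).
    Returns (best_id, ambiguous_ids); a non-empty ambiguous list would trigger
    the confirmation dialog of step f). err_threshold (metres) is an assumed value."""
    dists = {pid: point_line_distance(np.asarray(pos), joint, fingertip)
             for pid, pos in ar_positions.items()}
    best_id = min(dists, key=dists.get)
    ambiguous = [pid for pid, d in dists.items()
                 if pid != best_id and d - dists[best_id] < err_threshold]
    return best_id, ambiguous
```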
The fusion analysis module is explained with reference to FIG. 4. The fusion analysis module takes as input the output of the gesture recognition module (the judged gesture action category) and the output of the gesture positioning module (the AR-HUD projection content and position corresponding to the gesture). Combining the two, the fusion analysis module generates the content and position to be projected by the AR-HUD under the different gesture interaction situations, and finally outputs the picture content to be projected by the AR-HUD after the gesture interaction. For example, if the gesture action category is clicking and the gesture points at an AR icon, the fusion analysis module generates the AR picture shown after that icon is clicked; if the gesture action category is a left-right swipe and the gesture position corresponds to the AR taskbar, the fusion analysis module generates the animation of the taskbar icons sliding left or right; if the gesture action category is grab-and-drag and the gesture points at an AR icon at the start time, the fusion analysis module moves the generated AR icon to the position pointed at by the gesture at the end time.
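Roughly, the dispatch performed by the fusion analysis module could look like the sketch below; the gesture category names and the renderer interface are illustrative assumptions rather than an API defined by this application.

```python
def fuse_and_render(category, target, start_pos, end_pos, renderer):
    """category: gesture category from the recognition module; target: AR picture
    selected by the positioning module (resolved from the start-time position);
    start_pos / end_pos: gesture positions at the start and end of the action;
    renderer: an assumed interface to the AR-HUD drawing back end."""
    if category == "click":
        return renderer.show_clicked(target)          # icon-pressed picture
    if category in ("swipe_left", "swipe_right") and target == "taskbar":
        return renderer.scroll_taskbar(category)      # taskbar sliding animation
    if category == "grab_drag":
        return renderer.move_icon(target, end_pos)    # drop the icon at the end position
    return renderer.keep_current()                    # unhandled gesture: picture unchanged
```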
Referring to fig. 3 and 4, the present application further provides a gesture interaction system of a vehicle-mounted AR-HUD, including:
the gesture recognition module is used for acquiring continuous multi-frame images in the vehicle cabin; processing the image through a gesture recognition model, and outputting hand position information and a gesture category;
the gesture positioning module is used for acquiring point cloud information in a vehicle cabin; determining gesture coordinates according to the hand position information and the point cloud information; determining an AR picture corresponding to gesture interactive operation according to the gesture coordinates and the coordinates of the AR-HUD projection picture, wherein the AR-HUD projection picture comprises a plurality of AR pictures;
and the fusion analysis module is used for executing corresponding interactive operation according to the gesture type and the AR picture corresponding to the gesture interactive operation, and outputting the AR-HUD projection picture after the interactive operation is executed.
With regard to the system in the above embodiment, the specific steps in which the respective modules perform operations have been described in detail in the embodiment related to the method, and are not described in detail herein. The modules in the gesture interaction system of the vehicle-mounted AR-HUD can be completely or partially realized by software, hardware and a combination thereof. The modules can be embedded in a hardware form or independent from a processor in the computer device, and can also be stored in a memory in the computer device in a software form, so that the processor can call and execute operations corresponding to the modules.
The present application further provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, implements an on-board AR-HUD gesture interaction method: acquiring continuous multi-frame images in a vehicle cabin; processing the image through a gesture recognition model, and outputting hand position information and a gesture category; acquiring point cloud information in a vehicle cabin; determining gesture coordinates according to the hand position information and the point cloud information; determining an AR picture corresponding to gesture interactive operation according to the gesture coordinates and the coordinates of the AR-HUD projection picture, wherein the AR-HUD projection picture comprises a plurality of AR pictures; and executing corresponding interactive operation according to the gesture type and the AR picture corresponding to the gesture interactive operation, and outputting the AR-HUD projection picture after the interactive operation is executed.
It is understood that the same or similar parts in the above embodiments may be mutually referred to, and the same or similar parts in other embodiments may be referred to for the content which is not described in detail in some embodiments.
It should be noted that, in the description of the present application, the terms "first", "second", etc. are used for descriptive purposes only and are not to be construed as indicating or implying relative importance. Further, in the description of the present application, the meaning of "a plurality" means at least two unless otherwise specified.
Any process or method descriptions in flow charts or otherwise described herein may be understood as representing modules, segments, or portions of code which include one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present application includes other implementations in which functions may be executed out of order from that shown or discussed, including substantially concurrently or in reverse order, depending on the functionality involved, as would be understood by those reasonably skilled in the art of the present application.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
It will be understood by those skilled in the art that all or part of the steps carried by the method for implementing the above embodiments may be implemented by hardware related to instructions of a program, which may be stored in a computer readable storage medium, and when the program is executed, the program includes one or a combination of the steps of the method embodiments.
In addition, functional units in the embodiments of the present application may be integrated into one processing module, or each unit may exist alone physically, or two or more units are integrated into one module. The integrated module can be realized in a hardware mode, and can also be realized in a software functional module mode. The integrated module, if implemented in the form of a software functional module and sold or used as a stand-alone product, may also be stored in a computer readable storage medium.
The storage medium mentioned above may be a read-only memory, a magnetic or optical disk, etc.
In the description herein, reference to the description of the term "one embodiment," "some embodiments," "an example," "a specific example," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
Although embodiments of the present application have been shown and described above, it is understood that the above embodiments are exemplary and should not be construed as limiting the present application, and that variations, modifications, substitutions and alterations may be made to the above embodiments by those of ordinary skill in the art within the scope of the present application.

Claims (10)

1. A gesture interaction method of a vehicle-mounted AR-HUD is characterized by comprising the following steps:
acquiring continuous multi-frame images in a vehicle cabin;
processing the image through a gesture recognition model, and outputting hand position information and a gesture category;
acquiring point cloud information in a vehicle cabin;
determining gesture coordinates according to the hand position information and the point cloud information;
determining an AR picture corresponding to gesture interactive operation according to the gesture coordinates and the coordinates of the AR-HUD projection picture, wherein the AR-HUD projection picture comprises a plurality of AR pictures;
and executing corresponding interactive operation according to the gesture type and the AR picture corresponding to the gesture interactive operation, and outputting the AR-HUD projection picture after the interactive operation is executed.
2. The method of claim 1, wherein the hand position information comprises: position information of the key points, hand boundary information; the key points include: a point of the finger tip, a point at the junction of the first phalanx and the second phalanx;
accordingly, the determining gesture coordinates from the hand position information and the point cloud information includes:
matching the hand boundary information with the point cloud information to determine a point cloud corresponding to the hand;
screening point cloud information of the key points from the point cloud corresponding to the hand part by combining the position information of the key points;
and converting the point cloud information of the key points into coordinates under an automobile body coordinate system, namely the gesture coordinates.
3. The method of claim 2, wherein the point cloud information is obtained by a lidar scan mounted within a vehicle cabin; the point cloud information is coordinates under a laser radar coordinate system;
correspondingly, the converting the point cloud information of the key points into coordinates under a vehicle body coordinate system comprises the following steps:
converting the coordinates of the key points from a laser radar coordinate system to a vehicle body coordinate system through a coordinate conversion matrix;
wherein the coordinate conversion matrix is predetermined according to an installation position of the laser radar with respect to the vehicle body.
4. The method of claim 3, wherein the coordinate transformation matrix comprises a rotation transformation matrix and a translation transformation matrix;
the conversion formula for converting the laser radar coordinate system into the vehicle body coordinate system is:

$$P_{car} = R \cdot P_{lidar} + T$$

wherein $P_{lidar}$ is the coordinate in the laser radar coordinate system, $P_{car}$ is the coordinate in the vehicle body coordinate system, and $R$ and $T$ are the rotation transformation matrix and the translation transformation matrix, respectively.
5. The method of claim 4, wherein the gesture coordinates comprise coordinates of a plurality of keypoints;
correspondingly, the determining the AR picture corresponding to the gesture interactive operation includes:
determining a three-dimensional space straight line according to the coordinates of the key points;
calculating the distance between each AR picture of the AR-HUD projection pictures and the three-dimensional space straight line;
and determining an AR picture corresponding to the gesture interactive operation according to the distance.
6. The method of claim 5, wherein determining a three-dimensional spatial line based on coordinates of the plurality of keypoints comprises:
determining an equation of a three-dimensional space straight line according to the coordinates of the two key points; wherein, the two key points are respectively the point of the finger tip and the point of the joint of the first phalanx and the second phalanx.
7. The method of claim 5, wherein the determining the AR picture corresponding to the gesture interaction operation according to the distance comprises:
screening out an AR picture closest to the three-dimensional space straight line as an AR picture corresponding to the gesture interactive operation;
and if the difference value of the distances between the plurality of AR pictures and the straight line is smaller than a preset error threshold value, popping up a dialog box, and determining the corresponding AR pictures according to the operation instruction of the user.
8. The method according to any one of claims 1-7, wherein the gesture recognition model is a pre-trained space-time graph convolutional neural network model.
9. A gesture interaction system of an on-vehicle AR-HUD, comprising:
the gesture recognition module is used for acquiring continuous multi-frame images in the vehicle cabin; processing the image through a gesture recognition model, and outputting hand position information and a gesture category;
the gesture positioning module is used for acquiring point cloud information in a vehicle cabin; determining gesture coordinates according to the hand position information and the point cloud information; determining an AR picture corresponding to gesture interactive operation according to the gesture coordinates and the coordinates of the AR-HUD projection picture, wherein the AR-HUD projection picture comprises a plurality of AR pictures;
and the fusion analysis module is used for executing corresponding interactive operation according to the gesture type and the AR picture corresponding to the gesture interactive operation, and outputting the AR-HUD projection picture after the interactive operation is executed.
10. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the operating steps of the method according to any one of claims 1 to 8.
CN202011495637.4A 2020-12-17 2020-12-17 Gesture interaction method and system of vehicle-mounted AR-HUD Active CN112241204B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011495637.4A CN112241204B (en) 2020-12-17 2020-12-17 Gesture interaction method and system of vehicle-mounted AR-HUD

Publications (2)

Publication Number Publication Date
CN112241204A (en) 2021-01-19
CN112241204B (en) 2021-08-27

Family

ID=74175242

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011495637.4A Active CN112241204B (en) 2020-12-17 2020-12-17 Gesture interaction method and system of vehicle-mounted AR-HUD

Country Status (1)

Country Link
CN (1) CN112241204B (en)

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103455144A (en) * 2013-08-22 2013-12-18 深圳先进技术研究院 Vehicle-mounted man-machine interaction system and method
CN105353873A (en) * 2015-11-02 2016-02-24 深圳奥比中光科技有限公司 Gesture manipulation method and system based on three-dimensional display
CN106740114A (en) * 2017-01-15 2017-05-31 上海云剑信息技术有限公司 Intelligent automobile man-machine interactive system based on augmented reality
CN107168527A (en) * 2017-04-25 2017-09-15 华南理工大学 The first visual angle gesture identification and exchange method based on region convolutional neural networks
CN111108507A (en) * 2017-09-22 2020-05-05 祖克斯有限公司 Generating a three-dimensional bounding box from two-dimensional images and point cloud data
CN110659543A (en) * 2018-06-29 2020-01-07 比亚迪股份有限公司 Vehicle control method and system based on gesture recognition and vehicle
CN109963144A (en) * 2019-03-28 2019-07-02 重庆长安汽车股份有限公司 A kind of vehicle-mounted identifying system based on AR-HUD
CN111552368A (en) * 2019-05-16 2020-08-18 毛文涛 Vehicle-mounted human-computer interaction method and vehicle-mounted equipment
CN111252074A (en) * 2020-01-19 2020-06-09 恒大新能源汽车科技(广东)有限公司 Multi-modal control method, device, computer-readable storage medium and vehicle
CN111488823A (en) * 2020-04-09 2020-08-04 福州大学 Dimension-increasing gesture recognition and interaction system and method based on two-dimensional laser radar
CN111731101A (en) * 2020-08-21 2020-10-02 宁波均联智行科技有限公司 AR-HUD display method and system fusing V2X information
CN112067013A (en) * 2020-09-01 2020-12-11 卜云 AR-HUD-based vehicle-mounted identification system

Cited By (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112433619A (en) * 2021-01-27 2021-03-02 国汽智控(北京)科技有限公司 Human-computer interaction method and system for automobile, electronic equipment and computer storage medium
CN112947757A (en) * 2021-03-04 2021-06-11 恒大新能源汽车投资控股集团有限公司 Control method of vehicle-mounted interaction system, storage medium and vehicle-mounted electronic equipment
CN112947761A (en) * 2021-03-26 2021-06-11 芜湖汽车前瞻技术研究院有限公司 Virtual image position adjusting method, device and storage medium of AR-HUD system
CN113269089A (en) * 2021-05-25 2021-08-17 上海人工智能研究院有限公司 Real-time gesture recognition method and system based on deep learning
WO2023000119A1 (en) * 2021-07-17 2023-01-26 华为技术有限公司 Gesture recognition method and apparatus, system, and vehicle
CN113646736A (en) * 2021-07-17 2021-11-12 华为技术有限公司 Gesture recognition method, device and system and vehicle
CN114034310A (en) * 2021-10-28 2022-02-11 东风汽车集团股份有限公司 Automatic navigation driving assistance system based on AR-HUD and gesture interaction
CN114034310B (en) * 2021-10-28 2023-09-29 东风汽车集团股份有限公司 Automatic navigation auxiliary driving system based on AR-HUD and gesture interaction
CN114013431A (en) * 2022-01-06 2022-02-08 宁波均联智行科技股份有限公司 Automatic parking control method and system based on user intention
CN114527923A (en) * 2022-01-06 2022-05-24 恒大新能源汽车投资控股集团有限公司 In-vehicle information display method and device and electronic equipment
CN114839782A (en) * 2022-06-07 2022-08-02 上汽大众汽车有限公司 Vehicle-mounted enhanced display system for vehicle control and information display
CN114839782B (en) * 2022-06-07 2023-08-18 上汽大众汽车有限公司 Vehicle-mounted enhanced display system for vehicle control and information display
CN116071949A (en) * 2023-04-03 2023-05-05 北京永泰万德信息工程技术有限公司 Augmented reality method and device for driving assistance

Also Published As

Publication number Publication date
CN112241204B (en) 2021-08-27


Legal Events

Date Code Title Description
PB01 Publication
CB02 Change of applicant information
Address after: 4 / F, building 5, 555 Dongqing Road, hi tech Zone, Ningbo City, Zhejiang Province
Applicant after: Ningbo Junlian Zhixing Technology Co.,Ltd.
Address before: 4 / F, building 5, 555 Dongqing Road, hi tech Zone, Ningbo City, Zhejiang Province
Applicant before: Ningbo Junlian Zhixing Technology Co.,Ltd.
SE01 Entry into force of request for substantive examination
GR01 Patent grant