CN116225236A - Intelligent home scene interaction method based on acousto-optic control - Google Patents

Intelligent home scene interaction method based on acousto-optic control

Info

Publication number
CN116225236A
CN116225236A
Authority
CN
China
Prior art keywords
image
world coordinates
knuckle
dimensional
pointing angle
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202310497713.2A
Other languages
Chinese (zh)
Other versions
CN116225236B (en)
Inventor
邱维新
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Boshi System Integration Co., Ltd.
Original Assignee
Shenzhen Boshi System Integration Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Boshi System Integration Co., Ltd.
Priority to CN202310497713.2A
Publication of CN116225236A
Application granted
Publication of CN116225236B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B15/00 Systems controlled by a computer
    • G05B15/02 Systems controlled by a computer electric
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B19/00 Programme-control systems
    • G05B19/02 Programme-control systems electric
    • G05B19/418 Total factory control, i.e. centrally controlling a plurality of machines, e.g. direct or distributed numerical control [DNC], flexible manufacturing systems [FMS], integrated manufacturing systems [IMS] or computer integrated manufacturing [CIM]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V40/00 Recognition of biometric, human-related or animal-related patterns in image or video data
    • G06V40/20 Movements or behaviour, e.g. gesture recognition
    • G06V40/28 Recognition of hand or arm movements, e.g. recognition of deaf sign language
    • G PHYSICS
    • G05 CONTROLLING; REGULATING
    • G05B CONTROL OR REGULATING SYSTEMS IN GENERAL; FUNCTIONAL ELEMENTS OF SUCH SYSTEMS; MONITORING OR TESTING ARRANGEMENTS FOR SUCH SYSTEMS OR ELEMENTS
    • G05B2219/00 Program-control systems
    • G05B2219/20 Pc systems
    • G05B2219/26 Pc applications
    • G05B2219/2642 Domotique, domestic, home control, automation, smart house
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02 Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Automation & Control Theory (AREA)
  • Human Computer Interaction (AREA)
  • Quality & Reliability (AREA)
  • Manufacturing & Machinery (AREA)
  • Health & Medical Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • Psychiatry (AREA)
  • Social Psychology (AREA)
  • Multimedia (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

An intelligent home scene interaction method based on acousto-optic control comprises the following steps: a user gives a voice instruction and gesture information; in response to the voice instruction, 3 images containing the gesture information are acquired by a trinocular stereoscopic vision system, namely a first image, a second image, and a third image; the world coordinates of the knuckle are obtained from the 3 images; the three-dimensional pointing angle of the finger is obtained from the 3 images; and the unique target object pointed at by the gesture is determined from the world coordinates of the knuckle, the three-dimensional pointing angle, and the world coordinates of all target objects, and the voice instruction is executed on that target object. Because the three-dimensional pointing angle is obtained with the trinocular stereoscopic vision system by directly fitting a two-dimensional straight line in each image, neither image depth information nor factors such as image distortion need to be considered, which greatly improves the accuracy of the result.

Description

Intelligent home scene interaction method based on acousto-optic control
Technical Field
The invention belongs to the technical field of intelligent home control, and particularly relates to an intelligent home scene interaction method based on acousto-optic control.
Background
In the related art of smart home control, to control a particular lamp, for example, the user must first say the lamp's name and then speak the voice command.
However, this approach has drawbacks. For the homeowner or for young users, remembering device names is naturally not a problem. For guests, and especially for middle-aged and elderly visitors, memory declines with age; moreover, visitors who only come on weekends to see or look after children, rather than living in the home, cannot be expected to remember so many lamp names. Considering that a key user group of smart homes is precisely elderly people with limited mobility, a method that combines voice instructions to achieve efficient and accurate control is needed.
Prior art document 1 (CN109839827A) discloses a gesture recognition smart home control system based on full spatial location information, in which a spatial pointing vector is formed from the spatial positions (i.e., three-dimensional world coordinates) of the wrist and the fingertip (see paragraph [0053] of its specification). However, at the current level of image processing, computing a spatial position must account not only for influencing factors such as camera distortion, but also for whether the camera's intrinsic and extrinsic parameters are accurate; that is, the determination of the spatial position itself carries a large error. In addition, in a smart home the ratio of the distance from the target object to the fingertip to the distance from the wrist to the fingertip is much larger than 1, so a small error at the start leads to a large deviation at the end. When a spatial pointing vector is obtained from 2 such spatial positions, the positional error is amplified in proportion to that distance ratio, which greatly reduces the accuracy of the result.
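To make this amplification concrete, a rough back-of-the-envelope bound (the numbers are assumed for illustration and do not come from either patent): if the wrist-to-fingertip distance is about 0.15 m, the target is about 3 m away, and each keypoint position carries roughly 1 cm of error, then

$$\Delta_{\text{target}} \approx \frac{D_{\text{target}}}{D_{\text{wrist-to-fingertip}}}\,\Delta_{\text{position}} \approx \frac{3\ \text{m}}{0.15\ \text{m}} \times 1\ \text{cm} = 20\ \text{cm},$$

so a centimetre-level keypoint error can displace the pointing ray by tens of centimetres at the target, easily more than the spacing between neighbouring devices.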
Disclosure of Invention
In order to overcome the deficiencies in the prior art, the invention provides an intelligent home scene interaction method based on acousto-optic control.
The invention adopts the following technical scheme.
The invention discloses an acousto-optic control-based intelligent home scene interaction method, which comprises the following steps:
step 1, a user gives a voice instruction and gesture information; in response to the user's voice instruction, 3 images containing the gesture information are acquired by a trinocular stereoscopic vision system, the three images being respectively: a first image, a second image, and a third image;
step 2, obtaining world coordinates of the knuckle according to the 3 images;
step 3, obtaining a three-dimensional pointing angle of the finger according to the 3 images;
step 4, determining a unique target object pointed at by the gesture according to the world coordinates of the knuckle, the three-dimensional pointing angle, and the world coordinates of all target objects, and executing the voice instruction on the unique target object.
The second aspect of the invention discloses an acousto-optic control based intelligent home scene interaction system, which comprises: a trinocular stereoscopic vision system and a central processing unit; wherein the central processing unit comprises: an image processing module, a voice recognition module, a calculation module, and a data storage module;
the image processing module is used for, in response to a voice instruction of a user, acquiring 3 images containing gesture information from the trinocular stereoscopic vision system, the three images being respectively: a first image, a second image, and a third image; and, in combination with the calculation module, obtaining the world coordinates of the knuckle and the three-dimensional pointing angle of the finger;
the data storage module is used for storing world coordinates of all targets;
the computing module is used for determining a unique target object pointed by the gesture according to the world coordinates of the knuckle, the three-dimensional pointing angle and the world coordinates of all target objects;
the voice recognition module is used for executing voice instructions on the unique target object.
Compared with the prior art, the invention has the following advantages:
(1) With the trinocular stereoscopic vision system, the three-dimensional pointing angle is obtained by directly fitting a two-dimensional straight line in each image. Neither image depth information nor factors such as image distortion need to be considered, which greatly improves the accuracy of the result.
(2) The invention also analyzes a special case, avoiding the situation in which the gesture information fails to find a target object.
Drawings
Fig. 1 is a schematic illustration of a finger profile.
Fig. 2 is a diagram of the spatial relationship of the first camera to the knuckle.
Fig. 3 is a diagram of the relationship between a gesture and a target object in a special case.
Fig. 4 is a flowchart of a smart home scenario interaction method based on acousto-optic control.
Detailed Description
The present application is further described below with reference to the accompanying drawings. The following examples are only for more clearly illustrating the technical solutions of the present invention and are not intended to limit the scope of protection of the present application.
In connection with the background art, it will be appreciated that from an image taken by a camera it is generally easy to determine an object's position in the planar coordinates of the image view, but difficult to determine its position perpendicular to the image, i.e., the image depth information. Typically, at least 2 cameras are required to establish the three-dimensional position of an object, and even then the position perpendicular to the image is inevitably less accurate than the position in planar coordinates. In practice, therefore, the three-dimensional position information of an object is always deficient in one dimension (the dimension perpendicular to the image).
Based on the above deficiencies, the invention provides an acousto-optic control-based intelligent home scene interaction method which, as shown in Fig. 4, comprises the following steps.
Step 1, a user gives a voice instruction and gesture information; in response to the user's voice instruction, 3 images containing the gesture information are acquired by a trinocular stereoscopic vision system, namely: the first image, the second image, and the third image.
The trinocular stereoscopic vision system consists of 3 cameras. Preferably, the 3 cameras are arranged in an equilateral triangle. The 3 cameras should always keep their orientations consistent, i.e., when one camera rotates, the other 2 cameras rotate synchronously with it.
Step 2, obtaining the world coordinates of the knuckle from the 3 images.
In essence, step 2 can obtain the world coordinates of the knuckle through a binocular vision system; thus, in some embodiments, step 2 specifically includes:
Step 2.1, finding the two-dimensional coordinates of the knuckle on the first image and on the second image respectively.
Note that the knuckle referred to herein means a fixed point on the finger; thus, the world coordinates of the knuckle represent the coordinates of the finger.
Step 2.2, calculating the world coordinates of the knuckle from the two-dimensional coordinates of the knuckle on the first image and the second image.
For the first and second cameras ($i = 1, 2$), the pinhole projection of the knuckle satisfies:

$$
z_{ci}\begin{bmatrix} u_i \\ v_i \\ 1 \end{bmatrix}
=
\begin{bmatrix} \frac{f_i}{\mathrm{d}x_i} & 0 & u_{0i} \\ 0 & \frac{f_i}{\mathrm{d}y_i} & v_{0i} \\ 0 & 0 & 1 \end{bmatrix}
\begin{bmatrix} R_i & t_i \end{bmatrix}
\begin{bmatrix} X_w \\ Y_w \\ Z_w \\ 1 \end{bmatrix}
$$

where $R_1, R_2$ are the rotation matrices of the first and second cameras, and $t_1, t_2$ are their translation vectors; $(u_1, v_1)$ are the two-dimensional coordinates of the knuckle on the first image, and $(u_2, v_2)$ are the two-dimensional coordinates of the knuckle on the second image; $\mathrm{d}x_1, \mathrm{d}y_1$ denote the actual physical size of a single pixel along the X-axis and Y-axis of the first image, and $\mathrm{d}x_2, \mathrm{d}y_2$ the corresponding sizes for the second image; $f_1, f_2$ are the focal lengths of the first and second cameras; $(u_{01}, v_{01})$ are the coordinates of the origin (principal point) in the pixel coordinate system of the first image, and $(u_{02}, v_{02})$ those of the second image; $(X_w, Y_w, Z_w)$ are the world coordinates of the knuckle; and $z_{c1}, z_{c2}$ are the z-axis components (depths) of the knuckle in the coordinate systems of the first and second cameras, respectively. Combining the two projection relations and eliminating $z_{c1}$ and $z_{c2}$ yields an over-determined linear system from which the world coordinates $(X_w, Y_w, Z_w)$ of the knuckle are solved.
It should be noted that the rotation matrices can be obtained by calibrating the intrinsic and extrinsic parameters of the cameras, and the translation vectors depend on the choice of the origins of the pixel coordinate system and the world coordinate system; both are constants.
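As an illustration of step 2.2, the sketch below triangulates the knuckle's world coordinates from its pixel coordinates in two calibrated views by stacking the projection relations into a linear least-squares system; the calibration numbers and helper names are illustrative assumptions, not values from the patent.

```python
import numpy as np

def projection_matrix(f, dx, dy, u0, v0, R, t):
    """Build a 3x4 pinhole projection matrix K [R | t] from intrinsic and extrinsic parameters."""
    K = np.array([[f / dx, 0.0, u0],
                  [0.0, f / dy, v0],
                  [0.0, 0.0, 1.0]])
    return K @ np.hstack([R, t.reshape(3, 1)])

def triangulate_knuckle(P1, P2, uv1, uv2):
    """Solve for the knuckle's world coordinates (Xw, Yw, Zw) from two views.

    Each view contributes two linear equations in the homogeneous world point;
    the least-squares solution is the last right singular vector of the stack.
    """
    (u1, v1), (u2, v2) = uv1, uv2
    A = np.vstack([u1 * P1[2] - P1[0],
                   v1 * P1[2] - P1[1],
                   u2 * P2[2] - P2[0],
                   v2 * P2[2] - P2[1]])
    _, _, Vt = np.linalg.svd(A)
    X = Vt[-1]
    return X[:3] / X[3]          # de-homogenize to (Xw, Yw, Zw)

if __name__ == "__main__":
    # Illustrative calibration values (assumed, not from the patent).
    R1, t1 = np.eye(3), np.zeros(3)
    R2 = np.array([[0.9962, 0.0, 0.0872],
                   [0.0,    1.0, 0.0   ],
                   [-0.0872, 0.0, 0.9962]])          # ~5 degree rotation about Y
    t2 = np.array([-0.3, 0.0, 0.0])
    P1 = projection_matrix(f=0.004, dx=2e-6, dy=2e-6, u0=640, v0=360, R=R1, t=t1)
    P2 = projection_matrix(f=0.004, dx=2e-6, dy=2e-6, u0=640, v0=360, R=R2, t=t2)

    knuckle_w = np.array([0.2, 0.1, 1.5])            # ground-truth point for the demo
    def project(P, Xw):
        x = P @ np.append(Xw, 1.0)
        return x[:2] / x[2]
    uv1, uv2 = project(P1, knuckle_w), project(P2, knuckle_w)
    print(triangulate_knuckle(P1, P2, uv1, uv2))     # approximately [0.2, 0.1, 1.5]
```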
However, because of step 3, the solution of the present invention has to use a trinocular stereoscopic vision system rather than a binocular vision system. Therefore, in step 2, the three cameras can also be used jointly to ensure higher precision; the specific details are not repeated here.
Step 3, obtaining the three-dimensional pointing angle of the finger from the 3 images.
The step 3 specifically comprises the following steps:
step 3.1, for each image, acquiring a plurality of points on the finger; and fitting according to the plurality of points to obtain the slope of the straight line corresponding to each image.
In some embodiments, step 3.1 may comprise:
step 3.1.1, extracting a region containing a finger from each image, and sharpening the region to obtain a boundary contour of the finger; the boundary contour is divided into a first contour and a second contour by using fingertips as dividing points.
And 3.1.2, sampling the first contour and the second contour to obtain a plurality of points respectively, and obtaining straight lines after fitting the first contour and the second contour respectively by using a least square method.
As shown in FIG. 1, the first contour may be a contour above the finger, the points at which the first contour is sampled may be M1-M6, and the second contour may be a contour below the finger, and the points at which the second contour is sampled may be N1-N5.
And 3.1.3, calculating the slope of the straight line corresponding to each image according to the straight line after the first contour and the second contour are fitted.
It should be noted here that the straight line fitted for each image is assumed to have the equation $y = kx + b$, where $k$ is the slope of the straight line. The intercept $b$ does not need to be solved for; it is a relative displacement and is entirely irrelevant to the three-dimensional pointing angle.

Step 3.1.3 essentially takes an average. For example, if the slopes of the straight line fitted to the first contour and the straight line fitted to the second contour are $k^{(1)}$ and $k^{(2)}$ respectively, then the slope $k$ of the straight line corresponding to that image satisfies:

$$k = \frac{k^{(1)} + k^{(2)}}{2}$$
as can be seen from step 3.1, the invention is precisely that the determination of the image depth information is avoided when the three-dimensional pointing angle of the finger is determined. In addition, each image is independent, and thus it is not necessary to consider factors such as image distortion. Therefore, only two-dimensional information of three images is needed to be considered when the three-dimensional pointing angle is obtained.
Step 3.2, calculating the three-dimensional pointing angle of the finger from the slope of the straight line corresponding to each image.
Specifically, step 3.2 includes:
it will be appreciated that the three-dimensional pointing angle may be defined by a vector
Figure SMS_22
To represent. The purpose of step 3.2 is then to find +.>
Figure SMS_23
These three values.
For ease of illustration, fig. 2 shows a spatial relationship diagram of the first camera and knuckle. In the figure, the direction of the straight line is the three-dimensional pointing angle of the finger
Figure SMS_24
. It will be appreciated that the direction of the dashed line is the direction of the finger on the first image. Namely: the dashed line is perpendicular to the dash-dot line.
The straight line equation formed by the fingers is shown below:
Figure SMS_25
wherein ,
Figure SMS_26
is the world coordinates of the knuckle.
Knuckle and the first
Figure SMS_27
The linear equation for the camera is shown below:
Figure SMS_28
wherein ,
Figure SMS_29
is->
Figure SMS_30
World coordinates of the camera.
It will be appreciated that in FIG. 2, the normal vector of the dot-dash line is
Figure SMS_31
。/>
Note that: all vectors on fig. 2 do not distinguish between directions.
Taking the first camera in FIG. 2 as an example, the normal vector of the plane formed by the straight line and the dot-dash line is calculated
Figure SMS_32
Note that, in fig. 2,
Figure SMS_33
is perpendicular to the plane of the paper.
The dashed line is the dashed line and normal vector in FIG. 2
Figure SMS_34
The straight line direction of the constituted plane, then, the expression of the broken line is:
Figure SMS_35
in step 3.1, the slope of the straight line corresponding to each image is obtained, so that a ternary system of equations can be established according to the following 3-degree simultaneous equations to obtain the slope of each straight line
Figure SMS_36
These three values.
Figure SMS_37
wherein ,
Figure SMS_38
is->
Figure SMS_39
The slope of the straight line corresponding to the image.
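As a hedged illustration of step 3.2 (not the patent's exact system of equations, which is not reproduced above), the sketch below shows one standard way, under an assumed pinhole calibration, to recover a direction $(m, n, p)$ from the per-image line slopes of step 3.1: each fitted image line back-projects to a plane through its camera center, and the finger direction must be perpendicular to every such plane's normal. The calibration values, helper names, and the camera convention (x_cam = R·x_world + t) are all illustrative assumptions.

```python
import numpy as np

def backprojected_plane_normal(K, R, slope, knuckle_uv):
    """World-frame normal of the plane through the camera center and the image line
    that passes through the knuckle's pixel location with the fitted slope."""
    u0, v0 = knuckle_uv
    line = np.array([slope, -1.0, v0 - slope * u0])   # k*u - v + (v0 - k*u0) = 0
    return R.T @ (K.T @ line)                         # plane normal in world coordinates

def pointing_direction(cams, slopes, knuckle_uvs):
    """Least-squares direction (m, n, p) perpendicular to all back-projected plane normals."""
    normals = np.array([backprojected_plane_normal(K, R, k, uv)
                        for (K, R), k, uv in zip(cams, slopes, knuckle_uvs)])
    _, _, Vt = np.linalg.svd(normals)
    d = Vt[-1]
    return d / np.linalg.norm(d)

# Illustrative usage with assumed calibration data for the three cameras.
K = np.array([[2000.0, 0.0, 640.0], [0.0, 2000.0, 360.0], [0.0, 0.0, 1.0]])
def rot_y(a):
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, 0.0, s], [0.0, 1.0, 0.0], [-s, 0.0, c]])
cams = [(K, rot_y(0.0)), (K, rot_y(0.2)), (K, rot_y(-0.2))]
slopes = [0.35, 0.30, 0.42]                 # slopes fitted in step 3.1 (assumed values)
knuckle_uvs = [(700, 400), (650, 395), (720, 410)]
print(pointing_direction(cams, slopes, knuckle_uvs))
```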
Step 4, determining the unique target object pointed at by the gesture according to the world coordinates of the knuckle, the three-dimensional pointing angle, and the world coordinates of all target objects, and executing the voice instruction on that unique target object.
It will be appreciated that, in a smart home system, the world coordinates of every target object have already been entered into the system. Once the world coordinates $(X_w, Y_w, Z_w)$ of the knuckle and the three-dimensional pointing angle $(m, n, p)$ are both determined, the world coordinates $(x_t, y_t, z_t)$ of each target object are checked in turn. If the condition

$$\frac{x_t - X_w}{m} = \frac{y_t - Y_w}{n} = \frac{z_t - Z_w}{p}$$

is satisfied, then the target object indicated by the gesture is determined.
Fig. 3 shows a common special case, considering that the target object pointed at by a gesture cannot be indicated with absolute precision; that is, the world coordinates of the target object cannot be expected to fall exactly on the straight line defined by the world coordinates of the knuckle and the three-dimensional pointing angle. In Fig. 3, the target object of the voice instruction is X; accordingly, the three-dimensional pointing angle intended by the gesture information is L2, whereas the true three-dimensional pointing angle of the gesture information is L1.
In Fig. 3, L3 is the straight line from the world coordinates of the person's eyes to the world coordinates of the knuckle; the three-dimensional pointing angle corresponding to this straight line is denoted $(m_e, n_e, p_e)$, where $(X_e, Y_e, Z_e)$ are the world coordinates of the eyes. Note: the world coordinates of the eyes can be obtained in the same way as in step 2. This special case can be generalized as: the straight line L1, the eyes, and the target object all lie on one straight line.

Based on this, in Fig. 3, the expected three-dimensional pointing angle $(m', n', p')$ and the true three-dimensional pointing angle $(m, n, p)$ should satisfy a weighted relationship (formula not reproduced here), in which $w$ and $\varepsilon$ denote the weight value and the weight threshold, respectively. It will be appreciated that both should be proportional to the ratio of the eye-to-knuckle distance to the knuckle-to-target distance.
Summarizing the above ideas, in some embodiments, step 4 specifically comprises:

Step 4.1, calculating, from the world coordinates of each target object and the world coordinates of the knuckle, the expected pointing angle $(m_j, n_j, p_j)$ corresponding to each target object, where $j$ is the number of the target object.

The expected pointing angle of each target object is determined by a two-part formula (not reproduced here): the first expression constrains $(m_j, n_j, p_j)$ to lie along the direction from the knuckle to the $j$-th target object, and the second expression, using the minimum operator $\min$ and the absolute value sign $|\cdot|$, fixes its scale. It will be appreciated that, without the second expression, $(m_j, n_j, p_j)$ would have innumerable solutions; the purpose of the second expression is therefore to normalize $(m_j, n_j, p_j)$ by limiting a minimum value, so that it can be matched against $(m, n, p)$.

Step 4.2, if the expected pointing angle $(m_j, n_j, p_j)$ corresponding to a certain target object satisfies all of the tolerance conditions built from the weighted relationship described above, that target object is determined to be the unique target object pointed at by the gesture.
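Since the exact weight formula is not reproduced above, the following sketch shows only one plausible realization of step 4: compare, for each target object, the knuckle-to-target direction with a blend of the true pointing direction and the eye-to-knuckle sight line, and accept the best match within a tolerance. The blending weight, threshold, coordinates, and names are illustrative assumptions, not the patent's values.

```python
import numpy as np

def unit(v):
    v = np.asarray(v, dtype=float)
    return v / np.linalg.norm(v)

def select_target(knuckle_w, pointing_dir, eye_w, targets, weight=0.5, threshold=0.1):
    """Pick the target whose direction from the knuckle best matches the pointing
    direction, blended with the eye-to-knuckle sight line as discussed for Fig. 3.

    targets: dict mapping target name -> world coordinates.
    weight, threshold: illustrative stand-ins for the weight value and weight
    threshold; here they blend the two directions and bound the residual.
    """
    sight_dir = unit(np.asarray(knuckle_w) - np.asarray(eye_w))     # eye -> knuckle (line L3)
    blended = unit((1.0 - weight) * unit(pointing_dir) + weight * sight_dir)
    best_name, best_residual = None, np.inf
    for name, coords in targets.items():
        desired = unit(np.asarray(coords) - np.asarray(knuckle_w))  # knuckle -> target
        residual = np.max(np.abs(desired - blended))                # component-wise check
        if residual < best_residual:
            best_name, best_residual = name, residual
    return best_name if best_residual <= threshold else None

targets = {"desk lamp": (2.0, 0.5, 1.2),
           "ceiling light": (1.0, 2.4, 2.8),
           "fan": (-1.5, 0.3, 1.0)}
print(select_target(knuckle_w=(0.0, 0.0, 1.0),
                    pointing_dir=(0.9, 0.2, 0.1),
                    eye_w=(-0.5, -0.1, 0.95),
                    targets=targets))          # prints "desk lamp"
```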
Correspondingly, the invention also discloses an acousto-optic control-based intelligent home scene interaction system, which comprises: a trinocular stereoscopic vision system and a central processing unit.
The central processing unit comprises at least: an image processing module, a voice recognition module, a calculation module, and a data storage module.
The image processing module is used for, in response to a voice instruction of a user, acquiring 3 images containing gesture information from the trinocular stereoscopic vision system, namely: a first image, a second image, and a third image; and, in combination with the calculation module, obtaining the world coordinates of the knuckle and the three-dimensional pointing angle of the finger.
The data storage module is used for storing world coordinates of all objects.
The computing module is used for determining the unique target object pointed by the gesture according to the world coordinates of the knuckle, the three-dimensional pointing angle and the world coordinates of all target objects.
The voice recognition module is used for executing voice instructions on the unique target object.
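As a structural illustration only (all module and method names are assumptions, not the patent's), the system of the second aspect could be wired together roughly as follows:

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

Vec3 = Tuple[float, float, float]

@dataclass
class DataStorageModule:
    """Holds the world coordinates of all target objects (entered in advance)."""
    targets: Dict[str, Vec3] = field(default_factory=dict)

class CalculationModule:
    """Steps 2 to 4: knuckle world coordinates, pointing angle, unique target."""
    def knuckle_world_coords(self, images: List[object]) -> Vec3:
        raise NotImplementedError("triangulation as sketched for step 2")
    def pointing_angle(self, images: List[object]) -> Vec3:
        raise NotImplementedError("direction recovery as sketched for step 3")
    def unique_target(self, knuckle: Vec3, angle: Vec3,
                      targets: Dict[str, Vec3]) -> Optional[str]:
        raise NotImplementedError("target selection as sketched for step 4")

@dataclass
class CentralProcessingUnit:
    storage: DataStorageModule
    calc: CalculationModule

    def on_voice_instruction(self, instruction: str, images: List[object]) -> Optional[str]:
        """Chain the image processing, calculation, and voice modules as in the second aspect."""
        knuckle = self.calc.knuckle_world_coords(images)
        angle = self.calc.pointing_angle(images)
        target = self.calc.unique_target(knuckle, angle, self.storage.targets)
        # In a full system the voice recognition module would now execute the
        # recognized instruction (e.g. "turn on") on the resolved target device.
        return target
```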
While the applicant has described and illustrated the embodiments of the present invention in detail with reference to the drawings, it should be understood by those skilled in the art that the above embodiments are only preferred embodiments of the present invention, and the detailed description is only for the purpose of helping the reader to better understand the spirit of the present invention, and not to limit the scope of the present invention, but any improvements or modifications based on the spirit of the present invention should fall within the scope of the present invention.

Claims (7)

1. An intelligent home scene interaction method based on acousto-optic control, characterized by comprising the following steps:
step 1, a user gives a voice instruction and gesture information; in response to the user's voice instruction, 3 images containing the gesture information are acquired by a trinocular stereoscopic vision system, the three images being respectively: a first image, a second image, and a third image;
step 2, obtaining world coordinates of the knuckle according to the 3 images;
step 3, obtaining a three-dimensional pointing angle of the finger according to the 3 images;
step 4, determining a unique target object pointed at by the gesture according to the world coordinates of the knuckle, the three-dimensional pointing angle, and the world coordinates of all target objects, and executing the voice instruction on the unique target object.
2. The intelligent home scene interaction method based on acousto-optic control according to claim 1, wherein the step 2 specifically comprises:
step 2.1, respectively finding out two-dimensional coordinates of the knuckle on the first image and the second image;
step 2.2, calculating the world coordinates of the knuckle according to the two-dimensional coordinates of the knuckle on the first image and the second image.
3. The intelligent home scene interaction method based on acousto-optic control according to claim 1, wherein the step 3 specifically comprises:
step 3.1, for each image, acquiring a plurality of points on the finger; fitting according to the plurality of points to obtain the slope of the straight line corresponding to each image;
step 3.2, calculating the three-dimensional pointing angle of the finger according to the slope of the straight line corresponding to each image.
4. The intelligent home scene interaction method based on acousto-optic control according to claim 3, wherein the step 3.1 specifically comprises:
step 3.1.1, extracting a region containing a finger from each image, and sharpening the region to obtain a boundary contour of the finger; the boundary contour is divided into a first contour and a second contour by taking fingertips as dividing points;
step 3.1.2, sampling the first contour and the second contour to obtain a plurality of points respectively, and obtaining straight lines after fitting the first contour and the second contour respectively by using a least square method;
step 3.1.3, calculating the slope of the straight line corresponding to each image according to the straight lines fitted to the first contour and the second contour.
5. The intelligent home scene interaction method based on acousto-optic control according to claim 3, wherein the step 3.2 specifically comprises:
[formula not reproduced]

wherein $(x_{ci}, y_{ci}, z_{ci})$ are the world coordinates of the $i$-th camera, $(X_w, Y_w, Z_w)$ are the world coordinates of the knuckle, and $(m, n, p)$ is the three-dimensional pointing angle of the finger.
6. The intelligent home scene interaction method based on acousto-optic control according to claim 1, wherein the step 4 specifically comprises:
step 4.1, calculating, according to the world coordinates of each target object and the world coordinates of the knuckle, the expected pointing angle $(m_j, n_j, p_j)$ corresponding to each target object, as shown in the following formula:

[formula not reproduced]

wherein $\min$ is the minimum operator, $|\cdot|$ is the absolute value sign, $(X_w, Y_w, Z_w)$ are the world coordinates of the knuckle, $(x_{tj}, y_{tj}, z_{tj})$ are the world coordinates of the target object, $(m, n, p)$ is the three-dimensional pointing angle of the finger, and $j$ is the number of the target object;

step 4.2, if the expected pointing angle $(m_j, n_j, p_j)$ corresponding to a certain target object satisfies the following formula:

[formula not reproduced]

the target object is determined to be the unique target object pointed at by the gesture; wherein $(X_e, Y_e, Z_e)$ are the world coordinates of the eyes.
7. An acousto-optic control based smart home scene interaction system for performing the method of any of claims 1-6, the system comprising: a trinocular stereoscopic vision system and a central processing unit; wherein the central processing unit comprises: an image processing module, a voice recognition module, a calculation module, and a data storage module;
the image processing module is used for, in response to a voice instruction of a user, acquiring 3 images containing gesture information from the trinocular stereoscopic vision system, the three images being respectively: a first image, a second image, and a third image; and, in combination with the calculation module, obtaining the world coordinates of the knuckle and the three-dimensional pointing angle of the finger;
the data storage module is used for storing world coordinates of all targets;
the computing module is used for determining a unique target object pointed by the gesture according to the world coordinates of the knuckle, the three-dimensional pointing angle and the world coordinates of all target objects;
the voice recognition module is used for executing voice instructions on the unique target object.
CN202310497713.2A 2023-05-06 2023-05-06 Intelligent home scene interaction method based on acousto-optic control Active CN116225236B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310497713.2A CN116225236B (en) 2023-05-06 2023-05-06 Intelligent home scene interaction method based on acousto-optic control

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310497713.2A CN116225236B (en) 2023-05-06 2023-05-06 Intelligent home scene interaction method based on acousto-optic control

Publications (2)

Publication Number Publication Date
CN116225236A true CN116225236A (en) 2023-06-06
CN116225236B CN116225236B (en) 2023-08-04

Family

ID=86573487

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310497713.2A Active CN116225236B (en) 2023-05-06 2023-05-06 Intelligent home scene interaction method based on acousto-optic control

Country Status (1)

Country Link
CN (1) CN116225236B (en)

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104246682A (en) * 2012-03-26 2014-12-24 Apple Inc. Enhanced virtual touchpad and touchscreen
CN104978012A (en) * 2014-04-03 2015-10-14 Huawei Technologies Co., Ltd. Pointing interactive method, device and system
CN107241643A (en) * 2017-08-03 2017-10-10 Shenyang Jianzhu University A kind of multimedia volume adjusting method and system
CN108229332A (en) * 2017-12-08 2018-06-29 Huawei Technologies Co., Ltd. Bone attitude determination method, device and computer readable storage medium
DE102021105068A1 (en) * 2021-03-03 2022-09-08 Gestigon Gmbh METHOD AND SYSTEM FOR HAND GESTURE BASED DEVICE CONTROL
CN114779922A (en) * 2022-03-11 2022-07-22 南京谦萃智能科技服务有限公司 Control method for teaching apparatus, control apparatus, teaching system, and storage medium

Also Published As

Publication number Publication date
CN116225236B (en) 2023-08-04

Similar Documents

Publication Publication Date Title
US20210233275A1 (en) Monocular vision tracking method, apparatus and non-transitory computer-readable storage medium
CN110568447B (en) Visual positioning method, device and computer readable medium
CN107705333B (en) Space positioning method and device based on binocular camera
CN111862201B (en) Deep learning-based spatial non-cooperative target relative pose estimation method
CN112771573A (en) Depth estimation method and device based on speckle images and face recognition system
KR20180050702A (en) Image transformation processing method and apparatus, computer storage medium
CN108537214B (en) Automatic construction method of indoor semantic map
JP5833507B2 (en) Image processing device
CN108573471B (en) Image processing apparatus, image processing method, and recording medium
CN113256718B (en) Positioning method and device, equipment and storage medium
CN111325798B (en) Camera model correction method, device, AR implementation equipment and readable storage medium
CN113361365B (en) Positioning method, positioning device, positioning equipment and storage medium
CN112102404B (en) Object detection tracking method and device and head-mounted display equipment
US20210334569A1 (en) Image depth determining method and living body identification method, circuit, device, and medium
CN112509036B (en) Pose estimation network training and positioning method, device, equipment and storage medium
JP2017011328A (en) Apparatus, method and program for image processing
CN113362314B (en) Medical image recognition method, recognition model training method and device
CN112083403A (en) Positioning tracking error correction method and system for virtual scene
CN113793370A (en) Three-dimensional point cloud registration method and device, electronic equipment and readable medium
Gao et al. Marker tracking for video-based augmented reality
CN112197708B (en) Measuring method and device, electronic device and storage medium
CN113172636A (en) Automatic hand-eye calibration method and device and storage medium
CN116225236B (en) Intelligent home scene interaction method based on acousto-optic control
JP2006113832A (en) Stereoscopic image processor and program
CN113628284B (en) Pose calibration data set generation method, device and system, electronic equipment and medium

Legal Events

PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant