WO2023169123A1

WO2023169123A1 - Device control method and apparatus, and electronic device and medium

Info

Publication number: WO2023169123A1
Application number: PCT/CN2023/074997
Authority: WO
Inventors: 徐亮; 徐梓宁; 沈丽娜; 王锐; 武锐; 牛建伟
Original assignee: 深圳地平线机器人科技有限公司
Priority date: 2022-03-11
Filing date: 2023-02-08
Publication date: 2023-09-14
Also published as: CN114613362A

Abstract

Disclosed in the embodiments of the present disclosure are a device control method and apparatus, and an electronic device and a medium. The device control method comprises: in response to receiving a voice control instruction, performing voice recognition on the voice control instruction, so as to obtain a first voice recognition result; on the basis of the first voice recognition result, determining a target device corresponding to the voice control instruction; and in response to detecting a preset dynamic gesture, continuously adjusting the state of the target device on the basis of a continuous action of the dynamic gesture. By means of the embodiments of the present disclosure, the efficiency and convenience of selecting a target device can be improved, and continuous operation control of the target device can be realized, such that the state of the target device is adjusted more flexibly, finely and accurately.

Description

Device control method and apparatus, and electronic device and medium

This disclosure claims priority to Chinese patent application No. 202210242711.4, titled "Device Control Method and Apparatus, and Electronic Device and Medium" and filed on March 11, 2022, the entire content of which is incorporated into this disclosure by reference.

Technical field

The present disclosure relates to artificial intelligence technology, and in particular to a device control method and apparatus, an electronic device, and a medium.

Background

Human-computer interaction refers to the information exchange process between humans and machines using a certain dialogue language and a certain interactive method to complete certain tasks. Traditional human-computer interaction is mainly achieved through input and output devices such as keyboards, mice, and monitors. However, with the development of technologies such as artificial intelligence, humans and machines have been able to interact in a manner similar to natural language.

With the popularity of smart vehicles, the number of on-board devices on smart vehicles is gradually increasing, and more and more auxiliary functions can be implemented. For drivers while driving, it is inconvenient and unsafe to manually control vehicle-mounted equipment to implement corresponding functions.

Summary of the invention

The present disclosure is proposed in order to solve the above technical problems. Embodiments of the present disclosure provide a device control method and apparatus, an electronic device, and a medium.

According to an aspect of an embodiment of the present disclosure, a device control method is provided, including:

In response to receiving a voice control instruction, performing voice recognition on the voice control instruction to obtain a first voice recognition result;

Based on the first voice recognition result, determining a target device corresponding to the voice control instruction;

In response to detecting a preset dynamic gesture, continuously adjusting the state of the target device based on the continuous action of the dynamic gesture.

According to another aspect of an embodiment of the present disclosure, a device control apparatus is provided, including:

A voice recognition module, configured to perform voice recognition on a voice control instruction in response to receiving the voice control instruction, and obtain a first voice recognition result;

A determination module, configured to determine the target device corresponding to the voice control instruction based on the first voice recognition result obtained by the voice recognition module;

A detection module, configured to detect a preset dynamic gesture;

An adjustment module, configured to continuously adjust the state of the target device based on the continuous action of the dynamic gesture in response to the detection module detecting the preset dynamic gesture.

According to yet another aspect of an embodiment of the present disclosure, a computer-readable storage medium is provided. The storage medium stores a computer program, and the computer program is used to execute the device control method described in any of the above embodiments of the present disclosure.

According to yet another aspect of an embodiment of the present disclosure, an electronic device is provided, the electronic device including:

A processor;

A memory for storing instructions executable by the processor;

The processor is configured to read the executable instructions from the memory and execute the instructions to implement the device control method described in any of the above embodiments of the present disclosure.

Based on the device control method and apparatus, the electronic device, and the medium provided by the above embodiments of the present disclosure, when a voice control instruction is received, voice recognition is performed on it to obtain a first voice recognition result, the target device corresponding to the voice control instruction is determined based on the first voice recognition result, and, when a preset dynamic gesture is detected, the state of the target device is continuously adjusted based on the continuous action of the dynamic gesture. Embodiments of the present disclosure can therefore determine the target device to be adjusted from the voice control instruction, without the target device being selected manually, which improves the efficiency and convenience of selecting the target device and avoids the inconvenience of manual selection. In addition, continuously adjusting the state of the target device based on the continuous action of the dynamic gesture achieves continuous operation control of the target device, making the adjustment of its state more flexible, fine, and accurate, thereby improving the control effect on the target device.

Embodiments of the present disclosure can be used to adjust the state of any device, such as home appliances, vehicle-mounted equipment, and terminal equipment. When applied to vehicles, the embodiments can improve the efficiency, convenience, and safety of selecting and operating vehicle-mounted equipment, and avoid the inconvenience and safety risks of a driver manually operating vehicle-mounted equipment while driving. In addition, the continuous action of the dynamic gesture achieves continuous operation control of vehicle-mounted equipment, making the adjustment of its state more flexible, fine, and accurate, thereby improving the control effect on vehicle-mounted equipment.

The technical solution of the present disclosure will be described in further detail below through the accompanying drawings and examples.

Description of the drawings

The above and other objects, features and advantages of the present disclosure will become more apparent through a more detailed description of the embodiments of the present disclosure in conjunction with the accompanying drawings. The accompanying drawings are used to provide further understanding of the embodiments of the present disclosure, and constitute a part of the specification. They are used to explain the disclosure together with the embodiments of the present disclosure, and do not constitute a limitation of the disclosure. In the drawings, like reference numbers generally represent like components or steps.

Figure 1 is a system diagram to which the present disclosure is applicable.

Figure 2 is a schematic flowchart of a device control method provided by an exemplary embodiment of the present disclosure.

FIG. 3 is a schematic diagram of drawing circles with one finger in an embodiment of the present disclosure.

FIG. 4 is a schematic flowchart of a device control method provided by another exemplary embodiment of the present disclosure.

Figure 5 is a schematic flowchart of a device control method provided by yet another exemplary embodiment of the present disclosure.

Figure 6 is a schematic flowchart of a device control method provided by yet another exemplary embodiment of the present disclosure.

FIG. 7 is a schematic flowchart of a device control method provided by yet another exemplary embodiment of the present disclosure.

Figure 8 is a schematic flowchart of a device control method provided by yet another exemplary embodiment of the present disclosure.

Figure 9 is a schematic structural diagram of a device control apparatus provided by an exemplary embodiment of the present disclosure.

Figure 10 is a schematic structural diagram of a device control apparatus provided by another exemplary embodiment of the present disclosure.

FIG. 11 is a schematic structural diagram of an electronic device provided by an exemplary embodiment of the present disclosure.

Detailed description

Hereinafter, example embodiments according to the present disclosure will be described in detail with reference to the accompanying drawings. Obviously, the described embodiments are only some embodiments of the present disclosure, rather than all embodiments of the present disclosure, and it should be understood that the present disclosure is not limited to the example embodiments described here.

It should be noted that the relative arrangement of components and steps, numerical expressions, and numerical values set forth in these examples do not limit the scope of the disclosure unless otherwise specifically stated.

Those skilled in the art can understand that terms such as "first" and "second" in the embodiments of the present disclosure are only used to distinguish different steps, devices, modules, etc., and neither carry any specific technical meaning nor imply a necessary logical order between them.

It should also be understood that in the embodiments of the present disclosure, "plurality" may refer to two or more than two, and "at least one" may refer to one, two, or more than two.

Embodiments of the present disclosure may be applied to electronic devices such as terminal devices, computer systems, and servers, which may operate with numerous other general-purpose or special-purpose computing system environments or configurations. Examples of well-known terminal devices, computing systems, environments, and/or configurations suitable for use with such electronic devices include, but are not limited to: personal computer systems, server computer systems, thin clients, thick clients, handheld or laptop devices, microprocessor-based systems, set-top boxes, programmable consumer electronics, networked personal computers, small computer systems, mainframe computer systems, and distributed cloud computing environments that include any of the above systems.

Application Overview

Artificial Intelligence (AI) enables machines to perform complex tasks that usually require human intelligence. In order to execute human instructions, efficient and accurate human-computer interaction is necessary. In recent years, with the continuous development of AI technology, the application of speech recognition technology in vehicle-mounted equipment has attracted more and more attention in the industry.

In related technologies, to avoid the inconvenience and safety risks of the driver manually operating vehicle-mounted equipment while driving, the vehicle-mounted equipment is controlled through voice commands.

However, the inventor found through research that controlling vehicle-mounted equipment through voice commands cannot achieve continuous operation control of the equipment, and the control effect is poor. For example, when the vehicle window is opened through the voice command "open window", the opening range of the window can only be controlled according to the default settings and cannot be controlled precisely. If the opening range falls short of the user's expectation, the voice command "open window" must be issued multiple times to increase it, which is inefficient; and if the opening range exceeds the user's expectation, it cannot be accurately reduced, so user needs are not met.

In view of this, embodiments of the present disclosure propose a device control method and apparatus, an electronic device, and a medium to improve the efficiency, convenience, and safety of selecting and operating vehicle-mounted equipment, while achieving continuous operation control of the target device.

The embodiments of the present disclosure determine the target device to be adjusted through a voice control instruction, and continuously adjust the state of the target device through the continuous action of a dynamic gesture. This removes the need to select the target device manually, improves the efficiency and convenience of selecting the target device, and avoids the inconvenience of manual selection. It also achieves continuous operation control of the target device, making the adjustment of its state more flexible, fine, and accurate, thereby improving the control effect on the target device.

Embodiments of the present disclosure can be used to adjust the state of any device, such as home appliances, vehicle-mounted equipment, and terminal equipment. When applied to vehicles, the embodiments can improve the efficiency, convenience, and safety of selecting and operating vehicle-mounted equipment, and avoid the inconvenience and safety risks of a driver manually operating vehicle-mounted equipment while driving. Moreover, the continuous action of the dynamic gesture achieves continuous operation control of vehicle-mounted equipment, making the adjustment of its state more flexible, fine, and accurate, thereby improving the control effect on vehicle-mounted equipment.

Example system

Figure 1 is a system diagram to which the present disclosure is applicable. As shown in Figure 1, a voice control instruction is collected through the audio collection module 102 (such as a microphone). The voice control instruction, either as collected or after front-end signal processing, is input into the device control apparatus 104 of the embodiment of the present disclosure. The device control apparatus 104 performs voice recognition on the received voice control instruction to obtain a first voice recognition result, determines the target device 106 corresponding to the voice control instruction based on the first voice recognition result, calls the image acquisition module 108 (such as a camera) to collect a video stream, and performs preset dynamic gesture detection on the collected video stream. When the preset dynamic gesture is detected, the state of the target device 106 is continuously adjusted based on the continuous action of the dynamic gesture.

The embodiments of the present disclosure can be used to adjust the state of any device, such as home appliances, vehicle-mounted equipment, and terminal equipment; that is, the target device 106 can be any such device. When the target device 106 is a vehicle-mounted device, the embodiment of the present disclosure targets the various interaction scenarios in the cockpit and performs human-computer interaction based on a mixture of voice and dynamic gestures: control rights over the device to be controlled are obtained by performing voice recognition on the voice control instruction, and dynamic gestures are then used to perform the various possible continuous operations on that device. During continuous operation control, the adjustment speed of the device can also be controlled through the movement speed of the dynamic gesture. This improves the efficiency, convenience, and safety of selecting and operating vehicle-mounted equipment, and avoids the inconvenience and safety risks of drivers manually operating vehicle-mounted equipment while driving. Moreover, the continuous action of the dynamic gesture achieves continuous operation control of vehicle-mounted equipment, making the adjustment of its state more flexible, fine, and accurate, thereby improving the control effect on vehicle-mounted equipment. The disclosed embodiment makes full use of the ability of voice control to obtain control rights and of the fine adjustment capability of dynamic gestures, and features simple operation, good robustness, fine adjustment, high interaction efficiency, and a wide range of functions.
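The idea that the movement speed of the dynamic gesture controls the adjustment speed can be sketched as follows. This is only an illustrative sketch: the function name `adjustment_step`, the proportional `gain`, the `max_step` cap, and the choice of degrees per second as the speed unit are all assumptions, not details given in the disclosure.

```python
def adjustment_step(
    gesture_speed_deg_per_s: float,
    frame_interval_s: float,
    gain: float = 0.05,
    max_step: float = 5.0,
) -> float:
    """Return how far to move the device state during one video frame.

    Assumption: the step is proportional to the gesture speed (here, the
    angular speed of a circling motion) and is capped at max_step so that
    a very fast gesture cannot cause an abrupt jump.
    """
    if gesture_speed_deg_per_s <= 0 or frame_interval_s <= 0:
        return 0.0
    return min(max_step, gain * gesture_speed_deg_per_s * frame_interval_s)


# A faster gesture yields a larger per-frame step, up to the cap.
slow = adjustment_step(360.0, 0.1)
fast = adjustment_step(720.0, 0.1)
```

A proportional mapping with a cap is one simple choice; other monotonic mappings would fit the description equally well.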

Example methods

Figure 2 is a schematic flowchart of a device control method provided by an exemplary embodiment of the present disclosure. This embodiment can be applied to electronic devices. As shown in Figure 2, the device control method of this embodiment includes the following steps:

Step 202: In response to receiving the voice control instruction, perform voice recognition on the voice control instruction to obtain a first voice recognition result.

The voice control instruction in the embodiments of the present disclosure may be the original voice control instruction collected directly through the audio collection module (such as a microphone), or the voice control instruction obtained by performing front-end signal processing on the original voice control instruction collected by the audio collection module. The embodiments of the present disclosure do not limit this.

Front-end signal processing can include, for example, but is not limited to: voice activity detection (VAD), noise reduction, acoustic echo cancellation (AEC), dereverberation, device control, beamforming (BF), etc.

Voice activity detection, also known as voice endpoint detection or voice boundary detection, refers to detecting the presence or absence of voice in an audio signal in a noisy environment and accurately detecting the starting position of the voice segment in the audio signal. It is usually used in speech processing systems such as speech coding and speech enhancement, where it helps reduce the speech coding rate, save communication bandwidth, reduce the energy consumption of mobile devices, and improve the recognition rate. The starting point of VAD is the transition from silence to speech, the end point of VAD is the transition from speech to silence, and determining the end point requires a period of silence. The speech obtained by front-end signal processing of the original audio signal spans from the starting point to the end point of the VAD. Therefore, the voice control instruction in the embodiments of the present disclosure may also include a period of silence after the voice segment.
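The start point and end point logic described above can be illustrated with a minimal energy-threshold sketch. The disclosure does not specify a VAD algorithm; the per-frame energy input, the fixed `energy_threshold`, and the `trailing_silence_frames` count are assumptions chosen only to show why the returned segment includes a stretch of silence after the speech.

```python
from typing import List, Optional, Tuple


def detect_speech_segment(
    frame_energies: List[float],
    energy_threshold: float = 0.5,
    trailing_silence_frames: int = 3,
) -> Optional[Tuple[int, int]]:
    """Return (start, end) frame indices of the first speech segment.

    start is the first frame at or above the threshold (silence to speech).
    end is the frame at which enough consecutive silent frames have been
    seen to confirm the end point, so the segment includes that silence.
    Returns None if no speech starts or the end point is never confirmed.
    """
    start: Optional[int] = None
    silent_run = 0
    for i, energy in enumerate(frame_energies):
        is_speech = energy >= energy_threshold
        if start is None:
            if is_speech:
                start = i
            continue
        if is_speech:
            silent_run = 0  # a short pause does not end the segment
        else:
            silent_run += 1
            if silent_run >= trailing_silence_frames:
                return start, i
    return None


energies = [0.1, 0.1, 0.9, 0.8, 0.2, 0.9, 0.1, 0.1, 0.1, 0.1]
segment = detect_speech_segment(energies)
```

In this example the single quiet frame at index 4 does not end the segment, because the end point is only confirmed after three consecutive silent frames.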

Step 204: Based on the first voice recognition result, determine the target device corresponding to the voice control instruction.

The target device corresponding to the voice control instruction is the device whose state needs to be adjusted. The target device can be any device, such as home appliances, vehicle-mounted equipment, and terminal equipment. Vehicle-mounted equipment is equipment on the vehicle and can include, for example, but is not limited to: the left rearview mirror, the right rearview mirror, the interior rearview mirror, windows, air conditioners, seats, stereos, lights, etc. The embodiments of the present disclosure do not limit the scope of the target device or the specific scope of vehicle-mounted equipment.
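One simple way to picture step 204 is keyword matching on the text of the first voice recognition result. The disclosure does not state how the target device is derived from the recognition result, so the `DEVICE_KEYWORDS` vocabulary, the device identifiers, and the longest-match rule below are all hypothetical.

```python
from typing import Dict, Optional

# Hypothetical vocabulary mapping spoken device names to device identifiers.
DEVICE_KEYWORDS: Dict[str, str] = {
    "left rearview mirror": "left_rearview_mirror",
    "right rearview mirror": "right_rearview_mirror",
    "window": "window",
    "air conditioner": "air_conditioner",
    "seat": "seat",
}


def determine_target_device(recognized_text: str) -> Optional[str]:
    """Return the identifier of the device named in the recognized text.

    Longer keywords are tried first so that a more specific name wins
    over a shorter one contained in the same sentence.
    """
    text = recognized_text.lower()
    for keyword in sorted(DEVICE_KEYWORDS, key=len, reverse=True):
        if keyword in text:
            return DEVICE_KEYWORDS[keyword]
    return None


device = determine_target_device("I want to adjust the left rearview mirror")
```

A production system would more likely use intent and slot extraction than substring matching; the sketch only shows the input and output of the step.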

Step 206: In response to detecting the preset dynamic gesture, continuously adjust the state of the target device based on the continuous action of the dynamic gesture.

Through step 206, the user can continuously adjust the state of the target device by continuing to make the dynamic gesture until the state of the target device reaches the effect the user expects (for example, the vehicle window is lowered to the height the user expects), and then stop the dynamic gesture to stop adjusting the state of the target device.

Based on this embodiment, the target device to be adjusted can be determined from the voice control instruction without being selected manually, which improves the efficiency and convenience of selecting the target device and avoids the inconvenience of manual selection. In addition, continuously adjusting the state of the target device based on the continuous action of the dynamic gesture achieves continuous operation control of the target device, making the adjustment of its state more flexible, fine, and accurate, thereby improving the control effect on the target device.
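The behavior of step 206 (adjust while the gesture continues, stop when it stops) can be sketched as a per-frame loop. The per-frame boolean input, the fixed `step_per_frame`, and the clamping range are illustrative assumptions; the disclosure does not prescribe this structure.

```python
from typing import Iterable, List


def adjust_while_gesturing(
    initial_value: float,
    gesture_active_per_frame: Iterable[bool],
    step_per_frame: float,
    minimum: float = 0.0,
    maximum: float = 100.0,
) -> List[float]:
    """Apply one adjustment step per frame while the gesture is detected.

    Frames before the gesture first appears are skipped. Once the gesture
    has started, the first frame without it ends the adjustment. The value
    is clamped to [minimum, maximum]. Returns the value after each step.
    """
    value = initial_value
    history: List[float] = []
    started = False
    for active in gesture_active_per_frame:
        if active:
            started = True
            value = min(maximum, max(minimum, value + step_per_frame))
            history.append(value)
        elif started:
            break  # the user stopped the gesture, so adjustment stops
    return history


# Window height in percent, lowered by 10 per frame while the gesture lasts.
heights = adjust_while_gesturing(100.0, [False, True, True, True, False, True], -10.0)
```

Here the window stops at 70 because the gesture ends on the fifth frame; the later detection would start a new adjustment only after a new voice control instruction in this simplified model.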

The preset dynamic gestures in the embodiments of the present disclosure can be designed with the following characteristics in mind: (1) consistent with natural habits and easy to perform, which improves convenience; (2) dynamic, which is more robust than a static hand pose in a single frame image; (3) different from everyday habitual actions, so that other actions are unlikely to be falsely detected as the gesture; (4) having different movement directions, so that the gesture can be reused.

Based on the above characteristics, in some implementations, the preset dynamic gesture can be, for example, drawing circles, that is, drawing circles in the air, which can include but is not limited to drawing circles with the left hand, with the right hand, with both hands, with any one or more fingers, with a fist, with bent fingers, etc. Figure 3 is a schematic diagram of drawing circles with one finger. The preset dynamic gestures in the embodiments of the present disclosure are not limited to this, and can be any dynamic gesture with any of the above characteristics.

The preset dynamic gestures in this embodiment can satisfy all of the above characteristics at the same time: they are robust, few but precise, consistent with natural habits, recognized with high accuracy, and easy to reuse. This improves recognition stability and accuracy and facilitates continuous operation control of devices.
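For a circle-drawing gesture, the movement direction (clockwise or counterclockwise) could be estimated from a tracked fingertip trajectory using the sign of the enclosed area. This is a sketch of one possible technique (the shoelace formula), not the detection method of the disclosure; the point format, the `min_area` threshold, and the assumption that the y axis points up are all illustrative. With image coordinates, where y points down, the two labels would swap.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def signed_area(points: List[Point]) -> float:
    """Signed area of the closed polygon through the points (shoelace formula)."""
    total = 0.0
    n = len(points)
    for i in range(n):
        x1, y1 = points[i]
        x2, y2 = points[(i + 1) % n]
        total += x1 * y2 - x2 * y1
    return total / 2.0


def circle_direction(points: List[Point], min_area: float = 1e-6) -> str:
    """Classify a trajectory as 'clockwise', 'counterclockwise', or 'none'.

    Assumes a coordinate system with the y axis pointing up, in which a
    positive signed area means counterclockwise motion.
    """
    area = signed_area(points)
    if abs(area) < min_area:
        return "none"  # degenerate path, for example a straight line
    return "counterclockwise" if area > 0 else "clockwise"


square = [(0.0, 0.0), (1.0, 0.0), (1.0, 1.0), (0.0, 1.0)]
direction = circle_direction(square)
```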

FIG. 4 is a schematic flowchart of a device control method provided by another exemplary embodiment of the present disclosure. As shown in Figure 4, based on the above embodiment shown in Figure 2, the device control method of this embodiment may also include the following steps:

Step 205: Determine the target dimension parameters to be adjusted of the target device.

The target dimension parameter is the dimension parameter along which the state of the target device needs to be adjusted. For example, when the target device is a vehicle window, the target dimension parameter can be the lifting dimension of the window; when the target device is a vehicle seat, it can be the front-rear dimension, the height dimension, the backrest tilt angle dimension, etc. of the seat; when the target device is a vehicle light, it can be the brightness dimension, the color dimension, etc. of the light; when the target device is the left or right rearview mirror of the vehicle, it can be the pitch angle dimension or the yaw angle dimension of that mirror. As another example, when the target device is a home appliance such as a television, the target dimension parameter may be the channel dimension, the volume dimension, the brightness dimension, etc. of the television. In the embodiments of the present disclosure, the target dimension parameter to be adjusted may be any adjustable dimension parameter of the target device, and the embodiments of the present disclosure do not limit the adjustable dimension parameters.

Correspondingly, in this embodiment, step 206 may include:

Step 2062: In response to detecting the preset dynamic gesture, determine the movement direction of the dynamic gesture.

Step 2064: Based on the movement direction of the dynamic gesture, determine the target adjustment direction of the target device on the target dimension parameter.

Optionally, in some implementations, a correspondence between the movement direction of the dynamic gesture and the device, the dimension parameter of the device, and the adjustment direction can be preset. After the movement direction of the dynamic gesture is determined, the correspondence can be queried with the movement direction, the target device, and the target dimension parameter to obtain the target adjustment direction. Table 1 below is a partial example of this correspondence when the dynamic gesture is drawing circles. The embodiments of the present disclosure do not limit the specific content of the correspondence between the movement direction of the dynamic gesture and the device, the dimension parameter of the device, and the adjustment direction.

Table 1

Step 2066: Based on the continuous action of the dynamic gesture in the movement direction of the dynamic gesture, continuously adjust the target dimension parameters of the target device in the target adjustment direction.

Based on this embodiment, after the target dimension parameter of the target device to be adjusted is determined, the target adjustment direction of the target device on that parameter can be determined from the movement direction of the dynamic gesture, so that both the target dimension parameter and the target adjustment direction are known. Then, based on the continuous action of the dynamic gesture in its movement direction, the target dimension parameter of the target device can be continuously adjusted in the target adjustment direction, achieving continuous operation control of the target device on the target dimension parameter in the target adjustment direction.
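The preset correspondence queried in step 2064 can be pictured as a lookup table keyed by movement direction, device, and dimension parameter. The entries below are invented for illustration and are not the contents of Table 1, which is not reproduced in this text; the identifiers and the function name `target_adjustment_direction` are likewise hypothetical.

```python
from typing import Dict, Optional, Tuple

# Hypothetical correspondence; NOT the actual contents of Table 1.
# Key: (movement direction, device, dimension parameter) -> adjustment direction
CORRESPONDENCE: Dict[Tuple[str, str, str], str] = {
    ("clockwise", "window", "lift"): "raise",
    ("counterclockwise", "window", "lift"): "lower",
    ("clockwise", "seat", "front_rear"): "forward",
    ("counterclockwise", "seat", "front_rear"): "backward",
    ("clockwise", "left_rearview_mirror", "pitch"): "up",
    ("counterclockwise", "left_rearview_mirror", "pitch"): "down",
}


def target_adjustment_direction(
    movement_direction: str, device: str, dimension: str
) -> Optional[str]:
    """Query the preset correspondence; None means no entry is configured."""
    return CORRESPONDENCE.get((movement_direction, device, dimension))


direction = target_adjustment_direction("counterclockwise", "window", "lift")
```

Keying on all three values lets the same gesture direction be reused across devices and dimension parameters, which matches the reuse characteristic described for the preset gestures.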

The target device in the embodiments of the present disclosure may be a device whose state is determined by one dimension parameter. Such a device has only one adjustable dimension parameter, and each parameter value on that dimension parameter corresponds to one state of the device. For example, a vehicle window is a device whose state is determined by the single lifting dimension: each height value of the window in the lifting dimension corresponds to one state of the window.

Alternatively, the target device in the embodiments of the present disclosure may be a device whose state is jointly determined by multiple dimension parameters. Such a device has multiple adjustable dimension parameters, and each set of parameter values on those dimension parameters corresponds to one state of the device. When the parameter value on any one of the dimension parameters changes, the state of the device changes. For example, the left and right rearview mirrors of a vehicle are devices whose state is jointly determined by the pitch angle dimension and the yaw angle dimension. Each pair (angle value in the pitch angle dimension, angle value in the yaw angle dimension) corresponds to one state of the mirror. When the angle value in either or both of the pitch angle dimension and the yaw angle dimension changes, the state of the mirror changes accordingly.

In some implementations, in the embodiments of the present disclosure, when the status of the target device is determined based on one dimension parameter, in step 205, the one dimension parameter of the target device can be directly determined as the target dimension parameter.

Based on this embodiment, when the state of the target device is determined by one dimension parameter, the target device has only one adjustable dimension parameter, so that parameter can be directly determined as the target dimension parameter without the user having to specify it. This improves the efficiency of determining the target dimension parameter and thereby the control efficiency of the target device.
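The distinction between single-dimension and multi-dimension devices, and the shortcut for the single-dimension case in step 205, can be sketched as below. The `DEVICE_DIMENSIONS` registry and its identifiers are hypothetical; only the device and dimension examples themselves come from the description.

```python
from typing import Dict, List, Optional

# Hypothetical registry of adjustable dimension parameters per device.
DEVICE_DIMENSIONS: Dict[str, List[str]] = {
    "window": ["lift"],
    "seat": ["front_rear", "height", "backrest_tilt"],
    "left_rearview_mirror": ["pitch", "yaw"],
}


def default_target_dimension(device: str) -> Optional[str]:
    """Return the only dimension parameter of a single-dimension device.

    For devices with several dimension parameters (or unknown devices),
    return None: the target dimension parameter then has to come from
    elsewhere, for example from the voice recognition result.
    """
    dimensions = DEVICE_DIMENSIONS.get(device, [])
    return dimensions[0] if len(dimensions) == 1 else None


window_dimension = default_target_dimension("window")
seat_dimension = default_target_dimension("seat")
```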

In some implementations, in the embodiments of the present disclosure, when the status of the target device is determined based on multiple dimensional parameters, in step 205, the target dimensional parameters may be determined based on the first speech recognition result.

In this embodiment, the user can carry relevant information about the target dimension parameter to be adjusted directly in the voice control instruction. For example, the voice control instruction can be the utterance "I want to adjust the front and rear of the main driver's seat", "I want to adjust the main driver's seat, adjust forward and backward", "I want to adjust the main driver's seat forward", "I want to adjust the pitch of the left rearview mirror", etc. The embodiments of the present disclosure do not limit the content or form in which the voice control instruction carries the relevant information of the target dimension parameter. The first voice recognition result, obtained in text form by performing voice recognition on the voice control instruction, then includes the relevant information of the target dimension parameter, and the target dimension parameter can be determined based on that information.

For example, in a specific implementation, the dimension parameters of each device can be preset. After voice recognition is performed on the voice control instruction to obtain the first voice recognition result, the dimension parameter of the target device that is associated with, or closest to, the relevant information of the target dimension parameter in the first voice recognition result is taken as the target dimension parameter to be adjusted. For example, the main driver's seat has three dimension parameters: the front-rear dimension, the height dimension, and the backrest tilt angle dimension. The relevant information of the target dimension parameter is "front and rear" in the first voice recognition result "I want to adjust the front and rear of the main driver's seat", "adjust forward and backward" in "I want to adjust the main driver's seat, adjust forward and backward", and "adjust forward" in "I want to adjust the main driver's seat forward". For each of "front and rear", "adjust forward and backward", and "adjust forward", the associated or closest dimension parameter is the front-rear dimension, which is taken as the target dimension parameter to be adjusted for the main driver's seat.

In addition, in a specific implementation, a preset determination method may be used to determine, for the target device, the dimension parameter associated with or closest to the relevant information about the target dimension parameter in the first speech recognition result. For example, among the dimension parameter names of the target device, the one that shares the most identical characters with the relevant information about the target dimension parameter in the first speech recognition result can be determined as the associated or closest dimension parameter. For another example, an information list can be preset that includes the relevant information that may correspond to each dimension parameter of each device. Based on the relevant information about the target dimension parameter in the first speech recognition result, the information list can then be queried for the target device, and the matching dimension parameter is obtained as the associated or closest dimension parameter. In addition, the embodiments of the present disclosure may also use other methods to determine the dimension parameter associated with or closest to the relevant information about the target dimension parameter in the first speech recognition result, and the embodiments of the present disclosure do not limit this.
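The two determination methods just described can be sketched as follows. This is a minimal illustration under stated assumptions, not the disclosed implementation: the preset contents of `DEVICE_DIMENSIONS` and `INFO_LIST`, the function names, and the choice to query the information list before falling back to character overlap are all hypothetical.

```python
from collections import Counter
from typing import Dict, List, Optional

# Hypothetical presets for illustration only; not taken from the disclosure.
DEVICE_DIMENSIONS: Dict[str, List[str]] = {
    "main driver's seat": ["front and rear", "high and low", "backrest tilt angle"],
}
INFO_LIST: Dict[str, Dict[str, str]] = {
    "main driver's seat": {
        "adjust forward": "front and rear",
        "adjust backward": "front and rear",
    },
}


def shared_chars(a: str, b: str) -> int:
    """Count identical characters shared by two strings, ignoring spaces."""
    ca, cb = Counter(a.replace(" ", "")), Counter(b.replace(" ", ""))
    return sum((ca & cb).values())


def match_dimension(device: str, info: str) -> Optional[str]:
    """Return the dimension parameter associated with or closest to `info`."""
    # Information-list method: query the preset list for the target device.
    listed = INFO_LIST.get(device, {}).get(info)
    if listed is not None:
        return listed
    # Character-overlap method: pick the name sharing the most characters.
    names = DEVICE_DIMENSIONS.get(device, [])
    best = max(names, key=lambda n: shared_chars(n, info), default=None)
    if best is None or shared_chars(best, info) == 0:
        return None
    return best
```

Character overlap is a coarse heuristic on English strings; it is shown here only because the text names it as one possible determination method.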

Based on this embodiment, when the status of the target device is determined based on multiple dimension parameters, the user can specify the target dimension parameter to be adjusted directly in the voice control instruction, without specifying it separately. This helps improve the efficiency of determining the target dimension parameter, thereby improving the control efficiency of the target device.

In other implementations, when the status of the target device is determined based on multiple dimension parameters, in step 205, in response to receiving a dimension parameter voice instruction, speech recognition can be performed on the dimension parameter voice instruction to obtain a second speech recognition result, and the target dimension parameter is then determined based on the second speech recognition result.

In this embodiment, the user can send the dimension parameter voice instruction directly after sending the voice control instruction. For example, after sending the voice control instruction "I want to adjust the driver's seat", the user directly sends the dimension parameter voice instruction "Adjust forward and backward". Alternatively, the apparatus that implements the embodiments of the present disclosure may output a dimension parameter inquiry voice after receiving the voice control instruction sent by the user, and receive the dimension parameter voice instruction that the user sends in reply to the inquiry. For example, the user sends the voice control instruction "I want to adjust the driver's seat". After receiving it, the apparatus outputs the dimension parameter inquiry voice "Okay, how do you want to adjust it?" and receives the dimension parameter voice instruction "Adjust forward and backward" that the user sends in reply. Speech recognition is then performed on the dimension parameter voice instruction, and after the second speech recognition result is obtained, the target dimension parameter can be determined based on it.

In a specific implementation, the dimension parameters of each device can be preset. After the second speech recognition result is obtained by performing speech recognition on the dimension parameter voice instruction, the dimension parameter of the target device that is associated with or closest to the second speech recognition result is determined as the target dimension parameter to be adjusted.

In a specific implementation, a preset determination method may be used to determine, for the target device, the dimension parameter associated with or closest to the second speech recognition result. For the specific determination method, refer to the implementation described above for determining the dimension parameter associated with or closest to the relevant information about the target dimension parameter in the first speech recognition result; it is not described again here.

Based on this embodiment, when the status of the target device is determined based on multiple dimensional parameters, the user can specify the dimensional parameter that needs to be adjusted through a separate dimensional parameter voice command, thereby determining the target dimensional parameter that needs to be adjusted by the target device.

In some implementations, when the status of the target device is determined based on multiple dimension parameters, in step 205, hand morphology information corresponding to the dynamic gesture can also be obtained, and the target dimension parameter is then determined based on the hand morphology information.

The hand morphology information may include, for example, but is not limited to, any of the following: finger extension form, number of fingers, single-hand information, etc. The finger extension form may be, for example, straightened or bent; the number of fingers may be, for example, one or two; the single-hand information indicates which hand is used and may be, for example, the left hand, the right hand, or both hands.

Specifically, the correspondence among the hand morphology information, the devices, and the dimension parameters of the devices can be preset. After the hand morphology information corresponding to the dynamic gesture is obtained in step 205, the dimension parameter corresponding to the target device and the obtained hand morphology information is looked up in that correspondence and taken as the target dimension parameter.

Table 2 below shows an example of part of the correspondence among the hand morphology information, the vehicle-mounted devices, and the dimension parameters of the vehicle-mounted devices in an embodiment of the present disclosure, for the case where the hand morphology information is the number of fingers and the devices are vehicle-mounted devices.

Table 2

Table 3 below shows an example of part of the correspondence among the hand morphology information, the vehicle-mounted devices, and the dimension parameters of the vehicle-mounted devices in an embodiment of the present disclosure, for the case where the hand morphology information is single-hand information and the devices are vehicle-mounted devices.

Table 3

Table 2 and Table 3 above show, by way of example only, part of the correspondence among the hand morphology information, the devices, and the dimension parameters of the devices. Where the hand morphology information is of a kind other than those in Table 2 and Table 3, or the devices are other than those in Table 2 and Table 3 (such as other vehicle-mounted devices, home appliances, terminal devices, etc.), the content structure can follow that of Table 2 and Table 3, and is not repeated in the embodiments of the present disclosure.

Based on this embodiment, when the status of the target device is determined based on multiple dimensional parameters, the target dimensional parameters that need to be adjusted for the target device can be determined through the user's hand shape information.
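The preset correspondence and its lookup can be sketched as a keyed table. The entries below are hypothetical placeholders chosen only to show the structure; they are not the contents of Table 2 or Table 3, and the names `CORRESPONDENCE` and `target_dimension` are assumptions made for this illustration.

```python
from typing import Dict, Optional, Tuple, Union

# Key: (device, kind of hand morphology information, observed value).
# All entries are hypothetical and are NOT the contents of Table 2 or Table 3.
Key = Tuple[str, str, Union[int, str]]
CORRESPONDENCE: Dict[Key, str] = {
    ("main driver's seat", "finger_count", 1): "front and rear",
    ("main driver's seat", "finger_count", 2): "high and low",
    ("main driver's seat", "finger_count", 3): "backrest tilt angle",
    ("rearview mirror", "hand", "left"): "pitch",
}


def target_dimension(device: str, kind: str, value: Union[int, str]) -> Optional[str]:
    """Look up the dimension parameter for a target device and hand morphology.

    Returns None when the preset correspondence has no matching entry.
    """
    return CORRESPONDENCE.get((device, kind, value))
```

Returning `None` for an unmatched combination leaves it to the caller to fall back to another way of determining the target dimension parameter, such as a dimension parameter voice instruction.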

Figure 5 is a schematic flowchart of a device control method provided by yet another exemplary embodiment of the present disclosure. As shown in Figure 5, based on the above embodiment shown in Figure 4, step 2066 may include the following steps:

Step 20662: During the continuous action of the dynamic gesture, obtain the movement speed of the dynamic gesture in real time or according to a preset adjustment period.

To adjust the status of the target device in real time and dynamically, the preset adjustment period can be set to a small value, such as 0.01 s. The value can be preset according to the specific controlled device and the desired adjustment effect, and can be updated as needed.

Step 20664: Based on the movement speed of the dynamic gesture, determine the target adjustment speed of the target device on the target dimension parameter.

Step 20666: Adjust the target device in the target dimension parameter at the target adjustment speed in the target adjustment direction.

In step 20666, the target device can be adjusted in the target adjustment direction at the target adjustment speed within the state limit range of the target dimension parameter. When the target device reaches the boundary of the state limit range on the target dimension parameter, for example, when a window on the vehicle has been lowered to its lowest position or raised to its highest position, the target device is no longer adjusted in the target adjustment direction on the target dimension parameter, to avoid damaging the target device.

Based on this embodiment, the target adjustment speed of the target device on the target dimension parameter can be determined based on the movement speed of the dynamic gesture, and the target device can be adjusted in the target adjustment direction at the target adjustment speed. The faster the dynamic gesture moves, the faster the device is adjusted; conversely, the slower the dynamic gesture moves, the slower the device is adjusted. The device adjustment speed is thus controlled dynamically and visually by the movement speed of the dynamic gesture, which improves the adjustment efficiency of the target device and the user's operating experience.

Optionally, in some implementations, adjustment speed configuration information of the target device on the target dimension parameter can be obtained. The adjustment speed configuration information is used to determine the relationship between the gesture movement speed and the device adjustment speed on each dimension parameter of the target device. For example, for a window on a vehicle, in the lifting dimension, the gesture movement speed and the window lifting speed can correspond linearly. Accordingly, in step 20664, based on the obtained adjustment speed configuration information, the device adjustment speed on the target dimension parameter that corresponds to the movement speed of the dynamic gesture obtained in step 20662 can be determined as the target adjustment speed.

Based on this embodiment, the target adjustment speed of the target device on the target dimension parameter can be determined objectively and accurately from the movement speed of the dynamic gesture according to the preset adjustment speed configuration information, so as to accurately control the speed at which the status of the target device is adjusted.
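Steps 20662 to 20666 can be sketched as one adjustment period, assuming the linear correspondence mentioned for the window example and the state limit range described for step 20666. The function names, the `ratio` parameter, and the units are hypothetical choices made for this illustration.

```python
def target_adjustment_speed(gesture_speed: float, ratio: float) -> float:
    """Step 20664 under a linear correspondence: device speed = ratio * gesture speed."""
    return ratio * gesture_speed


def adjust_once(state: float, direction: int, speed: float,
                period: float, lower: float, upper: float) -> float:
    """Step 20666 for one adjustment period, kept within the state limit range.

    `direction` is +1 or -1 along the target dimension parameter. Once a
    boundary is reached, the state stays at the boundary instead of moving on.
    """
    candidate = state + direction * speed * period
    return min(max(candidate, lower), upper)
```

For example, with a period of 0.01 s, a gesture speed of 2.0, and a ratio of 0.5, each call moves the state by 0.01 in the chosen direction until the lower or upper limit is reached.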

In some specific implementations, the adjustment speed configuration information of the target device on the target dimension parameter can be obtained from the first speech recognition result.

In this embodiment, the user can carry the adjustment speed configuration information directly in the voice control instruction. For example, the voice control instruction can be "I want to adjust the main driver's window. Three turns can raise the entire window", which includes the adjustment speed configuration information "Three turns can raise the entire window". The embodiments of the present disclosure do not limit the form or format in which the voice control instruction carries the adjustment speed configuration information. After speech recognition is performed on the voice control instruction to obtain the first speech recognition result, the adjustment speed configuration information of the target device on the target dimension parameter can be obtained from the first speech recognition result, thereby determining the relationship between the gesture movement speed and the device adjustment speed on the target dimension parameter of the target device.

Based on this embodiment, the user can set the adjustment speed configuration information of the target device on the target dimension parameter directly through the voice control instruction while controlling the device, thereby configuring the adjustment speed configuration information in real time and dynamically in specific scenarios and achieving personalized configuration of the device adjustment effect.

Alternatively, in other specific implementations, the adjustment speed configuration information of the target device on the target dimension parameter can be obtained as follows: in response to receiving an adjustment speed configuration voice instruction, speech recognition is performed on the adjustment speed configuration voice instruction to obtain a third speech recognition result, and the adjustment speed configuration information of the target device on the target dimension parameter is then obtained from the third speech recognition result, thereby determining the relationship between the gesture movement speed and the device adjustment speed on the target dimension parameter of the target device. The adjustment speed configuration voice instruction may be sent by the user on the user's own initiative. For example, after sending the voice control instruction "I want to move the driver's seat forward", the user actively sends the adjustment speed configuration voice instruction "Three turns can raise the entire window". Alternatively, the user may send the adjustment speed configuration voice instruction in response to an adjustment speed prompt voice output by the apparatus that implements the embodiments of the present disclosure. For example, after the user sends the voice control instruction "I want to adjust the driver's seat forward", the apparatus outputs the adjustment speed prompt voice "Okay, at what speed do you want to adjust?", and the user then sends the adjustment speed configuration voice instruction "Three turns can raise the entire window". The embodiments of the present disclosure do not limit the manner in which the user sends the adjustment speed configuration voice instruction or its specific content.

Based on this embodiment, the user can set the adjustment speed configuration information of the target device on the target dimension parameter through a separate instruction while controlling the device, thereby configuring the adjustment speed configuration information in real time and dynamically in specific scenarios and achieving personalized configuration of the device adjustment effect.

Alternatively, in some specific implementations, the adjustment speed configuration information of the target device on the target dimension parameter can also be obtained from preconfigured adjustment speed configuration information, thereby determining the relationship between the gesture movement speed and the device adjustment speed on the target dimension parameter of the target device.

The preconfigured adjustment speed configuration information may be preconfigured by the user. Taking vehicle-mounted devices as an example, the user can set or update the adjustment speed configuration information of each vehicle-mounted device on the adjustment speed configuration page provided by the vehicle's central control system, for example through the configuration options for each vehicle-mounted device on that page, or through human-machine voice interaction on that page. Alternatively, the user can obtain, through human-machine voice interaction, the adjustment speed configuration permission provided by the central control system, and then set the adjustment speed configuration information of each vehicle-mounted device through human-machine voice interaction. For other devices (such as home appliances, terminal devices, etc.), the adjustment speed configuration information of each device can be set or updated, in a manner similar to that for vehicle-mounted devices, through the adjustment speed configuration page provided by the control device that controls these devices.

When the user has not preconfigured the adjustment speed configuration information, the factory-preset information of the central control system (for vehicle-mounted devices) or of the control device (for home appliances, terminal devices, and other devices) can be obtained as the preconfigured adjustment speed configuration information.

Based on this embodiment, when the user has not set adjustment speed configuration information for the current scenario, the adjustment speed configuration information of the target device on the target dimension parameter can be obtained from the preconfigured adjustment speed configuration information, so as to determine the target adjustment speed of the target device in the current scenario.

For example, in a specific application, the adjustment speed configuration information can be pre-configured in the following way:

Through a setting interface, such as an interface on the adjustment speed configuration page provided by the central control system (for vehicle-mounted devices) or by the control device (for home appliances, terminal devices, and other devices), receive an adjustment speed configuration request sent by the user. The adjustment speed configuration request includes a device identification (ID), a dimension parameter ID, a gesture movement amplitude (for example, one circle), and a device adjustment amplitude (for example, 0.5 cm). The device ID uniquely identifies a device, and the dimension parameter ID uniquely identifies a dimension parameter;

Based on the gesture motion amplitude and device adjustment amplitude information in the adjustment speed configuration request, determine the relationship between the gesture motion speed and the device adjustment speed;

Based on the device ID and the dimension parameter ID in the adjustment speed configuration request, and on the determined relationship between the gesture movement speed and the device adjustment speed, configure the adjustment speed configuration information of the device identified by the device ID on the dimension parameter identified by the dimension parameter ID; or, on the same basis, update the adjustment speed configuration information corresponding to the device ID and the dimension parameter ID in the preconfigured adjustment speed configuration information.

Based on this embodiment, the configuration or update of the adjustment speed configuration information of the device on the dimensional parameters is implemented.
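The three configuration steps above can be sketched as follows. This is an illustration under stated assumptions: the relationship is taken to be a simple ratio of device adjustment amplitude to gesture movement amplitude (so one circle mapped to 0.5 cm gives 0.5 cm per circle), and the class name, field names, and in-memory `store` are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class AdjustSpeedConfigRequest:
    """Hypothetical shape of an adjustment speed configuration request."""
    device_id: str
    dimension_id: str
    gesture_amplitude: float  # e.g. 1.0 circle
    device_amplitude: float   # e.g. 0.5 cm


def apply_request(store: Dict[Tuple[str, str], float],
                  req: AdjustSpeedConfigRequest) -> float:
    """Derive the speed relationship and configure or update the stored entry."""
    if req.gesture_amplitude <= 0:
        raise ValueError("gesture amplitude must be positive")
    # Assumed linear relationship: device amplitude per unit of gesture amplitude.
    ratio = req.device_amplitude / req.gesture_amplitude
    # One assignment covers both cases: a new key configures, an existing key updates.
    store[(req.device_id, req.dimension_id)] = ratio
    return ratio
```

Because the ratio relates amplitudes covered in the same time, it applies equally to speeds: a gesture at two circles per second would then correspond to 1.0 cm per second under the example values.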

In addition, in the above embodiments, during the execution of step 206 or 2066, in response to receiving an adjustment speed update voice instruction, speech recognition is performed on the adjustment speed update voice instruction to obtain a fourth speech recognition result, and adjustment speed update configuration information is obtained from the fourth speech recognition result. The adjustment speed update configuration information represents the updated relationship between the gesture movement speed and the device adjustment speed on each dimension parameter of the target device. Then, during the subsequent continuous action of the dynamic gesture, the movement speed of the dynamic gesture is obtained in real time or according to the preset adjustment period, the updated device adjustment speed corresponding to that movement speed on the target dimension parameter is determined based on the adjustment speed update configuration information, and the target device is adjusted on the target dimension parameter in the target adjustment direction at the updated adjustment speed.

While continuously adjusting the target device, the user may find that the target device is being adjusted too fast or too slowly. Based on this embodiment, the user can send an adjustment speed update voice instruction during the adjustment, according to the desired adjustment effect, to update the adjustment speed configuration information. The adjustment speed of the target device is thereby updated in real time, further improving the adjustment efficiency and effect of the target device and the user's operating experience.

In addition, the above embodiments of the present disclosure may further include a step of detecting the preset dynamic gesture.

Figure 6 is a schematic flowchart of a device control method provided by yet another exemplary embodiment of the present disclosure. As shown in Figure 6, in some implementations, preset dynamic gestures can be detected in the following ways:

Step 302: Determine the position of the sound source object that sends the voice control instruction.

For example, the position of the sound source object that sends the voice control instruction can be determined through the sound zone positioning method.

Step 304: Based on the position of the sound source object, obtain an image sequence including the hand of the sound source object.

Wherein, the image sequence includes multiple frames of images with a temporal relationship.

After the position of the sound source object is determined, an image acquisition module (such as a camera) can be called to capture images of the sound source object, and hand detection and tracking are performed on the captured images to obtain a video stream that includes the hand of the sound source object. Multiple frames of images with a temporal relationship are selected from the video stream in a preset manner (such as selecting consecutive frames or every other frame) as the image sequence of the hand of the sound source object. Alternatively, images of the same size that contain the hand are further cropped from the selected frames to obtain the image sequence of the hand of the sound source object.

Compared with using the image sequence of the sound source object directly, the hand image sequence cropped from the selected frames contains less background information and less interference, so the accuracy of the gesture detection results can be improved.

In a specific implementation, a first neural network, such as a convolutional neural network (CNN), can be used to perform hand detection and tracking on the captured images to obtain a video stream that includes the hand of the sound source object. The first neural network can be obtained by training a neural network model in advance with sample images that include hands.

Step 306: Perform hand key point detection on each frame image in the image sequence in sequence to obtain a hand key point sequence.

Wherein, the hand key point sequence is formed based on the time series relationship of the hand key points in each frame image.

In a specific implementation, a second neural network, such as CNN, can be used to detect hand key points on each frame image to obtain hand key points. The second neural network can be obtained by pre-training the neural network model using sample images marked with hand key point information.

Step 308: Perform preset dynamic gesture detection based on the hand key point sequence.

In a specific implementation, the hand key point sequence can be input into a third neural network, such as a CNN, and the third neural network outputs a detection result indicating whether the gesture is the preset dynamic gesture. The third neural network can be trained in advance with sample videos of the preset dynamic gesture.

Based on this embodiment, by acquiring an image sequence including the hand of the sound source object, the detection of the preset dynamic gesture is implemented based on visual technology, so that when the preset dynamic gesture is detected, the adjustment of the state of the target device is triggered.

Correspondingly, based on the embodiment shown in FIG. 6, the movement direction of the dynamic gesture can be determined based on the hand key point sequence obtained in step 306. For example, the movement direction of the dynamic gesture can be determined based on the direction corresponding to the trajectory of the hand key point sequence.

Based on this embodiment, through the hand key point sequence corresponding to the image sequence, the dynamic gesture movement direction is determined based on visual technology.
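One way to read a direction off a key point trajectory, for a circling gesture, is the sign of the area enclosed by the trajectory (the shoelace formula). This is a sketch of a swapped-in geometric technique, not the method of the disclosure; it assumes a single tracked key point such as a fingertip, a roughly closed trajectory, and a coordinate system whose y axis points up. In image coordinates, where y usually points down, the two labels are exchanged.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def rotation_direction(trajectory: List[Point]) -> str:
    """Classify a closed key point trajectory by the sign of its enclosed area.

    Uses the shoelace formula on consecutive points, closing the path from the
    last point back to the first. Assumes the y axis points up.
    """
    twice_area = 0.0
    for (x0, y0), (x1, y1) in zip(trajectory, trajectory[1:] + trajectory[:1]):
        twice_area += x0 * y1 - x1 * y0
    if twice_area > 0:
        return "counterclockwise"
    if twice_area < 0:
        return "clockwise"
    return "undetermined"
```

A straight-line trajectory encloses no area and is reported as `"undetermined"`; for such gestures the net displacement between the first and last key point would be the natural direction measure instead.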

In addition, the movement speed of the dynamic gesture can be obtained based on the hand key points in the last frame image and the hand key points in a previous frame image of the image sequence obtained in step 304, together with the acquisition time of the last frame image and the acquisition time of the previous frame image. The previous frame image may be any frame image located before the last frame image in the image sequence. For example, it may be the frame image immediately preceding the last frame image, or a frame image separated from the last frame image by several frames; the embodiments of the present disclosure do not limit this.

For example, the movement speed of the dynamic gesture can be calculated from the distance between the hand key points in the last frame image and the hand key points in the previous frame image, and the time between the acquisition time of the last frame image and the acquisition time of the previous frame image. The distance between the hand key points in the two frame images can be the average of the distances between corresponding hand key points in the last frame image and the previous frame image, or it can be the distance between a preset hand key point (such as a fingertip key point) in the last frame image and the same key point in the previous frame image, etc. The embodiments of the present disclosure do not limit this.

In a specific implementation, the hand key point sequence can be input into the above-mentioned third neural network, which outputs the movement direction and movement amplitude (such as the angle of a circling motion) of the dynamic gesture corresponding to the hand key point sequence; the movement speed of the dynamic gesture can then be calculated from the movement amplitude and the time spanned by the image sequence. Alternatively, the image sequence carrying acquisition time information and annotated with hand key points can be input into the above-mentioned third neural network, which outputs the movement direction and movement speed of the dynamic gesture corresponding to the hand key point sequence. The embodiments of the present disclosure do not limit this.

Based on this embodiment, the movement speed of the dynamic gesture can be accurately determined through the key points of the hand corresponding to the two frames of images in the image sequence and the image collection time.
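The two-frame speed calculation described above, in its averaged variant, can be sketched as follows. The function name and the representation of key points as 2D coordinate tuples are assumptions; the resulting speed is in coordinate units per time unit (for example, pixels per second).

```python
import math
from typing import Sequence, Tuple

Point = Tuple[float, float]


def gesture_speed(prev_points: Sequence[Point], last_points: Sequence[Point],
                  t_prev: float, t_last: float) -> float:
    """Average displacement of corresponding hand key points divided by elapsed time."""
    if not prev_points or len(prev_points) != len(last_points):
        raise ValueError("both frames need the same non-empty set of key points")
    elapsed = t_last - t_prev
    if elapsed <= 0:
        raise ValueError("the last frame must be acquired after the previous frame")
    mean_distance = sum(
        math.dist(p, q) for p, q in zip(prev_points, last_points)
    ) / len(prev_points)
    return mean_distance / elapsed
```

Passing a single key point per frame (such as the fingertip key point) gives the other variant mentioned in the text, the distance of one preset key point between the two frames.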

FIG. 7 is a schematic flowchart of a device control method provided by yet another exemplary embodiment of the present disclosure. As shown in Figure 7, in other implementations, preset dynamic gestures can also be detected in the following ways:

Step 402: Determine the position of the sound source object that sends the voice control instruction.

Step 404: Based on the position of the sound source object, use an optical Time of Flight (ToF) sensor to measure the distance information between each point on the hand of the sound source object and the ToF sensor to obtain a set of distance information.

After the position of the sound source object is determined, the ToF sensor can be used to measure the distance between each point on the hand of the sound source object and the ToF sensor. The set of distance information obtained at each measurement moment includes the distance information between each point on the hand of the sound source object and the ToF sensor at that measurement moment.

Step 406: Obtain a distance information sequence based on multiple sets of distance information with time series relationships.

Step 408: Perform preset dynamic gesture detection based on the distance information sequence.

Optionally, in some implementations, how the distance between each point on the hand of the sound source object and the ToF sensor changes over time can be learned from the distance information sequence, so that whether the hand of the sound source object makes the preset dynamic gesture can be determined based on whether the distance changes conform to the distance change pattern corresponding to the preset dynamic gesture.

Alternatively, in other implementations, three-dimensional (3D) modeling can be performed based on each set of distance information in the distance information sequence to obtain the corresponding hand posture, and whether the hand of the sound source object makes the preset dynamic gesture can be determined from the hand postures corresponding to the distance information sequence.

Based on this embodiment, the ToF sensor is used to realize the detection of the preset dynamic gesture, so that when the preset dynamic gesture is detected, the adjustment of the state of the target device is triggered.
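The distance-change-pattern check can be illustrated with one deliberately simple pattern. The pattern below, a hand moving steadily toward the sensor, is a hypothetical example and not a gesture defined in the disclosure; the function name, the per-set averaging, and the `min_total_change` threshold are likewise assumptions.

```python
from typing import Sequence


def matches_approach_pattern(distance_sequence: Sequence[Sequence[float]],
                             min_total_change: float) -> bool:
    """Check a distance information sequence against an 'approach' pattern.

    Each element of `distance_sequence` is one set of distance information:
    the distances from the hand points to the ToF sensor at one measurement
    moment. The pattern holds when the mean distance strictly decreases from
    set to set and drops by at least `min_total_change` overall.
    """
    if len(distance_sequence) < 2 or any(len(s) == 0 for s in distance_sequence):
        return False
    means = [sum(s) / len(s) for s in distance_sequence]
    steadily_closer = all(later < earlier for earlier, later in zip(means, means[1:]))
    return steadily_closer and (means[0] - means[-1]) >= min_total_change
```

A real preset dynamic gesture such as circling would need a richer pattern over individual hand points, but the structure stays the same: reduce each set of distance information to features, then compare how those features change over the sequence with the expected pattern.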

Correspondingly, based on the embodiment shown in FIG. 7, the movement direction of the dynamic gesture can be determined based on the distance information sequence obtained in step 406. For example, the movement direction of the dynamic gesture corresponding to the distance information sequence obtained in step 406 can be determined based on the change pattern of the distance corresponding to the preset dynamic gesture in different movement directions.

Based on this embodiment, the ToF sensor detects the change in distance from each point on the hand of the sound source object, thereby realizing the determination of the direction of dynamic gesture movement.

In addition, the movement speed of the dynamic gesture can be obtained based on the last set of distance information and a previous set of distance information in the distance information sequence obtained in step 406, together with the measurement time corresponding to the last set of distance information and the measurement time corresponding to the previous set of distance information. The previous set of distance information may be any set of distance information located before the last set of distance information in the distance information sequence. For example, it may be the set immediately preceding the last set of distance information, or a set separated from the last set of distance information by several sets of distance information; the embodiments of the present disclosure do not limit this.

For example, the movement speed of the dynamic gesture can be calculated from the distance change between the last set of distance information and the previous set of distance information in the distance information sequence, and the time between the measurement time corresponding to the last set of distance information and the measurement time corresponding to the previous set of distance information. The distance change between the two sets can be the average of the distance changes of the hand points between the last set of distance information and the previous set of distance information, or it can be the distance change of a preset hand point (such as a fingertip) between the last set and the previous set, etc. The embodiments of the present disclosure do not limit this.

Based on this embodiment, through the two sets of distance information and the measurement time in the distance information sequence, the movement speed of the dynamic gesture can be accurately determined.
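
As an illustration of the calculation described above, the following Python sketch computes a movement speed from two sets of ToF distance information and their measurement moments. The function name, the argument layout (one distance per hand point, listed in the same order in both sets), and the units are assumptions made for this example and are not taken from the disclosure.

```python
from statistics import mean
from typing import Optional, Sequence


def gesture_speed_from_distances(
    prev_distances: Sequence[float],
    last_distances: Sequence[float],
    prev_time: float,
    last_time: float,
    point_index: Optional[int] = None,
) -> float:
    """Estimate gesture speed from two sets of ToF distance readings.

    Each set holds the distance (assumed to be in meters) from the sensor
    to the same ordered hand points. If point_index is given, only that
    preset point (for example a fingertip) is used; otherwise the
    per-point distance changes are averaged.
    """
    if len(prev_distances) != len(last_distances) or not last_distances:
        raise ValueError("both sets must be non-empty and list the same hand points")
    elapsed = last_time - prev_time
    if elapsed <= 0:
        raise ValueError("the last measurement must be later than the previous one")

    if point_index is not None:
        # Distance change of one preset hand point.
        change = abs(last_distances[point_index] - prev_distances[point_index])
    else:
        # Average of the distance changes over all hand points.
        change = mean(abs(b - a) for a, b in zip(prev_distances, last_distances))
    return change / elapsed
```

Averaging over all points is less sensitive to noise on a single point, while tracking one preset point follows the fingertip more directly; the disclosure allows either choice.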

Figure 8 is a schematic flowchart of a device control method provided by yet another exemplary embodiment of the present disclosure. As shown in Figure 8, in some implementations, preset dynamic gestures can also be detected in the following ways:

Step 502: Determine the position of the sound source object that sends the voice control instruction.

Step 504: Based on the position of the sound source object, use the wearable device to obtain the positions of each point on the hand of the sound source object to obtain hand position information.

The hand position information includes position information of each point of the hand.

The wearable device in the embodiments of the present disclosure may be, for example, a smart glove, smart glasses, or another smart device. A smart glove can directly locate the positions of the points on the hand at any time, and smart glasses can obtain the positions of the points on the hand visually. The embodiments of the present disclosure do not limit the specific wearable device used or the manner in which it obtains the positions of the points on the hand of the sound source object.

Step 506: Determine the posture of the hand based on the hand position information.

Step 508: Determine hand movements based on hand postures at multiple moments.

Step 510: Confirm whether the hand movement is a preset dynamic gesture movement.

Step 512: In response to the hand movement being the preset dynamic gesture, confirm that the preset dynamic gesture is detected.

Otherwise, in response to the movement of the hand not being the preset dynamic gesture, it is confirmed that the preset dynamic gesture is not detected.

Based on this embodiment, the wearable device can be used to directly obtain the position of each point on the hand of the sound source object, the posture of the hand can then be determined, and the movement of the hand can be determined based on the postures of the hand at multiple moments. This makes it possible to confirm whether the movement is the preset dynamic gesture, so that adjustment of the state of the target device is triggered when the preset dynamic gesture is detected.

Correspondingly, based on the embodiment shown in FIG. 8, the movement direction of the dynamic gesture can be determined based on the postures of the hand determined at multiple moments in step 506. For example, the movement direction of the dynamic gesture can be determined based on the changes in the hand postures at multiple moments. Alternatively, the movement direction of the dynamic gesture can also be determined directly from the hand movement determined in step 508.

Based on this embodiment, hand position information is obtained through the wearable device, and the direction of dynamic gesture movement is determined.
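
One possible way to derive the movement direction of a circle-drawing gesture from hand positions at multiple moments is to look at the sign of the area enclosed by the fingertip trajectory. The sketch below uses the shoelace formula for this purpose. The choice of this technique, the projection of the fingertip onto a 2D plane, and the function name are assumptions for illustration and are not the method of the disclosure.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def circle_direction(fingertip_path: List[Point]) -> str:
    """Classify a roughly circular fingertip path by its signed area.

    Points are (x, y) with x pointing right and y pointing up. With the
    shoelace formula, a positive signed area means counterclockwise motion
    and a negative one means clockwise motion. If y points down (as in
    image coordinates), the two labels swap.
    """
    if len(fingertip_path) < 3:
        return "unknown"
    twice_area = 0.0
    # Pair each point with its successor, closing the path at the end.
    successors = fingertip_path[1:] + fingertip_path[:1]
    for (x0, y0), (x1, y1) in zip(fingertip_path, successors):
        twice_area += x0 * y1 - x1 * y0
    if twice_area > 0:
        return "counterclockwise"
    if twice_area < 0:
        return "clockwise"
    return "unknown"
```

For example, a path visiting (1, 0), (0, 1), (-1, 0), (0, -1) in that order is classified as counterclockwise, and the same points in reverse order as clockwise.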

In addition, the movement speed of the dynamic gesture can be obtained based on the last moment and a previous moment among the multiple moments at which hand position information is obtained in step 504, together with the hand position information at the last moment and the hand position information at the previous moment. A moment here may be an information collection moment at which the wearable device obtains the positions of the points on the hand of the sound source object. The wearable device can obtain these positions according to a preset information collection period (for example, 0.01 s), in which case the time interval between two adjacent information collection moments is 0.01 s. The previous moment may be the moment immediately before the last moment, or a moment before the last moment that is separated from it by a preset number of moments (for example, 2); the embodiments of the present disclosure do not limit this.

For example, the movement speed of the dynamic gesture can be calculated from the change between the hand position information at the last moment and the hand position information at the previous moment, divided by the time difference between the last moment and the previous moment. The change between the hand position information at the two moments may be the average change in the positions of corresponding hand points between the last moment and the previous moment, or it may be the change in the position of a preset hand point (such as a fingertip) between the last moment and the previous moment; the embodiments of the present disclosure do not limit this.

Based on this embodiment, the movement speed of the dynamic gesture can be accurately determined using the hand position information of the sound source object obtained by the wearable device at different times.
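
The position-based speed calculation can be sketched in Python as follows. The point names, the use of 3D coordinates, and the function name are assumptions for this example; the disclosure does not prescribe a data layout for the hand position information.

```python
import math
from typing import Dict, Optional, Tuple

Position = Tuple[float, float, float]


def gesture_speed_from_positions(
    prev_points: Dict[str, Position],
    last_points: Dict[str, Position],
    prev_time: float,
    last_time: float,
    preset_point: Optional[str] = None,
) -> float:
    """Estimate gesture speed from hand positions at two collection moments.

    If preset_point is given (for example "index_tip"), only that point is
    used; otherwise the displacement is averaged over all points present
    at both moments.
    """
    elapsed = last_time - prev_time
    if elapsed <= 0:
        raise ValueError("the last moment must be later than the previous moment")

    if preset_point is not None:
        names = [preset_point]
    else:
        names = sorted(prev_points.keys() & last_points.keys())
    if not names:
        raise ValueError("no common hand points between the two moments")

    # Euclidean displacement of each corresponding point, then the mean.
    total = sum(math.dist(prev_points[n], last_points[n]) for n in names)
    return (total / len(names)) / elapsed
```

With the 0.01 s collection period mentioned above, a fingertip that moves 0.05 units between two adjacent moments yields a speed of about 5 units per second.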

Several exemplary application scenarios of the embodiments of the present disclosure are described below:

Scenario 1: Adjust a window on the vehicle:

The user issues the voice control instruction "I want to adjust the main driver's window with gestures." After receiving the voice control instruction, the device implementing the embodiments of the present disclosure performs speech recognition, determines based on the obtained first speech recognition result that the target device is the main driver's window, and obtains control authority over the main driver's window. The user draws circles clockwise, and the main driver's window rises continuously. While the main driver's window is rising, the user issues the speed adjustment voice instruction "It's too slow. Three turns should raise the entire window." Based on this, the device implementing the embodiments of the present disclosure determines the updated device adjustment speed corresponding to the speed of the user's circle-drawing action, and then controls the main driver's window to rise at the updated adjustment speed. The user continues to draw circles until the main driver's window reaches the height the user expects.

Scenario 2: Adjust the front-rear position of a seat on the vehicle:

The user issues the voice control instruction "I want to adjust the main driver's seat forward, one centimeter forward per turn." After receiving the voice control instruction, the device implementing the embodiments of the present disclosure performs speech recognition, determines based on the obtained first speech recognition result that the target device is the main driver's seat, that the target dimension parameter is the front-rear position, and that the adjustment speed configuration information is "one centimeter forward per turn", and obtains control authority over the main driver's seat. The user draws circles clockwise, and the main driver's seat moves forward continuously. The user continues to draw circles until the main driver's seat reaches the position the user expects.

Scenario 3: Adjust the left rearview mirror on the vehicle based on hand shape information:

The user issues the voice control instruction "I want to adjust the left rearview mirror with gestures." After receiving the voice control instruction, the device implementing the embodiments of the present disclosure performs speech recognition, determines based on the obtained first speech recognition result that the target device is the left rearview mirror, and obtains control authority over the left rearview mirror. When the user draws circles counterclockwise with the right hand, the left rearview mirror tilts downward continuously, and when the user draws circles in the opposite direction, it tilts upward continuously; when the user draws circles counterclockwise with the left hand, the left rearview mirror turns outward continuously, and when the user draws circles in the opposite direction, it turns inward continuously. Alternatively, when the user extends the index finger of the right hand and draws circles counterclockwise, the left rearview mirror tilts downward continuously, and when the user draws circles in the opposite direction, it tilts upward continuously; when the user extends the index finger and middle finger of the right hand and draws circles counterclockwise, the left rearview mirror turns outward continuously, and when the user draws circles in the opposite direction, it turns inward continuously. The specific adjustment speed can be determined from the preconfigured adjustment speed configuration information, or the adjustment speed configuration information can be configured through user voice instructions as in Scenarios 1 and 2 above. The user continues to draw circles until the left rearview mirror points in the direction the user expects.

Scenario 4: Adjust the air volume of the air conditioner:

The user issues the voice control instruction "I want to adjust the air volume of the air conditioner with gestures." After receiving the voice control instruction, the device implementing the embodiments of the present disclosure performs speech recognition, determines based on the obtained first speech recognition result that the target device is the air conditioner and that the target dimension parameter is the air volume, and obtains control authority over the air conditioner. If the user draws circles clockwise, the air volume of the air conditioner increases; if the user draws circles counterclockwise, the air volume decreases. The specific adjustment speed can be determined from the preconfigured adjustment speed configuration information. While the air volume is being adjusted, the user can issue an adjustment speed update voice instruction to update the adjustment speed of the air volume. The user continues to draw circles until the air volume of the air conditioner reaches the level the user expects.

Users can adjust the temperature, direction, etc. of the air conditioner in similar ways, which will not be described again here.

Any device control method provided by the embodiments of the present disclosure can be executed by any appropriate device with data processing capabilities, including but not limited to: terminal devices and servers. Alternatively, any of the device control methods provided by the embodiments of the present disclosure can be executed by the processor. For example, the processor executes any of the device control methods mentioned in the embodiments of the present disclosure by calling corresponding instructions stored in the memory. No further details will be given below.

Exemplary device

The equipment control device of the embodiment of the present disclosure can be used to implement the equipment control method of the above-mentioned embodiments of the present disclosure.

Figure 9 is a schematic structural diagram of an equipment control device provided by an exemplary embodiment of the present disclosure. As shown in Figure 9, the equipment control device of this embodiment includes a voice recognition module 602, a first determination module 604, a detection module 606 and an adjustment module 608, wherein:

The voice recognition module 602 is configured to perform voice recognition on the voice control instruction in response to receiving the voice control instruction to obtain a first voice recognition result.

The first determination module 604 is configured to determine the target device corresponding to the voice control instruction based on the first voice recognition result obtained by the voice recognition module 602.

The detection module 606 is used to detect preset dynamic gestures.

The preset dynamic gesture in the embodiments of the present disclosure may include, for example, but is not limited to, drawing a circle.

The adjustment module 608 is configured to continuously adjust the state of the target device determined by the first determination module 604 based on the continuous action of the dynamic gesture in response to the detection module 606 detecting the preset dynamic gesture.

Figure 10 is a schematic structural diagram of an equipment control device provided by another exemplary embodiment of the present disclosure. As shown in Figure 10, based on the embodiment shown in Figure 9, the equipment control device of this embodiment may further include: a second determination module 702, used to determine the target dimension parameters to be adjusted of the target equipment.

Correspondingly, the adjustment module 608 may include: a first determination unit 6082, configured to determine the movement direction of the dynamic gesture; a second determination unit 6084, configured to determine the target adjustment direction of the target device on the target dimension parameter based on the movement direction of the dynamic gesture; and an adjustment unit 6086, configured to continuously adjust the target device on the target dimension parameter in the target adjustment direction based on the continuous action of the dynamic gesture in the movement direction.

Optionally, in some of these implementations, the status of the target device is determined based on one dimension parameter. Correspondingly, in this embodiment, the second determination module 702 is specifically configured to determine the one dimension parameter of the target device as the target dimension parameter.
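
The cooperation of the three units can be pictured with a short Python sketch: a movement direction is mapped to an adjustment direction, and the dimension parameter is advanced by a small step for as long as the gesture continues. The direction-to-sign table (clockwise increases the parameter, as in the air volume scenario), the clamping to a valid range, and all names are assumptions for this example.

```python
from typing import Dict

# Hypothetical mapping from gesture movement direction to adjustment sign.
ADJUST_SIGN: Dict[str, int] = {"clockwise": +1, "counterclockwise": -1}


def adjust_step(
    value: float,
    movement_direction: str,
    speed: float,
    dt: float,
    lower: float,
    upper: float,
) -> float:
    """Advance one dimension parameter by one control tick.

    value is the current parameter value, speed is the target adjustment
    speed in parameter units per second, and dt is the tick length in
    seconds. The result is clamped to [lower, upper]. An unrecognized
    direction leaves the value unchanged.
    """
    sign = ADJUST_SIGN.get(movement_direction, 0)
    new_value = value + sign * speed * dt
    return max(lower, min(upper, new_value))
```

Calling `adjust_step` once per tick while the gesture is still detected gives the continuous adjustment; stopping the calls when the gesture ends leaves the parameter at its last value.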

Optionally, in other implementations, the status of the target device is determined based on multiple dimensional parameters. Correspondingly, in this embodiment, the second determination module 702 is specifically configured to determine the target dimension parameters based on the first speech recognition result.

Optionally, in some implementations, the status of the target device is determined based on multiple dimensional parameters. Correspondingly, in this embodiment, the voice recognition module 602 is also configured to perform voice recognition on the dimension parameter voice command in response to receiving the dimension parameter voice command, and obtain a second voice recognition result. The second determination module 702 is specifically configured to determine the target dimension parameters based on the second speech recognition result.

Optionally, in some implementations, the status of the target device is determined based on multiple dimensional parameters. Correspondingly, referring to Figure 10 again, the equipment control device of this embodiment may also include a first acquisition module 704, specifically configured to acquire hand form information corresponding to the dynamic gesture, where the hand form information may include, for example, but is not limited to, any of the following: finger extension form, number of fingers, single-hand information, etc. The finger extension form may be, for example, straightened or bent; the number of fingers may be, for example, one or two; the single-hand information may be, for example, the left hand, the right hand, or both hands. Correspondingly, the second determination module 702 is specifically configured to determine the target dimension parameter based on the hand form information obtained by the first acquisition module 704.
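
A table lookup is one simple way to picture how hand form information could select the target dimension parameter. The sketch below follows the rearview mirror scenario, where the right hand adjusts the up-down tilt and the left hand adjusts the outward-inward angle. The device key, the dimension names "pitch" and "yaw", and the data structure are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Optional, Tuple


@dataclass(frozen=True)
class HandForm:
    """Hand form information for one dynamic gesture (illustrative fields)."""
    hand: str                      # "left", "right" or "both"
    finger_count: int              # number of extended fingers
    fingers_straight: bool = True  # finger extension form


# Hypothetical lookup table keyed by (device, hand).
DIMENSION_BY_HAND: Dict[Tuple[str, str], str] = {
    ("left_rearview_mirror", "right"): "pitch",  # tilt down / up
    ("left_rearview_mirror", "left"): "yaw",     # turn outward / inward
}


def target_dimension(device: str, form: HandForm) -> Optional[str]:
    """Return the dimension parameter selected by the hand form, if any."""
    return DIMENSION_BY_HAND.get((device, form.hand))
```

A table keyed by finger count instead of hand would cover the alternative in which one extended finger and two extended fingers select different dimension parameters.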

Referring again to FIG. 10 , in yet another embodiment of the equipment control device, a second acquisition module 706 and a third determination module 708 may also be included. Among them, the second acquisition module 706 is used to acquire the movement speed of the dynamic gesture in real time or according to a preset adjustment period during the continuous action of the dynamic gesture. The third determination module 708 is configured to determine the target adjustment speed of the target device on the target dimension parameter based on the movement speed of the dynamic gesture. Correspondingly, in this embodiment, the adjustment unit 6086 is specifically used to adjust the target device on the target dimension parameter at the target adjustment speed in the target adjustment direction.

Referring again to Figure 10, in another embodiment of the equipment control device, a third acquisition module 710 may also be included, configured to acquire the adjustment speed configuration information of the target device on the target dimension parameter, where the adjustment speed configuration information is used to represent, for each dimension parameter of the target device, the relationship between the gesture movement speed and the device adjustment speed. Accordingly, in this embodiment, the third determination module 708 is specifically configured to determine, based on the adjustment speed configuration information, the device adjustment speed corresponding to the movement speed of the dynamic gesture on the target dimension parameter as the target adjustment speed.

Optionally, in some implementations, the third acquisition module 710 is specifically configured to acquire the adjustment speed configuration information of the target device on the target dimension parameter from the first speech recognition result.

Or, in other implementations, the third acquisition module 710 is specifically configured to acquire the adjustment speed configuration information of the target device on the target dimension parameter from the preconfigured adjustment speed configuration information.

Or, referring to FIG. 10 again, in some implementations, the voice recognition module 602 can also be configured to, in response to receiving an adjustment speed configuration voice instruction, perform voice recognition on the adjustment speed configuration voice instruction to obtain a third voice recognition result. Accordingly, in this embodiment, the third acquisition module 710 is specifically configured to acquire the adjustment speed configuration information of the target device on the target dimension parameter from the third voice recognition result.

Referring again to Figure 10, in another embodiment of the equipment control device, a configuration module 712 may also be included. The configuration module 712 is configured to: receive an adjustment speed configuration request through a setting interface, where the adjustment speed configuration request includes a device identification, a dimension parameter identification, a gesture movement amplitude and device adjustment amplitude information, the device identification is used to uniquely identify a device, and the dimension parameter identification is used to uniquely identify a dimension parameter; determine the relationship between the gesture movement speed and the device adjustment speed based on the gesture movement amplitude and the device adjustment amplitude information; and, based on the device identification, the dimension parameter identification, and the relationship between the gesture movement speed and the device adjustment speed, configure the adjustment speed configuration information of the device identified by the device identification on the dimension parameter identified by the dimension parameter identification; or, based on the same information, update the adjustment speed configuration information corresponding to the device identification and the dimension parameter identification in the preconfigured adjustment speed configuration information.
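
A minimal Python sketch of such a configuration step is given below. It assumes a linear relationship, in which the ratio of the device adjustment amplitude to the gesture movement amplitude is also the ratio of device adjustment speed to gesture movement speed. That linearity, the in-memory store, and all class and field names are assumptions for illustration.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class SpeedConfigRequest:
    device_id: str            # uniquely identifies a device
    dimension_id: str         # uniquely identifies a dimension parameter
    gesture_amplitude: float  # e.g. number of circles drawn
    device_amplitude: float   # resulting change of the dimension parameter


class SpeedConfigStore:
    """Keeps one speed ratio per (device, dimension parameter) pair."""

    def __init__(self) -> None:
        self._ratios: Dict[Tuple[str, str], float] = {}

    def configure(self, request: SpeedConfigRequest) -> float:
        """Create or update the entry for the request and return the ratio."""
        if request.gesture_amplitude <= 0:
            raise ValueError("gesture amplitude must be positive")
        ratio = request.device_amplitude / request.gesture_amplitude
        self._ratios[(request.device_id, request.dimension_id)] = ratio
        return ratio

    def device_speed(self, device_id: str, dimension_id: str,
                     gesture_speed: float) -> float:
        """Map a gesture movement speed to a device adjustment speed."""
        return self._ratios[(device_id, dimension_id)] * gesture_speed
```

Because `configure` overwrites any existing entry for the same key, the same call covers both first-time configuration and the update of preconfigured information.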

Optionally, in some implementations, the voice recognition module 602 is also configured to, while the target device is being continuously adjusted on the target dimension parameter in the target adjustment direction based on the continuous action of the dynamic gesture in the movement direction, perform voice recognition on a speed adjustment voice instruction in response to receiving it, to obtain a fourth voice recognition result. Correspondingly, in this embodiment, the third acquisition module 710 is also configured to acquire adjustment speed update configuration information from the fourth voice recognition result, where the adjustment speed update configuration information is used to represent, for each dimension parameter of the target device, the updated relationship between the gesture movement speed and the device adjustment speed. The second acquisition module 706 is also configured to acquire the movement speed of the dynamic gesture in real time or according to a preset adjustment period during the subsequent continuous action of the dynamic gesture. The third determination module 708 is also configured to determine, based on the adjustment speed update configuration information, the updated device adjustment speed corresponding to the movement speed of the dynamic gesture on the target dimension parameter. The adjustment unit 6086 is also configured to adjust the target device on the target dimension parameter in the target adjustment direction at the updated device adjustment speed.

Referring again to FIG. 10 , in yet another embodiment of the device control device, a fourth determination module 714 may be further included, configured to determine the position of the sound source object that sends the voice control instruction.

Correspondingly, in some implementations, the detection module 606 is specifically configured to: based on the position of the sound source object, obtain an image sequence including the hand of the sound source object, where the image sequence includes multiple frames of images with a temporal relationship; perform hand key point detection on each frame image in the image sequence to obtain a hand key point sequence, where the hand key point sequence is formed from the hand key points in each frame image according to the temporal relationship; and perform detection of the preset dynamic gesture based on the hand key point sequence.

Correspondingly, in this embodiment, the first determining unit 6082 is specifically configured to determine the movement direction of the dynamic gesture based on the hand key point sequence.

Correspondingly, in this embodiment, the second acquisition module 706 is specifically configured to obtain the movement speed of the dynamic gesture based on the hand key points in the last frame image of the image sequence, the hand key points in a previous frame image, the acquisition moment of the last frame image, and the acquisition moment of the previous frame image.

Correspondingly, in other implementations, the detection module 606 is specifically configured to: based on the position of the sound source object, use the ToF sensor to measure the distance between each point on the hand of the sound source object and the ToF sensor to obtain a set of distance information; obtain a distance information sequence based on multiple sets of distance information with a temporal relationship; and perform detection of the preset dynamic gesture based on the distance information sequence.

Correspondingly, in this embodiment, the first determining unit 6082 is specifically configured to determine the movement direction of the dynamic gesture based on the distance information sequence.

Correspondingly, in this embodiment, the second acquisition module 706 is specifically configured to obtain the movement speed of the dynamic gesture based on the last set of distance information and a previous set of distance information in the distance information sequence, the measurement moment corresponding to the last set of distance information, and the measurement moment corresponding to the previous set of distance information.

Correspondingly, in some implementations, the detection module 606 is specifically configured to: based on the position of the sound source object, use the wearable device to obtain the position of each point on the hand of the sound source object to obtain hand position information, where the hand position information includes position information of each point of the hand; determine the posture of the hand based on the acquired hand position information; determine the movement of the hand based on the postures of the hand at multiple moments; confirm whether the movement of the hand is the preset dynamic gesture; and, in response to the movement of the hand being the preset dynamic gesture, confirm that the preset dynamic gesture is detected.

Correspondingly, in this embodiment, the first determining unit 6082 is specifically configured to determine the movement direction of the dynamic gesture based on the hand postures at the multiple moments.

Correspondingly, in this embodiment, the second acquisition module 706 is specifically configured to obtain the movement speed of the dynamic gesture based on the last moment and a previous moment among the multiple moments, the hand position information at the last moment, and the hand position information at the previous moment.

Example electronic device

FIG. 11 is a schematic structural diagram of an electronic device provided by an exemplary embodiment of the present disclosure. Next, an electronic device according to an embodiment of the present disclosure is described with reference to FIG. 11. The electronic device may be either or both of the first device 100 and the second device 200, or a stand-alone device independent of them. The stand-alone device may communicate with the first device and the second device to receive the collected input signals from them.

As shown in Figure 11, the electronic device includes one or more processors 802 and memory 804.

Processor 802 may be a central processing unit (CPU) or other form of processing unit having data processing capabilities and/or instruction execution capabilities, and may control other components in the electronic device to perform desired functions.

Memory 804 may include one or more computer program products, which may include various forms of computer-readable storage media, such as volatile memory and/or non-volatile memory. The volatile memory may include, for example, random access memory (RAM) and/or cache memory (cache). The non-volatile memory may include, for example, read-only memory (ROM), hard disk, flash memory, etc. One or more computer program instructions may be stored on the computer-readable storage medium, and the processor 802 may execute the program instructions to implement the device control methods of the various embodiments of the present disclosure described above and/or other desired functions. Various contents such as input signals, signal components, noise components, etc. may also be stored in the computer-readable storage medium.

In one example, the electronic device may also include an input device 806 and an output device 808, with these components interconnected by a bus system and/or other forms of connection mechanisms (not shown).

For example, when the electronic device is the first device 100 or the second device 200, the input device 806 may be the above-mentioned microphone or microphone array, used to capture the input signal of the sound source. When the electronic device is a stand-alone device, the input device 806 may be a communication network connector for receiving the collected input signals from the first device 100 and the second device 200.

In addition, the input device 806 may also include, for example, a keyboard, a mouse, and the like.

The output device 808 can output various information to the outside, including determined distance information, direction information, etc. The output devices 808 may include, for example, displays, speakers, printers, and communication networks and remote output devices to which they are connected, among others.

Of course, for simplicity, only some of the components in the electronic device related to the present disclosure are shown in FIG. 11, and components such as buses, input/output interfaces, etc. are omitted. In addition to this, the electronic device may include any other suitable components depending on the specific application.

Example computer program products and computer-readable storage media

In addition to the above methods and devices, embodiments of the present disclosure may also be a computer program product, which includes computer program instructions that, when executed by a processor, cause the processor to perform the steps in the device control method according to the various embodiments of the present disclosure described in the "Exemplary method" section above in this specification.

The basic principles of the present disclosure have been described above in conjunction with specific embodiments. However, it should be pointed out that the advantages, benefits, effects, etc. mentioned in the present disclosure are only examples and not limitations, and it cannot be considered that these advantages, benefits, effects, etc. are necessarily possessed by each embodiment of the present disclosure. In addition, the specific details disclosed above are only for the purpose of illustration and to facilitate understanding, and are not limiting. The above details do not restrict the present disclosure to being implemented with those specific details.

Each embodiment in this specification is described in a progressive manner, and each embodiment focuses on its differences from other embodiments. The same or similar parts between the various embodiments can be referred to each other. For the system embodiment, since it basically corresponds to the method embodiment, the description is relatively simple. For relevant details, please refer to the partial description of the method embodiment.

The block diagrams of the components, devices, equipment, and systems involved in the present disclosure are only illustrative examples and are not intended to require or imply that they must be connected, arranged, or configured in the manner shown in the block diagrams. As those skilled in the art will recognize, these components, devices, equipment, and systems may be connected, arranged, and configured in any manner.

The methods and apparatus of the present disclosure may be implemented in many ways. For example, the methods and devices of the present disclosure may be implemented through software, hardware, firmware, or any combination of software, hardware, and firmware. The above order for the steps of the methods is for illustration only, and the steps of the methods of the present disclosure are not limited to the order specifically described above unless otherwise specifically stated. Furthermore, in some embodiments, the present disclosure may also be implemented as programs recorded in recording media, and these programs include machine-readable instructions for implementing methods according to the present disclosure. Thus, the present disclosure also covers recording media storing programs for executing methods according to the present disclosure.

It should also be noted that in the devices, equipment and methods of the present disclosure, each component or each step can be decomposed and/or recombined. These decompositions and/or recombinations should be considered equivalent versions of the present disclosure.

Claims

A device control method including:

In response to receiving the voice control instruction, perform voice recognition on the voice control instruction to obtain a first voice recognition result;

Based on the first voice recognition result, determine the target device corresponding to the voice control instruction;

In response to detecting the preset dynamic gesture, continuously adjusting the state of the target device based on continued actions of the dynamic gesture.
The method of claim 1, further comprising:

Determine the target dimension parameter of the target device to be adjusted;

Continuously adjusting the state of the target device based on the continuous action of the dynamic gesture includes:

Determine the movement direction of the dynamic gesture;

Based on the movement direction of the dynamic gesture, determine the target adjustment direction of the target device on the target dimension parameter;

Based on the continuous action of the dynamic gesture in the movement direction, continuously adjust the target device on the target dimension parameter in the target adjustment direction.
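The steps above can be sketched as follows, purely as an assumption-laden illustration. The direction labels, the table `DIRECTION_TO_SIGN`, the `Device` class, and the fixed step size are all hypothetical; the disclosure does not prescribe any of them.

```python
from dataclasses import dataclass

# Hypothetical mapping from gesture movement direction to adjustment direction.
DIRECTION_TO_SIGN = {"up": +1, "right": +1, "down": -1, "left": -1}


@dataclass
class Device:
    """Hypothetical device whose state is a set of bounded dimension parameters."""
    state: dict
    limits: dict  # dimension name -> (minimum, maximum)

    def step(self, dimension: str, sign: int, step_size: float) -> float:
        low, high = self.limits[dimension]
        new_value = self.state[dimension] + sign * step_size
        self.state[dimension] = max(low, min(high, new_value))  # clamp to range
        return self.state[dimension]


def adjust_while_gesture_continues(device, dimension, directions, step_size=1.0):
    """Apply one adjustment step per gesture sample until the samples end."""
    history = []
    for direction in directions:
        sign = DIRECTION_TO_SIGN[direction]
        history.append(device.step(dimension, sign, step_size))
    return history


window = Device(state={"opening": 50.0}, limits={"opening": (0.0, 100.0)})
trace = adjust_while_gesture_continues(window, "opening", ["up", "up", "up"], 5.0)
print(trace)  # [55.0, 60.0, 65.0]
```

In this sketch the adjustment stops as soon as the gesture samples stop, which mirrors the idea that the state follows the gesture for as long as it continues.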
The method of claim 2, wherein the state of the target device is determined based on one dimension parameter;

Determining the target dimension parameter of the target device to be adjusted includes:

Determine the one dimension parameter of the target device as the target dimension parameter.
The method of claim 2, wherein the state of the target device is determined based on multiple dimension parameters;

Determining the target dimension parameter of the target device to be adjusted includes:

Based on the first voice recognition result, determine the target dimension parameter.
The method of claim 2, wherein the state of the target device is determined based on multiple dimension parameters;

Determining the target dimension parameter of the target device to be adjusted includes:

In response to receiving a dimension parameter voice instruction, perform voice recognition on the dimension parameter voice instruction to obtain a second voice recognition result;

Based on the second voice recognition result, determine the target dimension parameter.
The method of claim 2, wherein the state of the target device is determined based on multiple dimension parameters;

Determining the target dimension parameter of the target device to be adjusted includes:

Obtain hand morphology information corresponding to the dynamic gesture, where the hand morphology information includes any of the following: finger extension form, number of fingers, and single-hand information;

Based on the hand morphology information, determine the target dimension parameter.
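By way of a hedged example only, the selection of a target dimension parameter from hand morphology information could be a table lookup keyed by the number of extended fingers. The table `DIMENSION_BY_FINGER_COUNT`, its entries, and the function name are hypothetical and serve only to illustrate the idea.

```python
from typing import Optional

# Hypothetical table: device identifier -> {number of fingers -> dimension parameter}.
DIMENSION_BY_FINGER_COUNT = {
    "air_conditioner": {1: "temperature", 2: "fan_speed"},
    "seat": {1: "height", 2: "backrest_angle"},
}


def determine_target_dimension(device_id: str, finger_count: int) -> Optional[str]:
    """Look up the dimension parameter selected by the given finger count."""
    return DIMENSION_BY_FINGER_COUNT.get(device_id, {}).get(finger_count)


print(determine_target_dimension("air_conditioner", 2))  # fan_speed
```

An unknown device or an unmapped finger count yields `None`, which a caller could treat as "no dimension selected".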
The method according to any one of claims 2 to 6, wherein continuously adjusting the target device on the target dimension parameter in the target adjustment direction, based on the continuous action of the dynamic gesture in the movement direction, includes:

During the continuous action of the dynamic gesture, obtain the movement speed of the dynamic gesture in real time or according to a preset adjustment period;

Based on the movement speed of the dynamic gesture, determine the target adjustment speed of the target device on the target dimension parameter;

Adjust the target device on the target dimension parameter in the target adjustment direction at the target adjustment speed.
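The speed-dependent adjustment described above can be sketched as a loop that runs once per adjustment period. The linear mapping with a cap, the parameter names `gain` and `max_speed`, and the units are assumptions made for illustration; the disclosure does not fix a particular mapping.

```python
def target_adjustment_speed(gesture_speed: float, gain: float, max_speed: float) -> float:
    """Map a gesture movement speed to a device adjustment speed (assumed linear, capped)."""
    return min(abs(gesture_speed) * gain, max_speed)


def run_adjustment(value, sign, gesture_speeds, period_s,
                   gain=0.5, max_speed=20.0, low=0.0, high=100.0):
    """Update the dimension parameter once per adjustment period.

    `gesture_speeds` holds one sampled gesture speed per period; `sign` is the
    target adjustment direction (+1 or -1).
    """
    for speed in gesture_speeds:
        rate = target_adjustment_speed(speed, gain, max_speed)
        value = max(low, min(high, value + sign * rate * period_s))
    return value


# Three periods of 0.1 s with increasingly fast gesture movement.
final_volume = run_adjustment(30.0, +1, [10.0, 20.0, 80.0], period_s=0.1)
print(round(final_volume, 3))  # 33.5
```

A faster gesture therefore produces a larger change per period, up to the assumed cap, while a slow gesture allows fine adjustment.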
The method of claim 7, further comprising:

Obtain the adjustment speed configuration information of the target device on the target dimension parameter, where the adjustment speed configuration information represents the relationship between the gesture movement speed and the device adjustment speed on each dimension parameter of the target device;

Determining the target adjustment speed of the target device on the target dimension parameter based on the movement speed of the dynamic gesture includes:

Based on the adjustment speed configuration information, determine the device adjustment speed corresponding to the movement speed of the dynamic gesture on the target dimension parameter as the target adjustment speed.
The method according to claim 8, wherein said obtaining the adjustment speed configuration information of the target device on the target dimension parameter includes:

Obtain the adjustment speed configuration information of the target device on the target dimension parameter from the first voice recognition result; or,

In response to receiving a speed adjustment voice instruction, perform voice recognition on the speed adjustment voice instruction to obtain a third voice recognition result;

Obtain the adjustment speed configuration information of the target device on the target dimension parameter from the third voice recognition result; or,

Obtain the adjustment speed configuration information of the target device on the target dimension parameter from preconfigured adjustment speed configuration information.
The method according to claim 9, wherein preconfiguring the adjustment speed configuration information includes:

Receive an adjustment speed configuration request through a setting interface, where the adjustment speed configuration request includes a device identification, a dimension parameter identification, a gesture movement amplitude, and device adjustment amplitude information; the device identification uniquely identifies a device, and the dimension parameter identification uniquely identifies a dimension parameter;

Based on the gesture movement amplitude and the device adjustment amplitude information, determine the relationship between the gesture movement speed and the device adjustment speed;

Based on the device identification, the dimension parameter identification, and the relationship between the gesture movement speed and the device adjustment speed, configure the adjustment speed configuration information of the device identified by the device identification on the dimension parameter identified by the dimension parameter identification; or, based on the device identification, the dimension parameter identification, and the relationship between the gesture movement speed and the device adjustment speed, update the adjustment speed configuration information corresponding to the device identification and the dimension parameter identification in the preconfigured adjustment speed configuration information.
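A minimal sketch of such preconfiguration, under stated assumptions, stores one proportional gain per (device identification, dimension parameter identification) pair. Deriving the gain as the ratio of device adjustment amplitude to gesture movement amplitude is an assumption for illustration, as are the names `configure_adjust_speed` and `lookup_adjust_speed` and the example identifiers.

```python
from typing import Dict, Tuple

# (device identification, dimension parameter identification) -> gain,
# i.e. device adjustment units per unit of gesture movement (assumed linear).
SpeedConfig = Dict[Tuple[str, str], float]


def configure_adjust_speed(config: SpeedConfig, device_id: str, dimension_id: str,
                           gesture_amplitude: float, device_amplitude: float) -> float:
    """Create or update the entry for one device and dimension parameter."""
    if gesture_amplitude <= 0:
        raise ValueError("gesture_amplitude must be positive")
    gain = device_amplitude / gesture_amplitude
    config[(device_id, dimension_id)] = gain  # inserts a new entry or overwrites
    return gain


def lookup_adjust_speed(config: SpeedConfig, device_id: str, dimension_id: str,
                        gesture_speed: float) -> float:
    """Return the device adjustment speed for a given gesture movement speed."""
    return config[(device_id, dimension_id)] * gesture_speed


config: SpeedConfig = {}
# A 10 cm gesture should change the volume by 20 units: gain 2.0.
configure_adjust_speed(config, "audio", "volume", 10.0, 20.0)
fast = lookup_adjust_speed(config, "audio", "volume", 5.0)
# Later update: the same 10 cm gesture should change the volume by only 5 units.
configure_adjust_speed(config, "audio", "volume", 10.0, 5.0)
slow = lookup_adjust_speed(config, "audio", "volume", 5.0)
print(fast, slow)  # 10.0 2.5
```

Because the same dictionary key is used for both the initial configuration and the later update, the two alternatives in the step above reduce to a single write in this sketch.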
The method according to any one of claims 7-10, further comprising:

While the target device is being continuously adjusted on the target dimension parameter in the target adjustment direction based on the continuous action of the dynamic gesture in the movement direction, in response to receiving an adjustment speed update voice instruction, perform voice recognition on the adjustment speed update voice instruction to obtain a fourth voice recognition result;

Obtain adjustment speed update configuration information from the fourth voice recognition result, where the adjustment speed update configuration information represents the updated relationship between the gesture movement speed and the device adjustment speed on each dimension parameter of the target device;

During the subsequent continuous action of the dynamic gesture, obtain the movement speed of the dynamic gesture in real time or according to a preset adjustment period;

Based on the adjustment speed update configuration information, determine the updated device adjustment speed corresponding to the movement speed of the dynamic gesture on the target dimension parameter;

Adjust the target device on the target dimension parameter in the target adjustment direction at the updated device adjustment speed.
The method according to any one of claims 7-11, further comprising:

Determine the location of the sound source object that sends the voice control instruction;

Based on the position of the sound source object, obtain an image sequence including the hand of the sound source object, where the image sequence includes multiple frames of images with a temporal relationship;

Perform hand key point detection on each frame image in the image sequence in turn to obtain a hand key point sequence, where the hand key point sequence is formed from the hand key points in each frame image according to the temporal relationship;

Based on the hand key point sequence, perform detection of the preset dynamic gesture.
The method according to claim 12, wherein determining the movement direction of the dynamic gesture includes:

Based on the hand key point sequence, the movement direction of the dynamic gesture is determined.
The method according to claim 12 or 13, wherein said obtaining the movement speed of the dynamic gesture includes:

Based on the hand key points in the last frame image and the hand key points in the previous frame image in the image sequence, as well as the collection time of the last frame image and the collection time of the previous frame image, obtain the movement speed of the dynamic gesture.
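One way to sketch this computation, under stated assumptions, is to take the displacement of the hand key point centroid between the two most recent frames and divide it by the difference in collection times. Using the centroid as the representative point, 2D pixel coordinates, and the function names below are all assumptions for illustration.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def centroid(points: List[Point]) -> Point:
    """Mean position of the hand key points in one frame."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    return (sum(xs) / len(xs), sum(ys) / len(ys))


def gesture_velocity(prev_points: List[Point], prev_time: float,
                     last_points: List[Point], last_time: float):
    """Return (speed, (dx, dy)) between the previous frame and the last frame."""
    dt = last_time - prev_time
    if dt <= 0:
        raise ValueError("last_time must be later than prev_time")
    x0, y0 = centroid(prev_points)
    x1, y1 = centroid(last_points)
    dx, dy = x1 - x0, y1 - y0
    return math.hypot(dx, dy) / dt, (dx, dy)


# Two key points per frame; the centroid moves from (1, 0) to (4, 4) in 0.5 s.
speed, displacement = gesture_velocity([(0.0, 0.0), (2.0, 0.0)], 0.0,
                                       [(3.0, 4.0), (5.0, 4.0)], 0.5)
print(speed, displacement)  # 10.0 (3.0, 4.0)
```

The sign of the displacement components in this sketch could also serve as a simple indication of the movement direction of the dynamic gesture.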
A device control apparatus, including:

A voice recognition module, configured to perform voice recognition on the voice control instruction in response to receiving the voice control instruction, and obtain a first voice recognition result;

A determination module, configured to determine the target device corresponding to the voice control instruction based on the first voice recognition result obtained by the voice recognition module;

A detection module, configured to detect a preset dynamic gesture;

An adjustment module, configured to continuously adjust the state of the target device based on the continuous action of the dynamic gesture in response to the detection module detecting the preset dynamic gesture.
A computer-readable storage medium storing a computer program, wherein the computer program is used to execute the device control method described in any one of claims 1-14.
An electronic device, including:

A processor;

A memory for storing instructions executable by the processor;

Wherein the processor is configured to read the executable instructions from the memory and execute the instructions to implement the device control method described in any one of claims 1-14.