WO2022188552A1 - Device control method and related apparatus - Google Patents

Device control method and related apparatus

Info

Publication number
WO2022188552A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
angle
camera
coordinate point
image information
Prior art date
Application number
PCT/CN2022/072355
Other languages
English (en)
French (fr)
Inventor
戴强
张晓帆
曾理
王佩玲
Original Assignee
Guangdong OPPO Mobile Telecommunications Corp., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong OPPO Mobile Telecommunications Corp., Ltd.
Publication of WO2022188552A1

Links

Images

Classifications

    • H: ELECTRICITY
    • H04: ELECTRIC COMMUNICATION TECHNIQUE
    • H04L: TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L 12/00: Data switching networks
    • H04L 12/28: Data switching networks characterised by path configuration, e.g. LAN [Local Area Networks] or WAN [Wide Area Networks]
    • H04L 12/2803: Home automation networks
    • H04L 12/2816: Controlling appliance services of a home automation network by calling their functionalities
    • H04L 12/282: Controlling appliance services of a home automation network by calling their functionalities based on user interaction within the home
    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00: Speech recognition
    • G10L 15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223: Execution procedure of a spoken command

Definitions

  • The present application belongs to the technical field of device control, and in particular relates to a device control method and related apparatus.
  • the embodiments of the present application provide a device control method and a related device, so as to improve the accuracy and intelligence of device control.
  • an embodiment of the present application provides a device control method, including:
  • the angle receiving range of the target device matches the face orientation angle of the first user;
  • the target device is controlled to perform the operation indicated by the voice instruction of the first user.
  • the arbitration device first obtains at least one angle receiving range of at least one fixed device and the face orientation angle of the first user; it then determines the target device that the first user needs to control; finally, it controls the target device to perform the operation indicated by the first user's voice instruction. It can be seen that the arbitration device can intelligently decide which target device the first user needs to control by combining the first user's face orientation angle with the angle receiving range of at least one fixed device, avoiding situations in which the first user's control intention cannot be accurately identified, which helps improve the accuracy and intelligence of device control.
  • an embodiment of the present application provides a device control device, including:
  • an acquiring unit, configured to acquire at least one angle receiving range of at least one fixed device and the face orientation angle of a first user;
  • a determining unit configured to determine a target device that the first user needs to control, where the angle receiving range of the target device matches the face orientation angle of the first user;
  • a control unit configured to control the target device to perform the operation indicated by the voice instruction of the first user.
  • embodiments of the present application provide an electronic device, including one or more processors and one or more memories storing a program;
  • the one or more memories and the program are configured such that the one or more processors control the electronic device to execute the instructions of the steps in any method of the first aspect of the embodiments of the present application.
  • an embodiment of the present application provides a chip, including: a processor configured to call and run a computer program from a memory, so that a device installed with the chip executes some or all of the steps described in any method of the first aspect of the embodiments of the present application.
  • an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps described in any method of the first aspect of the embodiments of the present application.
  • the embodiments of the present application provide a computer program, wherein the computer program is operable to cause the computer to execute some or all of the steps described in any of the methods in the first aspect of the embodiments of the present application.
  • the computer program may be a software installation package.
  • 1a is a schematic diagram of user control in a multi-device scenario provided by an embodiment of the present application
  • FIG. 1b is an architecture diagram of a device control system 10 provided by an embodiment of the present application.
  • 1c is a schematic diagram of a functional interface of an intelligent voice assistant provided by an embodiment of the present application.
  • 1d is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • FIG. 2a is a schematic flowchart of a device control method provided by an embodiment of the present application.
  • 2b is a schematic diagram of an angular receiving range of a multi-device provided by an embodiment of the present application
  • Fig. 2c is a schematic diagram of measuring the receiving angle range of a fixed device provided by an embodiment of the present application.
  • FIG. 2d is an example diagram of an interface for displaying the determined target device provided by an embodiment of the present application.
  • 3a is a schematic flowchart of a device control method provided by an embodiment of the present application.
  • 3b is an example diagram of a schematic diagram device provided by an embodiment of the present application.
  • 3c is an example diagram of another schematic diagram device provided by an embodiment of the present application.
  • FIG. 3d is an example diagram of another schematic diagram device provided by an embodiment of the present application.
  • FIG. 3e is an example diagram of another schematic diagram device provided by an embodiment of the present application.
  • FIG. 4 is a block diagram of functional units of a device control device provided by an embodiment of the present application.
  • FIG. 5 is a block diagram of the functional unit composition of another device control apparatus provided by an embodiment of the present application.
  • FIG. 6 is a block diagram of functional units of a device control device provided by an embodiment of the present application.
  • FIG. 7 is a block diagram of functional units of another device control apparatus provided by an embodiment of the present application.
  • as shown in FIG. 1a, the space where the user is located contains a smart speaker (0.5 m from the user), smart TV 1 (0.6 m from the user), a computer (1.2 m from the user) and smart TV 2 (0.55 m from the user). With multiple TVs in the space, it is difficult for the user to use voice instructions to control the TV they want to watch. More generally, when a user wants to listen to music and issues a "play music" instruction, the current intelligent voice assistant may also be unable to select a suitable device to satisfy the user's intention.
  • the embodiments of the present application provide a device control method.
  • when the intelligent voice assistant faces a multi-device decision problem, the embodiments of the present application can introduce a new dimensional feature based on the interaction habits between users and devices: the user's face orientation. This feature makes the interaction between devices and users more natural and smooth, and binds the relationship between user and device more closely.
  • at the same time, the fixed device that the user faces does not need to have any signal acquisition capability, which greatly expands the type and range of devices that can be faced.
  • FIG. 1b is a device control system 10 provided by an embodiment of the present application.
  • the device control system 10 includes a fixed device 100 (for example, a smart TV, a smart speaker, a smart washing machine, a smart air conditioner, or a mobile phone placed on a table, i.e., a device whose own position does not change with the user's position for a period of time), a camera 200 (for example, a surveillance camera installed in a corner, a surveillance camera placed on a smart refrigerator, etc.), an arbitration device 300 installed with an intelligent voice assistant (the arbitration device can be any of the fixed devices or any of the mobile devices, such as the user's mobile phone; it can also be a dedicated control box in a smart home scenario, a server in the cloud, or a device group composed of multiple devices that jointly execute the solution, which is not uniquely limited here), a user-side mobile device 400 (for example, a mobile phone held by the user, a smart watch worn on the wrist, and other devices whose positions change with the user's position) and a server 500; the arbitration device 300 is communicatively connected to the fixed device 100, the camera 200, the mobile device 400 and the server 500, forming a device control network in the smart home scenario.
  • the intelligent voice assistant can be installed on various devices such as mobile phones to support the device control method of the present application; the specific function names and interface interaction modes it presents can be varied and are not uniquely limited here. For example, it may be installed on an OPPO mobile phone and present the settings interface of the "Breeno" smart assistant as shown in FIG. 1c.
  • the arbitration device 300 can exchange data and signaling with other devices (eg, the fixed device 100 and the mobile device 400) in various ways.
  • for example, the arbitration device 300 may communicate directly with the first camera over a local area network to obtain corresponding information, and the arbitration device 300 may connect to a smart speaker in the space where the user is located through a mobile communication network to realize the corresponding information exchange, and the like.
  • FIG. 1d is a schematic structural diagram of an electronic device provided by an embodiment of the present application.
  • the electronic device is applied to the above-mentioned device control system 10.
  • the electronic device includes an application processor 120, a memory 130, a communication module 140, and one or more programs 131; the application processor 120 is communicatively connected to the memory 130 and the communication module 140 through an internal communication bus.
  • the one or more programs 131 are stored in the above memory 130 and configured to be executed by the above application processor 120; the one or more programs 131 include instructions for executing any step in the above method embodiments.
  • the application processor 120 may be, for example, a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logical blocks, units and circuits described in connection with this disclosure.
  • the processor may also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and the like.
  • the communication unit may be a communication module 140 , a transceiver, a transceiver circuit, etc., and the storage unit may be the memory 130 .
  • the memory 130 may be volatile memory or non-volatile memory, or may include both volatile and non-volatile memory.
  • the non-volatile memory may be read-only memory (ROM), programmable read-only memory (PROM), erasable programmable read-only memory (EPROM), electrically erasable programmable read-only memory (EEPROM) or flash memory.
  • Volatile memory may be random access memory (RAM), which acts as an external cache.
  • By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM) and direct rambus RAM (DR RAM).
  • the application processor 120 is configured to perform any step performed by the arbitration device in the method embodiment of the present application.
  • FIG. 2a is a schematic flowchart of a device control method provided by an embodiment of the present application.
  • the method is applied to the arbitration device 300 in the device control system 10; as shown in the figure, the device control method includes the following operations.
  • Step 201: Acquire at least one angle receiving range of at least one fixed device and the face orientation angle of a first user.
  • the at least one angle receiving range corresponds to the at least one fixed device one-to-one, that is, each fixed device corresponds to one angle receiving range.
  • the first distance between the first camera and the first user may be calculated by the first camera based on a depth-of-field algorithm.
  • the face orientation angle of the first user can be characterized by the yaw, pitch and roll angles of the face relative to the current camera; through angle conversion, the angle within the coordinate system of the first camera can be obtained.
  • Step 202: Determine a target device that the first user needs to control, where the angle receiving range of the target device matches the face orientation angle of the first user.
  • the target device may be a fixed device.
  • the device that the user is facing does not need to perform any signal acquisition work.
  • the device that the user is facing can be a smart curtain, a lamp, a switch, a mobile phone whose position does not change, etc., or it can be a mobile phone held by the user; it is only necessary that the arbitration device installed with the intelligent voice assistant is able to control these devices. This feature greatly expands the type and range of devices that can be faced.
  • Step 203: Control the target device to perform the operation indicated by the voice instruction of the first user.
  • the angle receiving range refers to the fan-shaped angular range formed by the boundary points of a fixed device and the user's position.
  • as shown in FIG. 2b, assume the fixed devices include the mobile phone, speaker, TV 1, TV 2 and computer in the space where the user is located. Through the fan-shaped area between the boundary points of the mobile phone and the user's position, the angle receiving range of the mobile phone can be determined as angle range C shown in the figure; similarly, the angle receiving range of the speaker is angle range B, the angle receiving range of TV 1 is angle range A, the angle receiving range of the computer is angle range D, and the angle receiving range of TV 2 is angle range E. The matching of step 202 then amounts to checking which of these ranges contains the user's face orientation angle, as sketched below.
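  • To make this matching concrete, the following is a minimal sketch in Python (the angle values and the function name are hypothetical illustrations; the patent does not prescribe an implementation) of selecting the target device whose angle receiving range contains the user's face orientation angle:

        def match_target_device(ranges, facing_angle):
            """Return the fixed device whose angle receiving range [lo, hi]
            contains the user's face orientation angle; all angles are assumed
            to be expressed in the same user-centred reference frame."""
            for device, (lo, hi) in ranges.items():
                if lo <= facing_angle <= hi:
                    return device
            return None  # no range matched; the intent cannot be decided here

        # Hypothetical ranges (degrees) for the FIG. 2b layout:
        ranges = {"TV 1": (10.0, 35.0), "speaker": (40.0, 55.0),
                  "phone": (60.0, 70.0), "computer": (80.0, 100.0),
                  "TV 2": (110.0, 130.0)}
        print(match_target_device(ranges, 25.0))  # -> TV 1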
  • the acquiring of at least one angle receiving range of at least one fixed device includes: determining the at least one angle receiving range of the at least one fixed device according to the position of the first camera, the first distance between the first camera and the first user, and the position of the at least one fixed device.
  • the device needs to obtain the first distance between the first camera and the first user, for example by calculating it through the depth detection algorithm of the first camera.
  • the arbitration device first obtains the first distance between the first camera and the first user, as well as the face orientation angle of the first user; it then determines the target device that the first user needs to control according to the position of the first camera, the first distance, the position of at least one fixed device and the face orientation angle of the first user; finally, it controls the target device to perform the operation indicated by the first user's voice instruction. It can be seen that the arbitration device can intelligently decide which target device the first user needs to control by combining the first user's face orientation angle with the position of the first camera, the first distance and the position of at least one fixed device, avoiding situations in which the first user's control intention cannot be accurately identified, which helps improve the accuracy and intelligence of device control.
  • determining the at least one angle receiving range of the at least one fixed device according to the position of the first camera, the first distance between the first camera and the first user, and the position of the at least one fixed device includes the following. As shown in FIG. 2c, if coordinate point a1 is the equivalent position of the first camera, a Cartesian coordinate system Xa1Y is established with a1 as the coordinate origin; coordinate point b1 is the equivalent position of the first user corresponding to the first distance, with coordinate point a2 being the horizontal projection of b1 on the X axis; coordinate points b2 and b3 are the two boundary points of a single fixed device; coordinate point a3 is the horizontal projection of b2 on the X axis; coordinate point a5 is the horizontal projection of b3 on the X axis; coordinate point a4 is the intersection of ray b1b2 with the X axis; and coordinate point a6 is the intersection of ray b1b3 with the X axis. Then, under the constraint of coordinate point b1, the first boundary angle α1 of the angle receiving range of the single fixed device is ∠a2b1b2, the second boundary angle α2 is ∠a2b1b3, and α1 and α2 constitute the angle receiving range of the single fixed device.
  • the first distance corresponds to the length of line segment a1b1; from this length, the lengths of the horizontal projection segment a1a2 and the vertical projection segment a2b1 can be calculated.
  • α1 and α2 are calculated by the following formulas (the original equation images are reconstructed here from the geometry of FIG. 2c):

        α1 = arctan((a2a3 + a3a4) / a2b1)
        α2 = arctan((a2a5 + a5a6) / a2b1)

  • where, by the triangle similarity theorem, a3a4 = (a2a3 · a3b2) / (a2b1 − a3b2) and a5a6 = (a2a5 · a5b3) / (a2b1 − a5b3); a2a3 is obtained as a1a3 − a1a2, and a2a5 as a1a5 − a1a2.
  • α1 and α2 can be determined by the above formulas.
  • the angle receiving range of the single fixed device is [α1, α2].
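  • For illustration only, the boundary-angle computation described above can be sketched as follows (Python; the argument names mirror the segment lengths of FIG. 2c, and the sketch assumes the device's boundary points b2 and b3 lie lower than the user, so that rays b1b2 and b1b3 actually intersect the X axis):

        import math

        def angle_receiving_range(a1a2, a2b1, a1a3, a3b2, a1a5, a5b3):
            """Compute the angle receiving range [alpha1, alpha2] (degrees) of
            a single fixed device per the geometry of FIG. 2c: camera at the
            origin a1, user at b1 = (a1a2, a2b1), device boundary points b2 and
            b3 with X-axis projections a3 and a5 and heights a3b2 and a5b3."""
            a2a3 = a1a3 - a1a2
            a2a5 = a1a5 - a1a2
            # By similar triangles, rays b1->b2 and b1->b3 cut the X axis at a4, a6:
            a3a4 = a2a3 * a3b2 / (a2b1 - a3b2)
            a5a6 = a2a5 * a5b3 / (a2b1 - a5b3)
            alpha1 = math.degrees(math.atan2(a2a3 + a3a4, a2b1))
            alpha2 = math.degrees(math.atan2(a2a5 + a5a6, a2b1))
            return min(alpha1, alpha2), max(alpha1, alpha2)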
  • after the arbitration device determines the first user's intended device, the interaction control result can be displayed on the display screen of a carrier such as a mobile phone, and the determined target device that the first user needs to control (i.e., the intended device) can be presented through a text prompt, as shown in FIG. 2d.
  • when the face orientation angle of the current user is detected through the image-based face orientation algorithm, it can be determined that the current user is facing the device; if the device can provide the capability described in the user's instruction, the system calls the device to respond to the user's request.
  • the acquiring the face orientation angle of the first user includes: acquiring a first image captured by the first camera; detecting that the first image includes image information of at least one user, determining the image information of the first user in the first image; determining the facing angle of the face of the first user according to the image information of the first user in the first image.
  • the determining of the face orientation angle of the first user according to the image information of the first user in the first image includes: extracting, through a neural network algorithm, the yaw, pitch and roll angles of the face relative to the first camera.
  • the detecting that the first image includes image information of at least one user, and determining the image information of the first user in the first image, includes:
  • detecting that image information of multiple users exists in the first image;
  • detecting whether the image information of the first user can be determined according to the voiceprint information of the voice instruction and/or the biometric information of the users;
  • if not, determining the positions of the multiple users according to the image information of the multiple users, and detecting whether the image information of the first user can be determined according to the positions of the multiple users, the sound source localization position information of each user, and the state of each user;
  • if the image information of the first user still cannot be determined, determining the image information of the first user according to whether there is a device in the face orientation of each of the multiple users and whether that device can provide the capability described by the voice instruction.
  • the multiple users further include a second user other than the first user.
  • the biometric information of a user refers to feature data reflecting the biometric characteristics of the user's face, such as the distance between the eyes, the proportion of the nose relative to the face, and whether glasses are worn.
  • the arbitration device may preset, or acquire in real time, the correspondence between users' image information and users' voiceprint information, and/or the correspondence between users' image information and users' biometric information; the arbitration device determines the voiceprint features of the voice instruction, and/or extracts the biometric information from the first image, and then queries the above correspondences. If the image information of a corresponding user is found, it can be determined that the image information of the first user does exist in the first image.
  • further, the image position of each user can be obtained by analyzing the first image and compared with the sound source position of the first user identified by processing the first user's voice instruction with sound source localization technology; if no match is found, or more than one match is found, further screening can be done through the state of each user, where the state of each user includes a limb state and/or a facial state.
  • the limb state and/or facial state of each user is used to determine whether the current user is performing an operation of controlling a device through a voice instruction.
  • further, the determination may be based on image analysis of the device that the user's face is facing and whether that device has the capability described by the voice instruction. For example, if the devices in the user's face orientation include a smart watch and the function described by the voice instruction is temperature adjustment, they obviously do not match, so the controlled device is not the smart watch.
  • the method further includes: determining the image information of the first user according to the voiceprint information of the voice instruction and/or the biometric information of the users.
  • the method further includes: determining the image information of the first user according to the positions of the multiple users, the sound source localization position information of each user, and the state of each user.
  • the arbitration device can apply a gradient, stage-by-stage detection mechanism based on multiple types of information, so as to detect the first user comprehensively and finely, as sketched below.
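  • A condensed sketch of this gradient, stage-by-stage mechanism follows (Python; the three stage predicates are hypothetical hooks standing in for the voiceprint/biometric check, the sound-source/state check and the capability check, which the patent does not specify in code form):

        def identify_first_user(users, match_voiceprint, match_sound_source,
                                match_capability):
            """Narrow down the first user among the users detected in the
            first image, falling through three increasingly coarse stages."""
            candidates = list(users)
            stages = (match_voiceprint,    # stage 1: voiceprint / biometrics
                      match_sound_source,  # stage 2: image position vs. sound
                                           #          source localization + user state
                      match_capability)    # stage 3: faced device can provide
                                           #          the requested capability
            for stage in stages:
                filtered = [u for u in candidates if stage(u)]
                if len(filtered) == 1:
                    return filtered[0]     # uniquely identified, stop early
                if filtered:
                    candidates = filtered  # keep the narrowed set
            return None                    # still ambiguous after all stages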
  • the method further includes: detecting that there is image information of a single user in the first image; and determining that the image information of the single user is the image information of the first user.
  • for the case where only a single user is present, the arbitration device simplifies the algorithm and directly treats the current user as the first user, which is fast, efficient and real-time.
  • before the determining of the target device that the first user needs to control, the method further includes: detecting, according to the image information of the first user in the first image, that the face of the first user is not facing a mobile device.
  • the mobile device includes a wearable device.
  • whether there is a mobile device in the image area facing the face of the first user may be identified based on an image analysis algorithm, and the mobile device may be a mobile phone held by the user, a smart watch worn by the user, or the like.
  • in practical scenarios the first user may hold a mobile phone and perform voice control while facing it; the arbitration device therefore needs to first analyze, based on the collected first image, whether the first user has a control intention toward a mobile device, and, when there is no such intention, further locate the fixed device that needs to be controlled based on the face orientation, thereby improving the accuracy and comprehensiveness of device control.
  • the method further includes: detecting, according to the image information of the first user in the first image, that a mobile device exists in the face orientation of the first user; and determining, according to the mobile device, the target device that the first user needs to control.
  • the specific implementation of determining, according to the mobile device, the target device that the first user needs to control includes: if the mobile device is a single mobile device, determining the single mobile device as the target device that the first user needs to control; if there are multiple mobile devices, acquiring the device state of each of the multiple mobile devices, and determining the target device that the first user needs to control according to the device state of each of the multiple mobile devices.
  • the device state of each mobile device includes at least one of the following: screen state, whether it is held by the user, and the like.
  • since in practical scenarios the first user may hold a mobile phone and perform voice control while facing it, the arbitration device needs to first analyze, based on the collected first image, whether the first user has a control intention toward a mobile device, and, when such an intention is recognized, determine that mobile device as the target device that the first user currently needs to control, thereby avoiding misidentification and improving the accuracy and comprehensiveness of device control.
  • the first camera is selected and determined according to the location of the first user.
  • the first camera is a camera associated with the sound source localization reference position of the first user; the sound source localization reference position of the first user is determined through the time differences at which at least three devices collect the first user's voice instruction, the positions of the three devices, and sound source localization technology.
  • the first camera may be a camera that the arbitration device selects from multiple cameras based on the sound source localization result of the first user and that meets a preset condition, where meeting the preset condition may be at least one of the following conditions:
  • the camera is in the same room as the first user;
  • the distance between the camera and the first user is the smallest or less than a preset distance threshold
  • the viewing range of the camera includes the first user, or the camera can be directly facing the first user.
  • the arbitration device selects the first camera, it can adjust the angle, focal length and other states of the first camera according to the approximate orientation of the first user, so that the user's picture can be captured clearly and accurately.
  • if none of the cameras can capture a picture of the user, this flow is exited; the user can then be actively queried through any device to determine the user's intended device, and the intended device can be activated to serve the user.
  • the arbitration device can filter out the associated first camera from the multiple cameras based on the sound source localization result of the first user, thereby improving the success rate of image acquisition, detection and identification.
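  • A minimal sketch of this camera selection (Python; the dictionary field names and the 5 m threshold are illustrative assumptions, not values taken from the patent):

        import math

        def select_first_camera(cameras, user_pos, user_room, max_dist=5.0):
            """Pick the first camera using the preset conditions listed above:
            same room as the first user, distance minimal or below a
            threshold, and the user inside the camera's viewing range."""
            def dist(cam):
                return math.hypot(cam["pos"][0] - user_pos[0],
                                  cam["pos"][1] - user_pos[1])
            candidates = [cam for cam in cameras
                          if cam["room"] == user_room
                          and dist(cam) <= max_dist
                          and cam["covers_user"]]  # viewing range includes user
            return min(candidates, key=dist, default=None)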
  • the position of the at least one fixed device and the position of the first camera are demarcated by means of visual scanning positioning.
  • the user can use a device with a binocular camera to locate the relative position of each device (including the room number to which it belongs, and the relative position in the current room, etc.), or it can be specified by the user. At the same time, the user can fine-tune the position of each device, expand or narrow the receiving range of the device's orientation angle to improve the control accuracy.
  • the system supports visual scanning positioning to quickly build the spatial position relationship of multiple devices, and supports user fine-tuning to improve convenience and accuracy.
  • the arbitration device first obtains at least one angle receiving range of at least one fixed device and the face orientation angle of the first user; it then determines the target device that the first user needs to control; finally, it controls the target device to perform the operation indicated by the first user's voice instruction. It can be seen that the arbitration device can intelligently decide which target device the first user needs to control by combining the first user's face orientation angle with the angle receiving range of at least one fixed device, avoiding situations in which the first user's control intention cannot be accurately identified, which helps improve the accuracy and intelligence of device control.
  • FIG. 3a is a schematic flowchart of a method for displaying an intended device provided by an embodiment of the present application, which is applied to any device in the device control system 10. As shown in the figure, the method for displaying an intended device includes the following operations.
  • Step 301: Obtain the detection result of the intended device of the first user's voice instruction, where the detection result of the intended device is determined based on the position of the first camera, the first distance, the position of at least one fixed device, and the face orientation angle of the first user; the first distance is the distance between the first camera and the first user.
  • Step 302: Display the detection result of the intended device.
  • the voice instruction is used for the target device to perform a corresponding operation to complete the control intention of the first user.
  • the displaying of the detection result of the intended device includes: displaying a device control system space model, where the device control system space model includes the at least one fixed device whose position was calibrated by means of visual scanning positioning; highlighting the determined target device among the at least one fixed device; and/or displaying prompt information indicating that the target device is the intended device.
  • in the display diagram of the intended device, the intended device is TV 1 marked with a dotted frame; the icon of TV 1 can also be directly highlighted, which is not limited here.
  • in the display diagram of the intended device, the intended device displayed through text information is TV 2.
  • the device control system supports intuitively displaying the detection result of the intended device through the display screen.
  • the displaying of the detection result of the intended device includes: displaying a device control system space model, where the device control system space model includes the at least one fixed device whose position was calibrated by means of visual scanning positioning, together with the determined mobile device serving as the target device; highlighting the determined mobile device serving as the target device; and/or displaying prompt information indicating that the target device is the intended device.
  • in the display diagram of the intended device, the intended device is a mobile phone marked with a highlight; the icon of the mobile phone can also be directly highlighted, etc., which is not uniquely limited here.
  • in the display diagram of the intended device, the intended device displayed through text information is a mobile phone.
  • the device control system supports intuitively displaying the detection result of the intended device through the display screen.
  • the device control system can accurately determine the intended device of the first user based on the first user's face orientation and other related information, display the detection result of the intended device in a visual manner, and present it intuitively to users, improving the intuitiveness and intelligence of device control and the user experience.
  • An embodiment of the present application provides a device control device, where the device control device may be an arbitration device.
  • the device control apparatus is configured to perform the steps performed by the arbitration device in the above device control method.
  • the device control apparatus provided in this embodiment of the present application may include modules corresponding to corresponding steps.
  • the device control apparatus may be divided into functional modules according to the foregoing method examples.
  • each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware, and can also be implemented in the form of software function modules.
  • the division of modules in the embodiments of the present application is schematic, and is only a logical function division, and there may be other division manners in actual implementation.
  • FIG. 4 shows a possible schematic structural diagram of the device control apparatus involved in the above embodiment.
  • the device control device 4 is applied to the arbitration device 400 in the device control system 10; the device includes:
  • an obtaining unit 40 configured to obtain the first distance between the first camera and the first user, and the facing angle of the first user's face
  • a determining unit 41 configured to determine the target device that the first user needs to control according to the position of the first camera, the first distance, the position of at least one fixed device, and the facing angle of the face of the first user;
  • the control unit 42 is configured to control the target device to perform the operation indicated by the voice instruction of the first user.
  • the acquiring unit 40 is specifically configured to: determine at least one angle receiving range of the at least one fixed device according to the position of the first camera, the first distance between the first camera and the first user, and the position of the at least one fixed device.
  • the acquiring unit 40 is specifically configured to: if coordinate point a1 is the equivalent position of the first camera, establish a Cartesian coordinate system Xa1Y with a1 as the coordinate origin, where coordinate point b1 is the equivalent position of the first user corresponding to the first distance, coordinate point a2 is the horizontal projection of b1 on the X axis, coordinate points b2 and b3 are the two boundary points of a single fixed device, coordinate point a3 is the horizontal projection of b2 on the X axis, coordinate point a5 is the horizontal projection of b3 on the X axis, coordinate point a4 is the intersection of ray b1b2 with the X axis, and coordinate point a6 is the intersection of ray b1b3 with the X axis; then, under the constraint of coordinate point b1, the first boundary angle α1 of the angle receiving range of the single fixed device is ∠a2b1b2 and the second boundary angle α2 is ∠a2b1b3, and α1 and α2 constitute the angle receiving range of the single fixed device.
  • α1 and α2 are calculated by the formulas given above in connection with FIG. 2c.
  • the acquiring unit 40 is specifically configured to: acquire a first image captured by the first camera; detect that the first image includes image information of at least one user, and determine the image information of the first user in the first image; and determine the face orientation angle of the first user according to the image information of the first user in the first image.
  • the acquiring unit 40 is specifically configured to: detect that image information of multiple users exists in the first image;
  • detect whether the image information of the first user can be determined according to the voiceprint information of the voice instruction and/or the biometric information of the users;
  • if not, determine the positions of the multiple users according to their image information, and detect whether the image information of the first user can be determined according to those positions, the sound source localization position information of each user, and the state of each user;
  • if the image information of the first user still cannot be determined, determine the image information of the first user according to whether there is a device in the face orientation of the multiple users and whether that device can provide the capability described by the voice instruction.
  • the first camera is selected and determined according to the location of the first user.
  • the position of the at least one fixed device and the position of the first camera are demarcated by means of visual scanning positioning.
  • before the determining unit 41 determines the target device that the first user needs to control, it is further configured to determine, according to the face orientation angle of the first user, that the face of the first user is not facing a mobile device.
  • the determining unit 41 is further configured to: detect, according to the image information of the first user in the first image, that a mobile device exists in the face orientation of the first user; and determine, according to the mobile device, the target device that the first user needs to control.
  • the device control apparatus 5 includes: a processing module 50 and a communication module 51 .
  • the processing module 50 is used to control and manage the actions of the device control apparatus, for example, the steps performed by the acquiring unit 40, the determining unit 41, the control unit 42 and the detection unit 43, and/or other processes of the techniques described herein.
  • the communication module 51 is used to support the interaction between the device control apparatus and other devices.
  • the device control apparatus may further include a storage module 52, and the storage module 52 is used for storing program codes and data of the device control apparatus.
  • the processing module 50 may be a processor or a controller, such as a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an ASIC, an FPGA or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logical blocks, modules and circuits described in connection with this disclosure.
  • the processor may also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and the like.
  • the communication module 51 may be a transceiver, an RF circuit, a communication interface, or the like.
  • the storage module 52 may be a memory.
  • Both the above-mentioned device control device 4 and device control device 5 can execute the steps performed by the arbitration device in the device control method shown in FIG. 2a.
  • An embodiment of the present application provides a device control apparatus, and the device control apparatus may be any device in a device control system. Specifically, the device control apparatus is configured to execute the steps performed by any device in the device control system in the above device control method.
  • the device control apparatus provided in this embodiment of the present application may include modules corresponding to corresponding steps.
  • the device control apparatus may be divided into functional modules according to the above method examples.
  • each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module.
  • the above-mentioned integrated modules can be implemented in the form of hardware, and can also be implemented in the form of software function modules.
  • the division of modules in the embodiments of the present application is schematic, and is only a logical function division, and there may be other division manners in actual implementation.
  • FIG. 6 shows a possible schematic structural diagram of the device control apparatus involved in the foregoing embodiment.
  • the device control device 6 is applied to the arbitration device 600 in the device control system 10; the device includes:
  • the obtaining unit 60 is configured to obtain the detection result of the intended device of the first user's voice instruction, where the detection result of the intended device is determined based on the position of the first camera, the first distance, the position of at least one fixed device, and the face orientation angle of the first user; the first distance is the distance between the first camera and the first user;
  • the display unit 61 is configured to display the detection result of the intended device.
  • the voice instruction is used for the target device to perform a corresponding operation to complete the control intention of the first user.
  • in one possible example, the display unit 61 is specifically configured to: display a device control system space model, where the device control system space model includes the at least one fixed device whose position was calibrated by means of visual scanning positioning; highlight the determined target device among the at least one fixed device; and/or display prompt information indicating that the target device is the intended device.
  • in another possible example, the display unit 61 is specifically configured to: display a device control system space model, where the device control system space model includes the at least one fixed device whose position was calibrated by means of visual scanning positioning, together with the determined mobile device serving as the target device; highlight the determined mobile device serving as the target device; and/or display prompt information indicating that the target device is the intended device.
  • the device control apparatus 7 includes: a processing module 70 and a communication module 71 .
  • the processing module 70 is used to control and manage the actions of the device control apparatus, eg, the steps performed by the acquisition unit 60, the display unit 61, and/or other processes used to perform the techniques described herein.
  • the communication module 71 is used to support the interaction between the device control apparatus and other devices.
  • the device control apparatus may further include a storage module 72, and the storage module 72 is used for storing program codes and data of the device control apparatus.
  • the processing module 70 may be a processor or a controller, such as a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an ASIC, an FPGA or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logical blocks, modules and circuits described in connection with this disclosure.
  • the processor may also be a combination that implements computing functions, such as a combination of one or more microprocessors, a combination of a DSP and a microprocessor, and the like.
  • the communication module 71 may be a transceiver, an RF circuit, a communication interface, or the like.
  • the storage module 72 may be a memory.
  • the above embodiments may be implemented in whole or in part by software, hardware, firmware or any other combination.
  • the above-described embodiments may be implemented in whole or in part in the form of a computer program product.
  • the computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, all or part of the processes or functions described in the embodiments of the present application are generated.
  • the computer may be a general purpose computer, special purpose computer, computer network, or other programmable device.
  • the computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server or data center to another website, computer, server or data center by wire or wirelessly.
  • the computer-readable storage medium may be any available medium that a computer can access, or a data storage device such as a server, a data center, or the like containing one or more sets of available media.
  • the usable media may be magnetic media (eg, floppy disks, hard disks, magnetic tapes), optical media (eg, DVDs), or semiconductor media.
  • the semiconductor medium may be a solid state drive.
  • Embodiments of the present application further provide a computer storage medium, where the computer storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps of any method described in the above method embodiments; the above computer includes an electronic device.
  • Embodiments of the present application further provide a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps of any method described in the above method embodiments.
  • the computer program product may be a software installation package, and the computer includes an electronic device.
  • the size of the sequence numbers of the above processes does not imply an order of execution; the execution order of each process should be determined by its function and internal logic, and should not constitute any limitation on the implementation of the embodiments of the present application.
  • the disclosed method, apparatus and system may be implemented in other manners.
  • the apparatus embodiments described above are only illustrative; for example, the division of the units is only a logical function division, and there may be other division manners in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not implemented.
  • the shown or discussed mutual coupling or direct coupling or communication connection may be through some interfaces, indirect coupling or communication connection of devices or units, and may be in electrical, mechanical or other forms.
  • the units described as separate components may or may not be physically separated, and components displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed to multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution in this embodiment.
  • each functional unit in each embodiment of the present invention may be integrated into one processing unit, or each unit may exist physically separately, or two or more units may be integrated into one unit.
  • the above-mentioned integrated unit may be implemented in the form of hardware, or may be implemented in the form of hardware plus software functional units.
  • the above-mentioned integrated units implemented in the form of software functional units can be stored in a computer-readable storage medium.
  • the above-mentioned software functional unit is stored in a storage medium, and includes several instructions to cause a computer device (which may be a personal computer, a server, or a network device, etc.) to execute some steps of the methods described in the various embodiments of the present invention.
  • the aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk or an optical disc.

Abstract

The present application provides a device control method and related apparatus. The method includes: acquiring at least one angle receiving range of at least one fixed device and the face orientation angle of a first user; determining a target device that the first user needs to control, where the angle receiving range of the target device matches the face orientation angle of the first user; and controlling the target device to perform the operation indicated by the first user's voice instruction. The embodiments of the present application help improve the accuracy and intelligence of device control.

Description

Device control method and related apparatus

Technical Field

The present application belongs to the technical field of device control, and in particular relates to a device control method and related apparatus.

Background

With the rapid development of Internet software and hardware in recent years, electronic devices with various functions surround users, such as mobile phones, tablets, smart speakers and electronic watches. While bringing great convenience, these electronic devices also cause users a certain degree of trouble. For example, when a user wants to play music, the user generally faces the TV and speaks a voice instruction to play music, yet the intelligent voice assistant installed on some device in the current room cannot intelligently and accurately recognize the user's intention to play music through the TV.

Summary

Embodiments of the present application provide a device control method and related apparatus, so as to improve the accuracy and intelligence of device control.

In a first aspect, an embodiment of the present application provides a device control method, including:

acquiring at least one angle receiving range of at least one fixed device and the face orientation angle of a first user;

determining a target device that the first user needs to control, where the angle receiving range of the target device matches the face orientation angle of the first user; and

controlling the target device to perform the operation indicated by the voice instruction of the first user.

It can be seen that, in this example, the arbitration device first acquires at least one angle receiving range of at least one fixed device and the face orientation angle of the first user; it then determines the target device that the first user needs to control; finally, it controls the target device to perform the operation indicated by the first user's voice instruction. The arbitration device can thus intelligently decide which target device the first user needs to control by combining the first user's face orientation angle with the angle receiving range of at least one fixed device, avoiding situations in which the first user's control intention cannot be accurately identified, which helps improve the accuracy and intelligence of device control.

In a second aspect, an embodiment of the present application provides a device control apparatus, including:

an acquiring unit, configured to acquire at least one angle receiving range of at least one fixed device and the face orientation angle of a first user;

a determining unit, configured to determine a target device that the first user needs to control, where the angle receiving range of the target device matches the face orientation angle of the first user; and

a control unit, configured to control the target device to perform the operation indicated by the voice instruction of the first user.

In a third aspect, an embodiment of the present application provides an electronic device, including one or more processors;

one or more memories, configured to store a program,

where the one or more memories and the program are configured such that the one or more processors control the electronic device to execute the instructions of the steps in any method of the first aspect of the embodiments of the present application.

In a fourth aspect, an embodiment of the present application provides a chip, including: a processor configured to call and run a computer program from a memory, so that a device installed with the chip executes some or all of the steps described in any method of the first aspect of the embodiments of the present application.

In a fifth aspect, an embodiment of the present application provides a computer-readable storage medium, where the computer-readable storage medium stores a computer program for electronic data exchange, and the computer program causes a computer to execute some or all of the steps described in any method of the first aspect of the embodiments of the present application.

In a sixth aspect, an embodiment of the present application provides a computer program, where the computer program is operable to cause a computer to execute some or all of the steps described in any method of the first aspect of the embodiments of the present application. The computer program may be a software installation package.
Brief Description of the Drawings

In order to explain the technical solutions in the embodiments of the present application or the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings described below are only some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.

FIG. 1a is a schematic diagram of user control in a multi-device scenario provided by an embodiment of the present application;
FIG. 1b is an architecture diagram of a device control system 10 provided by an embodiment of the present application;
FIG. 1c is a schematic diagram of a functional interface of an intelligent voice assistant provided by an embodiment of the present application;
FIG. 1d is a schematic structural diagram of an electronic device provided by an embodiment of the present application;
FIG. 2a is a schematic flowchart of a device control method provided by an embodiment of the present application;
FIG. 2b is a schematic diagram of the angle receiving ranges of multiple devices provided by an embodiment of the present application;
FIG. 2c is a schematic diagram of measuring the receiving angle range of a fixed device provided by an embodiment of the present application;
FIG. 2d is an example diagram of an interface displaying the determined target device provided by an embodiment of the present application;
FIG. 3a is a schematic flowchart of a device control method provided by an embodiment of the present application;
FIG. 3b is an example diagram of displaying an intended device provided by an embodiment of the present application;
FIG. 3c is another example diagram of displaying an intended device provided by an embodiment of the present application;
FIG. 3d is another example diagram of displaying an intended device provided by an embodiment of the present application;
FIG. 3e is another example diagram of displaying an intended device provided by an embodiment of the present application;
FIG. 4 is a block diagram of the functional units of a device control apparatus provided by an embodiment of the present application;
FIG. 5 is a block diagram of the functional units of another device control apparatus provided by an embodiment of the present application;
FIG. 6 is a block diagram of the functional units of a device control apparatus provided by an embodiment of the present application;
FIG. 7 is a block diagram of the functional units of another device control apparatus provided by an embodiment of the present application.
Detailed Description

In order to enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application will be described clearly and completely below with reference to the drawings in the embodiments of the present application. Obviously, the described embodiments are only some of the embodiments of the present application, not all of them. Based on the embodiments of the present application, all other embodiments obtained by those of ordinary skill in the art without creative effort fall within the protection scope of the present application.

The terms "first", "second" and the like in the specification, claims and drawings of the present application are used to distinguish different objects, not to describe a specific order. In addition, the terms "including" and "having" and any variations thereof are intended to cover non-exclusive inclusion. For example, a process, method, system, product or device that includes a series of steps or units is not limited to the listed steps or units, but optionally also includes steps or units that are not listed, or optionally also includes other steps or units inherent to the process, method, product or device.

Reference to an "embodiment" herein means that a specific feature, structure or characteristic described in connection with the embodiment may be included in at least one embodiment of the present application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor does it refer to an independent or alternative embodiment that is mutually exclusive with other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein can be combined with other embodiments.

At present, as shown in FIG. 1a, the space where the user is located contains a smart speaker (0.5 m from the user), smart TV 1 (0.6 m from the user), a computer (1.2 m from the user) and smart TV 2 (0.55 m from the user). With multiple TVs in the space, it is difficult for the user to use a voice instruction to control the TV they want to watch. More generally, when the user wants to listen to music and issues a "play music" instruction, the current intelligent voice assistant may also be unable to select a suitable device to satisfy the user's intention.

In view of the above problems, embodiments of the present application provide a device control method. When the intelligent voice assistant faces a multi-device decision problem, the embodiments of the present application can introduce a new dimensional feature based on the interaction habits between users and devices: the user's face orientation. This feature makes the interaction between devices and users more natural and smooth, and binds the relationship between user and device more closely. At the same time, the fixed device that the user faces does not need to have any signal acquisition capability, which greatly expands the type and range of devices that can be faced.

A detailed description is given below with reference to the drawings.

Please refer to FIG. 1b, which shows a device control system 10 provided by an embodiment of the present application. The device control system 10 includes a fixed device 100 (for example, a smart TV, a smart speaker, a smart washing machine, a smart air conditioner, or a mobile phone placed on a table, i.e., a device whose own position does not change with the user's position for a period of time), a camera 200 (for example, a surveillance camera installed in a corner, a surveillance camera placed on a smart refrigerator, etc.), an arbitration device 300 installed with an intelligent voice assistant (the arbitration device can be any of the fixed devices or any of the mobile devices, such as the user's mobile phone; it can also be a dedicated control box in the smart home scenario, a server in the cloud, or a device group composed of multiple devices that jointly execute the solution, which is not uniquely limited here), a user-side mobile device 400 (for example, a mobile phone held by the user, a smart watch worn on the wrist, and other devices whose own positions change with the user's position) and a server 500. The arbitration device 300 is communicatively connected to the fixed device 100, the camera 200, the mobile device 400 and the server 500, forming a device control network in the smart home scenario.

The intelligent voice assistant can be installed on various devices such as mobile phones to support the device control method of the present application; the specific function names and interface interaction modes it presents can be varied and are not uniquely limited here. For example, it may be installed on an OPPO mobile phone and present the settings interface of the "Breeno" smart assistant as shown in FIG. 1c.

It should be noted that, as the policy execution device of the embodiments of the present application, the arbitration device 300 can exchange data and signaling with other devices (such as the fixed device 100 and the mobile device 400) in various ways, which are not uniquely limited here. For example, the arbitration device 300 may communicate directly with the first camera over a local area network to obtain corresponding information, and the arbitration device 300 may connect to a smart speaker in the space where the user is located through a mobile communication network to realize the corresponding information exchange, and so on.

Please refer to FIG. 1d, which is a schematic structural diagram of an electronic device provided by an embodiment of the present application. The electronic device is applied to the above device control system 10 and includes an application processor 120, a memory 130, a communication module 140 and one or more programs 131; the application processor 120 is communicatively connected to the memory 130 and the communication module 140 through an internal communication bus.

The one or more programs 131 are stored in the memory 130 and configured to be executed by the application processor 120; the one or more programs 131 include instructions for executing any step in the above method embodiments.

The application processor 120 may be, for example, a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field programmable gate array (FPGA) or another programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It may implement or execute the various exemplary logical blocks, units and circuits described in connection with the disclosure of the present application. The processor may also be a combination that implements computing functions, for example a combination including one or more microprocessors, or a combination of a DSP and a microprocessor. The communication unit may be the communication module 140, a transceiver, a transceiver circuit, etc., and the storage unit may be the memory 130.

The memory 130 may be a volatile memory or a non-volatile memory, or may include both volatile and non-volatile memories. The non-volatile memory may be a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM) or a flash memory. The volatile memory may be a random access memory (RAM), which is used as an external cache. By way of example but not limitation, many forms of RAM are available, such as static RAM (SRAM), dynamic RAM (DRAM), synchronous DRAM (SDRAM), double data rate SDRAM (DDR SDRAM), enhanced SDRAM (ESDRAM), synchlink DRAM (SLDRAM) and direct rambus RAM (DR RAM).

In specific implementation, the application processor 120 is configured to execute any step performed by the arbitration device in the method embodiments of the present application.
Please refer to FIG. 2a, which is a schematic flowchart of a device control method provided by an embodiment of the present application, applied to the arbitration device 300 in the device control system 10. As shown in the figure, the device control method includes the following operations.

Step 201: Acquire at least one angle receiving range of at least one fixed device and the face orientation angle of a first user.

The at least one angle receiving range corresponds one-to-one with the at least one fixed device, that is, each fixed device corresponds to one angle receiving range.

The first distance between the first camera and the first user may be calculated by the first camera based on a depth-of-field algorithm.

The face orientation angle of the first user can be characterized by the yaw, pitch and roll angles of the face relative to the current camera; through angle conversion, the angle within the coordinate system of the first camera can be obtained.

Step 202: Determine a target device that the first user needs to control, where the angle receiving range of the target device matches the face orientation angle of the first user.

The target device may be a fixed device.

The device that the user faces does not need to perform any signal acquisition work. The device the user faces can be a smart curtain, a lamp, a switch, a mobile phone whose position does not change, etc., or it can be a mobile phone held by the user; it is only necessary that the arbitration device installed with the intelligent voice assistant is able to control these devices. This feature greatly expands the type and range of devices that can be faced.

Step 203: Control the target device to perform the operation indicated by the voice instruction of the first user.

The angle receiving range refers to the fan-shaped angular range formed by the boundary points of a fixed device and the user's position. As shown in FIG. 2b, assume the fixed devices include the mobile phone, speaker, TV 1, TV 2 and computer in the space where the user is located. Through the fan-shaped area between the boundary points of the mobile phone and the user's position, the angle receiving range of the mobile phone can be determined as angle range C shown in the figure; similarly, the angle receiving range of the speaker is angle range B, the angle receiving range of TV 1 is angle range A, the angle receiving range of the computer is angle range D, and the angle receiving range of TV 2 is angle range E.

In one possible example, acquiring at least one angle receiving range of at least one fixed device includes: determining the at least one angle receiving range of the at least one fixed device according to the position of the first camera, the first distance between the first camera and the first user, and the position of the at least one fixed device.

In specific implementation, the device needs to obtain the first distance between the first camera and the first user, for example by calculating it through the depth detection algorithm of the first camera.

It can be seen that, in this example, the arbitration device first obtains the first distance between the first camera and the first user as well as the face orientation angle of the first user; it then determines the target device that the first user needs to control according to the position of the first camera, the first distance, the position of at least one fixed device and the face orientation angle of the first user; finally, it controls the target device to perform the operation indicated by the first user's voice instruction. The arbitration device can thus intelligently decide which target device the first user needs to control, avoiding situations in which the first user's control intention cannot be accurately identified, which helps improve the accuracy and intelligence of device control.

In this possible example, determining the at least one angle receiving range of the at least one fixed device according to the position of the first camera, the first distance between the first camera and the first user, and the position of the at least one fixed device includes the following. As shown in FIG. 2c, if coordinate point a1 is the equivalent position of the first camera, a Cartesian coordinate system Xa1Y is established with a1 as the coordinate origin; coordinate point b1 is the equivalent position of the first user corresponding to the first distance, with coordinate point a2 being the horizontal projection of b1 on the X axis; coordinate points b2 and b3 are the two boundary points of a single fixed device; coordinate point a3 is the horizontal projection of b2 on the X axis; coordinate point a5 is the horizontal projection of b3 on the X axis; coordinate point a4 is the intersection of ray b1b2 with the X axis; and coordinate point a6 is the intersection of ray b1b3 with the X axis. Then, under the constraint of coordinate point b1, the first boundary angle α1 of the angle receiving range of the single fixed device is ∠a2b1b2 and the second boundary angle α2 is ∠a2b1b3; α1 and α2 constitute the angle receiving range of the single fixed device.

The first distance corresponds to the length of line segment a1b1; from this length, the lengths of the horizontal projection segment a1a2 and the vertical projection segment a2b1 can be calculated.
α1 and α2 are calculated by the following formulas:

$$\alpha_1=\arctan\frac{a_2a_3+a_3a_4}{a_2b_1},\qquad \alpha_2=\arctan\frac{a_2a_5+a_5a_6}{a_2b_1}$$
In a specific implementation, analyzing with reference to FIG. 2c, the triangle similarity theorem gives:

$$\frac{a_3a_4}{a_2a_3+a_3a_4}=\frac{a_3b_2}{a_2b_1}$$

Solving this equation yields:

$$a_3a_4=\frac{a_2a_3\cdot a_3b_2}{a_2b_1-a_3b_2}$$

where a2a3 is obtained from a1a3 - a1a2. By trigonometry:

$$\alpha_1=\arctan\frac{a_2a_3+a_3a_4}{a_2b_1}$$

Similarly, the triangle similarity theorem gives:

$$\frac{a_5a_6}{a_2a_5+a_5a_6}=\frac{a_5b_3}{a_2b_1}$$

from which:

$$a_5a_6=\frac{a_2a_5\cdot a_5b_3}{a_2b_1-a_5b_3}$$

where a2a5 is obtained from a1a5 - a1a2. With a5a6 known:

$$\alpha_2=\arctan\frac{a_2a_5+a_5a_6}{a_2b_1}$$

The values of α1 and α2 can be determined from the above formulas.
The angle receiving range of the single fixed device is [α1, α2].
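The derivation above translates directly into code. The following sketch is an illustration only; it assumes, as in FIG. 2c, that both boundary points lie lower than the user's equivalent position (a2b1 > a3b2 and a2b1 > a5b3) so that both rays intersect the X axis, and the example coordinates are made up.

```python
import math

def reception_range(a1a2, a2b1, a1a3, a3b2, a1a5, a5b3):
    """Angle receiving range [alpha1, alpha2] of a single fixed device
    under the constraint of user position b1. All arguments are lengths
    in the camera coordinate system Xa1Y of FIG. 2c:

    a1a2, a2b1 -- horizontal / vertical projections of segment a1b1
                  (derived from the first distance)
    a1a3, a3b2 -- X projection and height of boundary point b2
    a1a5, a5b3 -- X projection and height of boundary point b3
    """
    a2a3 = a1a3 - a1a2
    a3a4 = (a2a3 * a3b2) / (a2b1 - a3b2)   # similar triangles, first ray
    alpha1 = math.degrees(math.atan((a2a3 + a3a4) / a2b1))

    a2a5 = a1a5 - a1a2
    a5a6 = (a2a5 * a5b3) / (a2b1 - a5b3)   # similar triangles, second ray
    alpha2 = math.degrees(math.atan((a2a5 + a5a6) / a2b1))
    return alpha1, alpha2

# Example with made-up coordinates: user projected 2 m along X and 3 m
# along Y; device boundary points at (3.0, 1.0) and (3.5, 1.2).
print(reception_range(2.0, 3.0, 3.0, 1.0, 3.5, 1.2))
```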
In the example diagram shown in FIG. 2d, after the arbitration device determines the first user's intended device, the interaction control result may be displayed on the screen of a carrier such as a mobile phone, and a text prompt may show the target device (i.e., the intended device) that the first user needs to control as determined this time.

It can be seen that, in this example, when the face orientation angle of the current user is detected through the face orientation algorithm on the image, it can be determined that the current user is facing the device; if the device can provide the capability described by the user's instruction, the system invokes the device to respond to the user's request.
In a possible example, acquiring the face orientation angle of the first user includes: acquiring a first image captured by the first camera; upon detecting that the first image contains image information of at least one user, determining the image information of the first user in the first image; and determining the face orientation angle of the first user according to the image information of the first user in the first image.

In a specific implementation, determining the face orientation angle of the first user according to the image information of the first user in the first image includes: extracting the yaw, pitch, and roll of the face relative to the first camera through a neural network algorithm.
In this possible example, detecting that the first image contains image information of at least one user and determining the image information of the first user in the first image includes:

detecting that image information of multiple users exists in the first image;

detecting whether the image information of the first user can be determined according to the voiceprint information of the voice instruction and/or the users' biometric information;

if not (that is, it is detected that the image information of the first user cannot be determined according to the voiceprint information of the voice instruction and/or the users' biometric information), determining the positions of the multiple users according to the image information of the multiple users, and detecting whether the image information of the first user can be determined according to the positions of the multiple users, the sound source localization position information of each user, and the state of each user;

if not (that is, the image information of the first user cannot be determined according to the positions of the multiple users, the sound source localization position information of each user, and the state of each user), determining the image information of the first user according to whether a device exists in the face orientation of each of the multiple users and whether that device can provide the capability described by the voice instruction.
The multiple users further include a second user other than the first user.

The user's biometric information refers to feature data reflecting the biological features of the user's face, such as the distance between the eyes, the proportion of the nose relative to the face, and whether glasses are worn.
In a specific implementation, the arbitration device may preset or acquire in real time the correspondence between users' image information and users' voiceprint information, and/or the correspondence between users' image information and users' biometric information. The arbitration device determines the voiceprint features of the voice instruction and/or extracts the biometric information from the first image, and then queries the above correspondence; if corresponding user image information is found, it can be determined that the image information of the first user indeed exists in the first image.

Further, if this cannot be determined, the image position of each user may be obtained by analyzing the first image and compared with the sound source position of the first user identified by processing the first user's voice instruction through sound source localization technology. If no match is found, or multiple matches are found, further screening may be performed according to the state of each user, where the state of each user includes a body state and/or a facial state; by analyzing the body state and/or facial state of each user, it is determined whether the current user is performing the operation of controlling a device through a voice instruction.

Further, if this still cannot be determined, the determination may be made based on image analysis of the device the user's face is oriented towards and whether that device has the capability described by the voice instruction. For example, if the device the user's face is oriented towards is a smart watch and the function described by the voice instruction is temperature adjustment, they obviously do not match, so the controlled device is not the smart watch.
In a specific implementation, the method further includes: detecting that the image information of the first user is determined according to the voiceprint information of the voice instruction and/or the users' biometric information.

In a specific implementation, the method further includes: determining the image information of the first user according to the positions of the multiple users, the sound source localization position information of each user, and the state of each user.

It can be seen that, in this example, for the problem of identifying the image of the first user in the first image, the arbitration device can apply a graded, level-by-level detection mechanism based on multiple types of information, detecting the first user comprehensively and finely.
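A minimal sketch of this graded, level-by-level detection might look as follows; the dictionary field names, the "speaking" state label, and the matching radius are assumptions of this sketch rather than details of the embodiment.

```python
import math

MATCH_RADIUS = 0.5  # metres; an assumed tolerance for position matching

def _dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def identify_first_user(users, voiceprint_match, biometric_match,
                        sound_source_pos, capability):
    """Tiered detection of the first user among several user images.

    `users` is a list of dicts with keys 'position', 'state' and
    'facing_device' (a device object with a `capabilities` set, or None).
    """
    # Tier 1: voiceprint of the voice instruction and/or facial biometrics.
    hits = [u for u in users if voiceprint_match(u) or biometric_match(u)]
    if len(hits) == 1:
        return hits[0]

    # Tier 2: compare image positions against the sound source
    # localization result, then screen by each user's body/face state.
    near = [u for u in users
            if _dist(u["position"], sound_source_pos) < MATCH_RADIUS]
    speaking = [u for u in near if u["state"] == "speaking"]
    if len(speaking) == 1:
        return speaking[0]

    # Tier 3: keep users whose face is oriented towards a device that can
    # provide the capability described by the voice instruction.
    facing = [u for u in users
              if u["facing_device"] is not None
              and capability in u["facing_device"].capabilities]
    return facing[0] if facing else None
```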
In this possible example, the method further includes: detecting that image information of a single user exists in the first image; and determining the image information of the single user as the image information of the first user.

It can be seen that, in this example, for the case where only a single user exists, the arbitration device simplifies the algorithm and directly identifies the current user as the first user, which is fast, efficient, and offers good real-time performance.
In a possible example, before determining the target device that the first user needs to control, the method further includes: detecting, according to the image information of the first user in the first image, that the first user's face is not oriented towards a mobile device.

The mobile device includes a wearable device.

In a specific implementation, an image analysis algorithm may be used to identify whether a mobile device exists in the image region the first user's face is oriented towards; the mobile device may be a mobile phone held by the user, a smart watch worn by the user, or the like.

It can be seen that, in this example, since in an actual application scenario the first user may hold a mobile phone and face it while issuing voice control, for example saying a voice instruction such as "Play Lao Guo's crosstalk" while facing the phone, the arbitration device needs to be able to first analyze, based on the captured first image, whether the first user has a control intention towards a mobile device, and, when there is no such intention, further locate the fixed device to be controlled accurately based on the face orientation, improving the accuracy and comprehensiveness of device control.
In this possible example, the method further includes: detecting, according to the image information of the first user in the first image, that the mobile device exists in the first user's face orientation; and determining, according to the mobile device, the target device that the first user needs to control.

A specific implementation of determining, according to the mobile device, the target device that the first user needs to control includes: if the mobile device is a single mobile device, determining the single mobile device as the target device that the first user needs to control; if the mobile device comprises multiple mobile devices, acquiring the device state of each of the multiple mobile devices, and determining, according to the device state of each of the multiple mobile devices, the target device that the first user needs to control.

The device state of each mobile device includes at least one of the following: the screen state, whether the device is held by the user, and the like.

It can be seen that, in this example, since in an actual application scenario the first user may hold a mobile phone and face it while issuing voice control, for example saying a voice instruction such as "Play Lao Guo's crosstalk" while facing the phone, the arbitration device needs to be able to first analyze, based on the captured first image, whether the first user has a control intention towards a mobile device, and, when such an intention is identified, determine that mobile device as the target device the first user currently needs to control, thereby avoiding misidentification and improving the accuracy and comprehensiveness of device control.
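A minimal sketch of the device-state screening for multiple mobile devices follows; the field names and the preference order (handheld before screen-on) are assumptions of this sketch.

```python
def pick_mobile_target(mobile_devices):
    """Choose the target among the mobile devices in the user's face
    orientation; 'handheld' and 'screen_on' are assumed field names."""
    if len(mobile_devices) == 1:
        return mobile_devices[0]
    # Prefer a device being held by the user, then one with its screen on
    # (one plausible ordering of the device states mentioned above).
    ranked = sorted(mobile_devices,
                    key=lambda d: (d.get("handheld", False),
                                   d.get("screen_on", False)),
                    reverse=True)
    return ranked[0] if ranked else None

# Usage with made-up states: the handheld phone wins over the idle watch.
print(pick_mobile_target([{"name": "watch", "screen_on": True},
                          {"name": "phone", "handheld": True}]))
```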
In a possible example, the first camera is selected and determined according to the position of the first user.

Specifically, the first camera is the camera associated with the sound source localization reference position of the first user; the sound source localization reference position of the first user is determined through the time differences with which at least three devices pick up the first user's voice instruction, the positions of the three devices, and sound source localization technology.

In a specific implementation, the first camera may be a camera satisfying a preset condition that the arbitration device selects from multiple cameras based on the sound source localization result of the first user, where the preset condition may be at least one of the following:

the camera is in the same room as the first user;

the distance between the camera and the first user is the smallest, or is smaller than a preset distance threshold; and,

the framing range of the camera includes the first user, or the camera can directly face the first user.
After selecting the first camera, the arbitration device may adjust the angle, focal length, and other states of the first camera according to the approximate orientation of the first user, so that it can capture the user's picture clearly and accurately.

If no person is present in the current picture of the first camera, it switches to another candidate camera.

If no camera can capture a picture of a person, the system exits, and the user may be actively asked through any device to determine the user's intended device, which is then started to serve the user.

It can be seen that, in this example, the arbitration device can screen out the associated first camera from multiple cameras based on the sound source localization result of the first user, improving the success rate of image capture, detection, and recognition.
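The camera screening itself can be sketched as follows, assuming the sound source localization step has already produced the user's approximate position and room; the dictionary fields and the default distance threshold are assumptions of this sketch.

```python
import math

def _dist(p, q):
    return math.hypot(p[0] - q[0], p[1] - q[1])

def select_first_camera(cameras, user_pos, user_room, max_dist=5.0):
    """Select the first camera from the candidates using the user's sound
    source localization result; each camera is a dict with 'room',
    'position' and a 'covers' predicate (assumed field names)."""
    def ok(cam):
        return (cam["room"] == user_room                      # same room
                and _dist(cam["position"], user_pos) < max_dist
                and cam["covers"](user_pos))                  # user in view
    candidates = sorted((c for c in cameras if ok(c)),
                        key=lambda c: _dist(c["position"], user_pos))
    # An empty result means no camera can frame the user: the system would
    # fall back to asking the user directly, as described above.
    return candidates[0] if candidates else None
```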
In a possible example, the position of the at least one fixed device and the position of the first camera are calibrated by means of visual scanning and positioning.

In a specific implementation, the user may use a device with a binocular camera to locate the relative position of each device (including the room number it belongs to and its relative position within the current room), or the position may be specified by the user. Meanwhile, the user may fine-tune the position of each device, expanding or shrinking the receiving range of the device's orientation angle to improve control accuracy.

It can be seen that, in this example, the system supports visual scanning and positioning to quickly construct the spatial position relationship of multiple devices, and supports user fine-tuning, improving convenience and accuracy.
It can be seen that, in the embodiments of the present application, the arbitration device first acquires at least one angle receiving range of at least one fixed device and the face orientation angle of the first user; secondly, it determines the target device that the first user needs to control; finally, it controls the target device to perform the operation indicated by the voice instruction of the first user. It can be seen that the arbitration device can intelligently decide the target device that the first user needs to control according to the face orientation angle of the first user combined with the angle receiving range of the at least one fixed device, so as to avoid the situation where the control intention of the first user cannot be accurately identified, which is beneficial to improving the accuracy and intelligence of device control.
Referring to FIG. 3a, FIG. 3a is a schematic flowchart of a method for displaying an intended device provided by an embodiment of the present application, applied to any device in the device control system 10. As shown in the figure, the method for displaying an intended device includes the following operations.

Step 301: acquire the detection result of the intended device of the first user's voice instruction, where the detection result of the intended device is determined according to the position of the first camera, the first distance, the position of at least one fixed device, and the face orientation angle of the first user, and the first distance is the distance between the first camera and the first user.

Step 302: display the detection result of the intended device.

The voice instruction is used for the target device to perform the corresponding operation to fulfill the control intention of the first user.
In a possible example, displaying the detection result of the intended device includes: displaying a spatial model of the device control system, where the spatial model of the device control system includes the at least one fixed device whose position is calibrated by means of visual scanning and positioning; highlighting the determined target device among the at least one fixed device; and/or displaying prompt information indicating that the target device is the intended device.

For example, in the display diagram of the intended device shown in FIG. 3b, the intended device is TV 1 marked with a dashed box; alternatively, the icon of TV 1 may be directly highlighted, and no unique limitation is made here.

For another example, in the display diagram of the intended device shown in FIG. 3c, text information shows that the intended device is TV 2.

It can be seen that, in this example, the device control system supports intuitively displaying the detection result of the intended device through a display screen.
In a possible example, displaying the detection result of the intended device includes: displaying a spatial model of the device control system, where the spatial model of the device control system includes the at least one fixed device whose position is calibrated by means of visual scanning and positioning and the mobile device determined as the target device; highlighting the mobile device determined as the target device; and/or displaying prompt information indicating that the target device is the intended device.

For example, in the display diagram of the intended device shown in FIG. 3d, the intended device is the mobile phone marked prominently; alternatively, the icon of the mobile phone may be directly highlighted, and no unique limitation is made here.

For another example, in the display diagram of the intended device shown in FIG. 3e, text information shows that the intended device is the mobile phone.

It can be seen that, in this example, the device control system supports intuitively displaying the detection result of the intended device through a display screen.

It can be seen that, in the embodiments of the present application, the device control system can accurately determine the intended device of the first user based on the first user's face orientation and other associated information, and visually present the detection result of the intended device to the user, improving the intuitiveness and intelligence of device control and enhancing the user experience.
An embodiment of the present application provides a device control apparatus, which may be the arbitration device. Specifically, the device control apparatus is configured to execute the steps performed by the arbitration device in the above device control method. The device control apparatus provided by the embodiments of the present application may include modules corresponding to the respective steps.

In the embodiments of the present application, the device control apparatus may be divided into functional modules according to the above method examples. For example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. The division of modules in the embodiments of the present application is illustrative and is merely a division by logical function; other division manners are possible in actual implementation.

In the case where each functional module is divided corresponding to each function, FIG. 4 shows a possible schematic structural diagram of the device control apparatus involved in the above embodiments. As shown in FIG. 4, the device control apparatus 4 is applied to the arbitration device 400 in the device control system 10; the apparatus includes:
an acquiring unit 40, configured to acquire the first distance between the first camera and the first user, and the face orientation angle of the first user;

a determining unit 41, configured to determine, according to the position of the first camera, the first distance, the position of at least one fixed device, and the face orientation angle of the first user, the target device that the first user needs to control;

a control unit 42, configured to control the target device to perform the operation indicated by the voice instruction of the first user.
In a possible example, in terms of acquiring the at least one angle receiving range of the at least one fixed device, the acquiring unit 40 is specifically configured to: determine the at least one angle receiving range of the at least one fixed device according to the position of the first camera, the first distance between the first camera and the first user, and the position of the at least one fixed device.

In a possible example, in terms of determining the at least one angle receiving range of the at least one fixed device according to the position of the first camera, the first distance between the first camera and the first user, and the position of the at least one fixed device, the acquiring unit 40 is specifically configured to: if coordinate point a1 is the equivalent position of the first camera, establish a rectangular coordinate system Xa1Y with coordinate point a1 as the origin, where coordinate point b1 is the equivalent position of the first user corresponding to the first distance, coordinate points b2 and b3 are the two boundary points of a single fixed device, coordinate point a3 is the horizontal projection point of coordinate point b2 on the X axis, coordinate point a5 is the horizontal projection point of coordinate point b3 on the X axis, coordinate point a4 is the intersection of ray b1b2 with the X axis, and coordinate point a6 is the intersection of ray b1b3 with the X axis; then, under the constraint of coordinate point b1, the first boundary angle α1 of the angle receiving range of the single fixed device is ∠a2b1b2, the second boundary angle α2 is ∠a2b1b3, and α1 and α2 constitute the angle receiving range of the single fixed device.
In a possible example, α1 and α2 are calculated by the following formulas:

$$\alpha_1=\arctan\frac{a_2a_3+a_3a_4}{a_2b_1},\qquad \alpha_2=\arctan\frac{a_2a_5+a_5a_6}{a_2b_1}$$
In a possible example, in terms of acquiring the face orientation angle of the first user, the acquiring unit 40 is specifically configured to: acquire a first image captured by the first camera; upon detecting that the first image contains image information of at least one user, determine the image information of the first user in the first image; and determine the face orientation angle of the first user according to the image information of the first user in the first image.

In a possible example, in terms of detecting that the first image contains image information of at least one user and determining the image information of the first user in the first image, the acquiring unit 40 is specifically configured to: detect that image information of multiple users exists in the first image;

detect whether the image information of the first user can be determined according to the voiceprint information of the voice instruction and/or the users' biometric information;

if not, determine the positions of the multiple users according to the image information of the multiple users, and detect whether the image information of the first user can be determined according to the positions of the multiple users, the sound source localization position information of each user, and the state of each user;

if not, determine the image information of the first user according to whether a device exists in the face orientation of each of the multiple users and whether that device can provide the capability described by the voice instruction.
In a possible example, the first camera is selected and determined according to the position of the first user.

In a possible example, the position of the at least one fixed device and the position of the first camera are calibrated by means of visual scanning and positioning.

In a possible example, before determining the target device that the first user needs to control, the determining unit 41 is further configured to determine, according to the face orientation angle of the first user, that the first user's face is not oriented towards a mobile device.

In a possible example, the determining unit 41 is further configured to: detect, according to the image information of the first user in the first image, that the mobile device exists in the first user's face orientation; and determine, according to the mobile device, the target device that the first user needs to control.
In the case where an integrated unit is adopted, a schematic structural diagram of another device control apparatus provided by an embodiment of the present application is shown in FIG. 5. In FIG. 5, the device control apparatus 5 includes a processing module 50 and a communication module 51. The processing module 50 is configured to control and manage the actions of the device control apparatus, for example, the steps performed by the acquiring unit 40, the determining unit 41, the control unit 42, and the detecting unit 43, and/or other processes for executing the techniques described herein. The communication module 51 is configured to support interaction between the device control apparatus and other devices. As shown in FIG. 5, the device control apparatus may further include a storage module 52, configured to store program code and data of the device control apparatus.

The processing module 50 may be a processor or a controller, for example, a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various exemplary logical blocks, modules, and circuits described in connection with the disclosure of the present application. The processor may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The communication module 51 may be a transceiver, an RF circuit, a communication interface, or the like. The storage module 52 may be a memory.

All relevant content of each scenario involved in the above method embodiments can be cited in the functional description of the corresponding functional module, and will not be repeated here. Both the device control apparatus 4 and the device control apparatus 5 can execute the steps performed by the arbitration device in the device control method shown in FIG. 2a.
An embodiment of the present application provides a device control apparatus, which may be any device in the device control system. Specifically, the device control apparatus is configured to execute the steps performed by any device in the device control system in the above device control method. The device control apparatus provided by the embodiments of the present application may include modules corresponding to the respective steps.

In the embodiments of the present application, the device control apparatus may be divided into functional modules according to the above method examples. For example, each functional module may be divided corresponding to each function, or two or more functions may be integrated into one processing module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. The division of modules in the embodiments of the present application is illustrative and is merely a division by logical function; other division manners are possible in actual implementation.

In the case where each functional module is divided corresponding to each function, FIG. 6 shows a possible schematic structural diagram of the device control apparatus involved in the above embodiments. As shown in FIG. 6, the device control apparatus 6 is applied to the arbitration device 600 in the device control system 10; the apparatus includes:
an acquiring unit 60, configured to acquire the detection result of the intended device of the first user's voice instruction, where the detection result of the intended device is determined according to the position of the first camera, the first distance, the position of at least one fixed device, and the face orientation angle of the first user, and the first distance is the distance between the first camera and the first user;

a display unit 61, configured to display the detection result of the intended device.

In a possible example, the voice instruction is used for the target device to perform the corresponding operation to fulfill the control intention of the first user.

In a possible example, in terms of displaying the detection result of the intended device, the display unit 61 is specifically configured to display a spatial model of the device control system, where the spatial model of the device control system includes the at least one fixed device whose position is calibrated by means of visual scanning and positioning; and highlight the determined target device among the at least one fixed device; and/or display prompt information indicating that the target device is the intended device.

In a possible example, in terms of displaying the detection result of the intended device, the display unit 61 is specifically configured to display a spatial model of the device control system, where the spatial model of the device control system includes the at least one fixed device whose position is calibrated by means of visual scanning and positioning and the mobile device determined as the target device; and highlight the mobile device determined as the target device; and/or display prompt information indicating that the target device is the intended device.
In the case where an integrated unit is adopted, a schematic structural diagram of another device control apparatus provided by an embodiment of the present application is shown in FIG. 7. In FIG. 7, the device control apparatus 7 includes a processing module 70 and a communication module 71. The processing module 70 is configured to control and manage the actions of the device control apparatus, for example, the steps performed by the acquiring unit 60 and the display unit 61, and/or other processes for executing the techniques described herein. The communication module 71 is configured to support interaction between the device control apparatus and other devices. As shown in FIG. 7, the device control apparatus may further include a storage module 72, configured to store program code and data of the device control apparatus.

The processing module 70 may be a processor or a controller, for example, a central processing unit (CPU), a general-purpose processor, a digital signal processor (DSP), an ASIC, an FPGA or other programmable logic device, a transistor logic device, a hardware component, or any combination thereof. It can implement or execute the various exemplary logical blocks, modules, and circuits described in connection with the disclosure of the present application. The processor may also be a combination that implements computing functions, for example, a combination of one or more microprocessors, or a combination of a DSP and a microprocessor. The communication module 71 may be a transceiver, an RF circuit, a communication interface, or the like. The storage module 72 may be a memory.

All relevant content of each scenario involved in the above method embodiments can be cited in the functional description of the corresponding functional module, and will not be repeated here. Both the device control apparatus 6 and the device control apparatus 7 can execute the steps performed by the arbitration device in the device control method shown in FIG. 2a.
The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any other combination. When implemented in software, the above embodiments may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions or computer programs. When the computer instructions or computer programs are loaded or executed on a computer, the processes or functions according to the embodiments of the present application are generated in whole or in part. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another; for example, the computer instructions may be transmitted from one website, computer, server, or data center to another website, computer, server, or data center in a wired or wireless manner. The computer-readable storage medium may be any available medium accessible to a computer, or a data storage device such as a server or data center containing one or more sets of available media. The available medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a DVD), or a semiconductor medium. The semiconductor medium may be a solid-state drive.

An embodiment of the present application further provides a computer storage medium, where the computer storage medium stores a computer program for electronic data interchange, and the computer program causes a computer to execute some or all of the steps of any method described in the above method embodiments; the computer includes an electronic device.

An embodiment of the present application further provides a computer program product, where the computer program product includes a non-transitory computer-readable storage medium storing a computer program, and the computer program is operable to cause a computer to execute some or all of the steps of any method described in the above method embodiments. The computer program product may be a software installation package; the computer includes an electronic device.
It should be understood that, in the various embodiments of the present application, the magnitude of the sequence numbers of the above processes does not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic, and should not constitute any limitation on the implementation process of the embodiments of the present application.

In the several embodiments provided in the present application, it should be understood that the disclosed method, apparatus, and system may be implemented in other ways. For example, the apparatus embodiments described above are merely illustrative; for example, the division of the units is merely a division by logical function, and other division manners are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual coupling or direct coupling or communication connection shown or discussed may be indirect coupling or communication connection through some interfaces, apparatuses, or units, and may be electrical, mechanical, or in other forms.

The units described as separate components may or may not be physically separate, and the components shown as units may or may not be physical units; that is, they may be located in one place or distributed across multiple network units. Some or all of the units may be selected according to actual needs to achieve the purpose of the solution of this embodiment.

In addition, the functional units in the various embodiments of the present invention may be integrated into one processing unit, or each unit may be physically included separately, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of hardware plus software functional units.

The integrated unit implemented in the form of a software functional unit may be stored in a computer-readable storage medium. The software functional unit is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to execute some steps of the methods described in the various embodiments of the present invention. The aforementioned storage medium includes various media capable of storing program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.

Although the present invention is disclosed as above, the present invention is not limited thereto. Any person skilled in the art can easily conceive of changes or substitutions without departing from the spirit and scope of the present invention, and various alterations and modifications may be made, including combinations of the above different functions and implementation steps, as well as implementations in software and hardware, all of which fall within the protection scope of the present invention.

Claims (20)

  1. A device control method, comprising:
    acquiring at least one angle receiving range of at least one fixed device and a face orientation angle of a first user;
    determining a target device that the first user needs to control, wherein the angle receiving range of the target device matches the face orientation angle of the first user;
    controlling the target device to perform an operation indicated by a voice instruction of the first user.
  2. The method according to claim 1, wherein the acquiring at least one angle receiving range of at least one fixed device comprises:
    determining the at least one angle receiving range of the at least one fixed device according to a position of a first camera, a first distance between the first camera and the first user, and a position of the at least one fixed device.
  3. The method according to claim 2, wherein the determining the at least one angle receiving range of the at least one fixed device according to the position of the first camera, the first distance between the first camera and the first user, and the position of the at least one fixed device comprises:
    if coordinate point a1 is the equivalent position of the first camera, establishing a rectangular coordinate system Xa1Y with coordinate point a1 as the origin, where coordinate point b1 is the equivalent position of the first user corresponding to the first distance, coordinate points b2 and b3 are the two boundary points of a single fixed device, coordinate point a3 is the horizontal projection point of coordinate point b2 on the X axis, coordinate point a5 is the horizontal projection point of coordinate point b3 on the X axis, coordinate point a4 is the intersection of ray b1b2 with the X axis, and coordinate point a6 is the intersection of ray b1b3 with the X axis; then, under the constraint of coordinate point b1, the first boundary angle α1 of the angle receiving range of the single fixed device is ∠a2b1b2, the second boundary angle α2 is ∠a2b1b3, and α1 and α2 constitute the angle receiving range of the single fixed device.
  4. The method according to claim 3, wherein α1 and α2 are calculated by the following formulas:

    $$\alpha_1=\arctan\frac{a_2a_3+a_3a_4}{a_2b_1},\qquad \alpha_2=\arctan\frac{a_2a_5+a_5a_6}{a_2b_1}$$
  5. The method according to any one of claims 1-4, wherein the acquiring a face orientation angle of a first user comprises:
    acquiring a first image captured by the first camera;
    upon detecting that the first image contains image information of at least one user, determining image information of the first user in the first image;
    determining the face orientation angle of the first user according to the image information of the first user in the first image.
  6. The method according to claim 5, wherein the detecting that the first image contains image information of at least one user and determining the image information of the first user in the first image comprises:
    detecting that image information of multiple users exists in the first image;
    detecting whether the image information of the first user can be determined according to voiceprint information of the voice instruction and/or the users' biometric information;
    if not, determining positions of the multiple users according to the image information of the multiple users, and detecting whether the image information of the first user can be determined according to the positions of the multiple users, sound source localization position information of each user, and a state of each user;
    if not, determining the image information of the first user according to whether a device exists in the face orientation of each of the multiple users and whether that device can provide the capability described by the voice instruction.
  7. The method according to any one of claims 1-6, wherein the first camera is selected and determined according to the position of the first user.
  8. The method according to any one of claims 1-6, wherein the position of the at least one fixed device and the position of the first camera are calibrated by means of visual scanning and positioning.
  9. The method according to any one of claims 1-8, wherein before the determining a target device that the first user needs to control, the method further comprises:
    determining, according to the face orientation angle of the first user, that the face of the first user is not oriented towards a mobile device.
  10. The method according to claim 5, wherein the method further comprises:
    detecting, according to the image information of the first user in the first image, that the mobile device exists in the face orientation of the first user;
    determining, according to the mobile device, the target device that the first user needs to control.
  11. A device control apparatus, comprising:
    an acquiring unit, configured to acquire at least one angle receiving range of at least one fixed device and a face orientation angle of a first user;
    a determining unit, configured to determine a target device that the first user needs to control, wherein the angle receiving range of the target device matches the face orientation angle of the first user;
    a control unit, configured to control the target device to perform an operation indicated by a voice instruction of the first user.
  12. The apparatus according to claim 11, wherein, in terms of acquiring the at least one angle receiving range of the at least one fixed device, the acquiring unit is specifically configured to: determine the at least one angle receiving range of the at least one fixed device according to a position of a first camera, a first distance between the first camera and the first user, and a position of the at least one fixed device.
  13. The apparatus according to claim 12, wherein, in terms of determining the at least one angle receiving range of the at least one fixed device according to the position of the first camera, the first distance between the first camera and the first user, and the position of the at least one fixed device, the acquiring unit is specifically configured to: if coordinate point a1 is the equivalent position of the first camera, establish a rectangular coordinate system Xa1Y with coordinate point a1 as the origin, where coordinate point b1 is the equivalent position of the first user corresponding to the first distance, coordinate points b2 and b3 are the two boundary points of a single fixed device, coordinate point a3 is the horizontal projection point of coordinate point b2 on the X axis, coordinate point a5 is the horizontal projection point of coordinate point b3 on the X axis, coordinate point a4 is the intersection of ray b1b2 with the X axis, and coordinate point a6 is the intersection of ray b1b3 with the X axis; then, under the constraint of coordinate point b1, the first boundary angle α1 of the angle receiving range of the single fixed device is ∠a2b1b2, the second boundary angle α2 is ∠a2b1b3, and α1 and α2 constitute the angle receiving range of the single fixed device.
  14. The apparatus according to claim 13, wherein α1 and α2 are calculated by the following formulas:

    $$\alpha_1=\arctan\frac{a_2a_3+a_3a_4}{a_2b_1},\qquad \alpha_2=\arctan\frac{a_2a_5+a_5a_6}{a_2b_1}$$
  15. The apparatus according to any one of claims 11-14, wherein, in terms of acquiring the face orientation angle of the first user, the acquiring unit is specifically configured to: acquire a first image captured by the first camera; upon detecting that the first image contains image information of at least one user, determine the image information of the first user in the first image; and determine the face orientation angle of the first user according to the image information of the first user in the first image.
  16. The apparatus according to claim 15, wherein, in terms of detecting that the first image contains image information of at least one user and determining the image information of the first user in the first image, the acquiring unit is specifically configured to: detect that image information of multiple users exists in the first image;
    detect whether the image information of the first user can be determined according to voiceprint information of the voice instruction and/or the users' biometric information;
    if not, determine positions of the multiple users according to the image information of the multiple users, and detect whether the image information of the first user can be determined according to the positions of the multiple users, sound source localization position information of each user, and a state of each user;
    if not, determine the image information of the first user according to whether a device exists in the face orientation of each of the multiple users and whether that device can provide the capability described by the voice instruction.
  17. The apparatus according to any one of claims 11-16, wherein the first camera is selected and determined according to the position of the first user.
  18. The apparatus according to any one of claims 11-16, wherein the position of the at least one fixed device and the position of the first camera are calibrated by means of visual scanning and positioning.
  19. An electronic device, comprising:
    one or more processors;
    one or more memories configured to store a program,
    wherein the one or more memories and the program are configured such that the one or more processors control the electronic device to execute the steps in the method according to any one of claims 1-10.
  20. A computer-readable storage medium storing a computer program for electronic data interchange, wherein the computer program causes a computer to execute the method according to any one of claims 1-10.
PCT/CN2022/072355 2021-03-10 2022-01-17 Device control method and related apparatus WO2022188552A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202110263322.5A CN115086095A (zh) 2021-03-10 2021-03-10 Device control method and related apparatus
CN202110263322.5 2021-03-10

Publications (1)

Publication Number Publication Date
WO2022188552A1 2022-09-15

Family

ID=83226327

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/072355 WO2022188552A1 (zh) Device control method and related apparatus

Country Status (2)

Country Link
CN (1) CN115086095A (zh)
WO (1) WO2022188552A1 (zh)

Also Published As

Publication number Publication date
CN115086095A (zh) 2022-09-20

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application
    Ref document number: 22766079
    Country of ref document: EP
    Kind code of ref document: A1
NENP Non-entry into the national phase
    Ref country code: DE
122 Ep: pct application non-entry in european phase
    Ref document number: 22766079
    Country of ref document: EP
    Kind code of ref document: A1