CN110827818B - Control method, device, equipment and storage medium of intelligent voice equipment - Google Patents
- Publication number
- CN110827818B (granted publication of application CN201911138882.7A)
- Authority
- CN
- China
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Classifications
- G — PHYSICS
- G10 — MUSICAL INSTRUMENTS; ACOUSTICS
- G10L — SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00 — Speech recognition
- G10L15/22 — Procedures used during a speech recognition process, e.g. man-machine dialogue
- G10L15/26 — Speech to text systems
Abstract
The invention provides a control method and apparatus for an intelligent voice device, an electronic device, and a storage medium. The method includes: receiving a voice signal of a user in the space where a first intelligent voice device is located; performing sensing processing on the space according to the voice signal, to determine the intelligent voice devices included in the space and the positional relationship between each intelligent voice device and the user; and, when the space also includes at least one second intelligent voice device, determining, among the first intelligent voice device and the at least one second intelligent voice device and according to the positional relationship, a target intelligent voice device that meets the usage scenario of the space, and triggering the target intelligent voice device into an awake state to respond to the user's voice signal. The invention enables an intelligent response to the user's voice signal even in a complex environment containing multiple intelligent voice devices, thereby improving the user's experience.
Description
Technical Field
The present invention relates to artificial intelligence technology, and in particular, to a method and apparatus for controlling an intelligent voice device, an electronic device, and a storage medium.
Background
Artificial intelligence (AI) is a comprehensive technology of computer science that studies the design principles and implementation methods of various intelligent machines so that machines can perceive, reason, and make decisions. Artificial intelligence is a broad discipline spanning many fields, such as natural language processing and machine learning/deep learning; as the technology develops, it will be applied in ever more fields and take on ever greater value.
With the development of computer technology, intelligent voice devices have become one of the important applications in the field of artificial intelligence. Through intelligent dialogue and instant question answering, an intelligent voice device can both answer a user's questions and fulfill requests the user issues; for example, if the user asks to play an XX song, the intelligent voice device plays the XX song for the user.
However, as more and more intelligent voice devices connect to a Voice Service (VS), it is increasingly common for multiple intelligent voice devices to exist in the same scene (the same home or the same room). In this case, if a user wakes up the intelligent voice devices and issues a voice request, multiple intelligent voice devices respond and reply to the request at the same time, which greatly degrades the user's experience.
Disclosure of Invention
The embodiment of the invention provides a control method and apparatus for an intelligent voice device, an electronic device, and a storage medium, which can realize an intelligent response to a user's voice signal in a complex environment containing multiple intelligent voice devices, thereby improving the user's experience.
The technical scheme of the embodiment of the invention is realized as follows:
the embodiment of the invention provides a control method of intelligent voice equipment, which comprises the following steps:
receiving a voice signal of a user in the space where a first intelligent voice device is located;
performing sensing processing on the space according to the voice signal, to determine the intelligent voice devices included in the space and the positional relationship between each intelligent voice device and the user;
when the space further comprises at least one second intelligent voice device, determining, among the first intelligent voice device and the at least one second intelligent voice device and according to the positional relationship, a target intelligent voice device that meets the usage scenario of the space; and
triggering the target intelligent voice device into an awake state to respond to the voice signal of the user.
In the above technical solution, the determining, according to the positional relationship, among the first intelligent voice device and the at least one second intelligent voice device, a target intelligent voice device that meets the usage scenario of the space includes:
identifying the user corresponding to the voice signal based on the voiceprint features of the user's voice signal;
based on the user accounts bound to the first intelligent voice device and the at least one second intelligent voice device, when the user corresponding to an account is determined to be the user corresponding to the voice signal, determining the intelligent voice device corresponding to that account to be a wakeable intelligent voice device; and
determining, among the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device whose positional relationship with the user has the highest matching degree as the target intelligent voice device.
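For illustration only, the account-based filtering and matching-degree selection described above might be sketched as follows in Python (the data shapes, field names, and the precomputed `match` score are assumptions, not part of the patent's disclosure):

```python
def select_target(speaker_id, devices):
    """Pick the target intelligent voice device for an identified speaker.

    devices maps a device name to a dict holding the user bound to its
    account ("account_user") and a precomputed matching degree of its
    positional relationship with the user ("match") -- hypothetical fields.
    Only devices whose bound account belongs to the identified speaker are
    wakeable; among those, the highest matching degree wins.
    """
    wakeable = {name: d for name, d in devices.items()
                if d["account_user"] == speaker_id}
    if not wakeable:
        return None  # no device is bound to this speaker's account
    return max(wakeable, key=lambda name: wakeable[name]["match"])
```

For example, a speaker identified as one user selects the best-matching device among the devices bound to that user's account, ignoring devices bound to other users.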
The embodiment of the invention provides a control apparatus for an intelligent voice device, which comprises:
a receiving module, configured to receive the voice signal of the user in the space where the first intelligent voice device is located;
a sensing module, configured to perform sensing processing on the space according to the voice signal, to determine the intelligent voice devices included in the space and the positional relationship between each intelligent voice device and the user;
a processing module, configured to determine, when the space further includes at least one second intelligent voice device, among the first intelligent voice device and the at least one second intelligent voice device and according to the positional relationship, a target intelligent voice device that meets the usage scenario of the space; and
a triggering module, configured to trigger the target intelligent voice device into an awake state to respond to the voice signal of the user.
In the above technical solution, the sensing module is further configured to perform, for a voice signal of the user received by any intelligent voice device in the space, the following processing:
analyzing the user's voice signal as received by the intelligent voice device from a plurality of directions, to obtain the energy values of the voice signal received from each of the plurality of directions;
determining the direction corresponding to the maximum energy value as the direction of the user relative to the intelligent voice device; and
determining, according to the relationship by which the energy value of a voice signal attenuates with distance and the attenuation value of the maximum energy value relative to the energy value of the user's reference voice signal, the distance corresponding to that attenuation value as the distance between the intelligent voice device and the user.
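A minimal sketch of the direction and distance estimation above, assuming a free-field model in which energy falls off with the square of the distance (the patent does not fix a particular attenuation relationship, and the reference energy here is a hypothetical calibration value):

```python
import math

def estimate_direction_and_distance(energy_by_direction,
                                    reference_energy,
                                    reference_distance=1.0):
    """Estimate the user's direction and distance from per-direction energies.

    energy_by_direction maps a direction (e.g. azimuth in degrees) to the
    energy of the user's voice signal received from that direction.
    reference_energy is the energy of the user's reference voice signal at
    reference_distance (assumed calibration data).
    """
    # The direction with the maximum energy is taken as the user's direction.
    direction = max(energy_by_direction, key=energy_by_direction.get)
    max_energy = energy_by_direction[direction]
    # Under the assumed model E(d) = E_ref * (d_ref / d) ** 2, the distance
    # corresponding to the observed attenuation is:
    distance = reference_distance * math.sqrt(reference_energy / max_energy)
    return direction, distance
```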
In the above technical solution, the sensing module is further configured to perform, for a voice signal of the user received by any intelligent voice device in the space, the following processing:
analyzing the voice signal to obtain a first distance between the intelligent voice device and the user and a first direction of the user relative to the intelligent voice device;
detecting obstacles in the space in response to the received voice signal, to obtain a second distance between the intelligent voice device and the user;
identifying obstacles in the space to obtain a second direction of the user relative to the intelligent voice device; and
when the difference between the first distance and the second distance is greater than a distance error threshold and/or the error between the first direction and the second direction is greater than a direction error threshold, determining a weighted value of the first distance and the second distance as the distance between the intelligent voice device and the user, and determining the average of the first direction and the second direction as the direction of the user relative to the intelligent voice device.
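The fusion of the acoustic estimate with the obstacle-detection estimate described above can be sketched as follows (the equal weighting and the two error thresholds are illustrative assumptions; the patent leaves them unspecified):

```python
def fuse_estimates(first_distance, first_direction,
                   second_distance, second_direction,
                   distance_error_threshold=0.5,
                   direction_error_threshold=15.0,
                   weight=0.5):
    """Combine the voice-signal estimate with the obstacle-detection estimate.

    When the two estimates disagree beyond the thresholds, the distance is a
    weighted value of the two distances and the direction is their average;
    otherwise the acoustic estimate is kept as-is.
    """
    distance_diff = abs(first_distance - second_distance)
    direction_diff = abs(first_direction - second_direction)
    if (distance_diff > distance_error_threshold
            or direction_diff > direction_error_threshold):
        fused_distance = weight * first_distance + (1 - weight) * second_distance
        fused_direction = (first_direction + second_direction) / 2.0
        return fused_distance, fused_direction
    return first_distance, first_direction
```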
In the above technical solution, the processing module is further configured to perform the following processing on the positional relationship obtained by sensing processing of the voice signal received by any intelligent voice device in the space:
when the positional relationship remains unchanged for longer than a time threshold, determining that the user is in a static state; and
determining, according to the distances between the intelligent voice devices and the user included in the positional relationship, the intelligent voice device in the space with the smallest distance to the user as the target intelligent voice device.
In the above technical solution, the processing module is further configured to perform the following processing on the positional relationship obtained by sensing processing of the voice signal received by any intelligent voice device in the space:
when the positional relationship changes, determining that the user is in a motion state;
determining, according to the direction of the user relative to the intelligent voice device included in the positional relationship, the direction of change as the direction in which the user moves relative to the intelligent voice device;
multiplying the reciprocal of the distance between the user and the intelligent voice device included in the positional relationship by the moving-direction value, to obtain the matching degree of the positional relationship between the intelligent voice device and the user; and
determining, among the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device with the highest matching degree as the target intelligent voice device;
wherein the moving-direction value is positive when the change in the user's direction relative to the intelligent voice device approaches the device, and negative when it moves away from the device.
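The matching degree for a moving user, as described above, is the reciprocal of the distance multiplied by a moving-direction value of +1 (approaching) or -1 (moving away). A sketch of this computation and the resulting device selection:

```python
def matching_degree(distance, approaching):
    """Matching degree of a device's positional relationship with a moving
    user: the reciprocal of the distance multiplied by the moving-direction
    value, which is positive when the user approaches the device and
    negative when the user moves away."""
    return (1.0 / distance) * (1.0 if approaching else -1.0)

def pick_target(devices):
    """devices maps a device name to (distance, approaching); the device
    with the highest matching degree is the target intelligent voice
    device."""
    return max(devices, key=lambda name: matching_degree(*devices[name]))
```

For instance, a user 2 m from and approaching device "a", 1 m from but leaving device "b", and 4 m from and approaching device "c" is best matched by device "a" (score 0.5 versus -1.0 and 0.25).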
In the above technical solution, the processing module is further configured to determine the intelligent voice device in an awake state among the first intelligent voice device and the at least one second intelligent voice device; and
when the distance between the intelligent voice device in the awake state and the user does not exceed a critical distance, determine the intelligent voice device in the awake state as the target intelligent voice device;
wherein the critical distance is the maximum distance at which the user and the intelligent voice device can still correctly perceive the voice signals sent by the other party.
In the above technical solution, the processing module is further configured to, when, among the first intelligent voice device and the at least one second intelligent voice device, there is an intelligent voice device that is interacting with the user and the distance between that device and the user does not exceed the critical distance, determine the intelligent voice device that is interacting with the user as the target intelligent voice device.
In the above technical solution, the processing module is further configured to determine the trend of change of the positional relationship between the intelligent voice device in the awake state and the user before the voice signal is received; and
when it is determined from the trend of change of the positional relationship that the intelligent voice device in the awake state exceeds the critical distance, determine, among the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device whose positional relationship with the user has the highest matching degree as the target intelligent voice device;
the triggering module is further configured to, when the intelligent voice device whose positional relationship with the user has the highest matching degree is determined as the target intelligent voice device, trigger the intelligent voice device in the awake state into a standby state and wake up the target intelligent voice device in real time.
In the above technical solution, the processing module is further configured to determine the trend of change of the positional relationship between the intelligent voice device in the awake state and the user before the voice signal is received; and
when it is determined from the trend of change of the positional relationship that the intelligent voice device in the awake state will exceed the critical distance within a preset duration, determine, among the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device whose positional relationship with the user has the highest matching degree as the target intelligent voice device;
the triggering module is further configured to wake up the target intelligent voice device in advance, before the intelligent voice device in the awake state exceeds the critical distance.
In the above technical solution, the processing module is further configured to determine the trend of change of the positional relationship between the intelligent voice device in the awake state and the user before the voice signal is received; and
when it is determined from the trend of change of the positional relationship that the intelligent voice device in the awake state does not exceed the critical distance, determine the intelligent voice device in the awake state as the target intelligent voice device.
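The handover behaviour around the critical distance described in the preceding paragraphs can be condensed into a small decision function (the 5-metre critical distance is an assumed value, and the matching degrees are taken as already computed; the patent fixes neither):

```python
CRITICAL_DISTANCE = 5.0  # assumed max distance (metres) for correct mutual perception

def choose_target(awake_device, awake_distance, candidates):
    """Keep the awake device while it stays within the critical distance;
    otherwise hand over to the candidate whose positional relationship
    with the user has the highest matching degree.

    awake_device: name of the device currently in the awake state, or None.
    awake_distance: its current distance to the user.
    candidates: dict mapping device name -> matching degree.
    """
    if awake_device is not None and awake_distance <= CRITICAL_DISTANCE:
        return awake_device
    return max(candidates, key=candidates.get)
```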
In the above technical solution, the processing module is further configured to obtain historical data of the intelligent voice device in the awake state before the voice signal is received, and to predict, through an artificial intelligence model and in combination with the trend of change of the positional relationship of the awake intelligent voice device, its usage time periods, and its number of wake-ups, the expected usage time of the intelligent voice device in the awake state; and
to determine, among the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device whose positional relationship with the user has the highest matching degree as the target intelligent voice device;
the triggering module is further configured to wake up the target intelligent voice device in real time when the expected usage time arrives; or
to wake up the target intelligent voice device in advance, before the expected usage time arrives.
In the above technical solution, the apparatus further includes:
a switching module, configured to trigger the intelligent voice devices that are in the awake state other than the target intelligent voice device to switch to a standby state in real time; or
to wait for a preset time period, determine, for each intelligent voice device that is in the awake state other than the target intelligent voice device within the preset time period, the trend of change of its positional relationship with the user, and trigger that intelligent voice device to switch to the standby state when it is determined from the trend that it exceeds the critical distance within the preset time period.
In the above technical solution, the apparatus further includes:
a response module, configured to trigger the target intelligent voice device to respond again to the last voice signal when, before the target intelligent voice device receives the voice signal, the target intelligent voice device and the intelligent voice device in the awake state that last responded to the user's voice signal are not the same device and, during that last response, the distance between the intelligent voice device in the awake state and the user exceeded the critical distance.
In the above technical solution, the processing module is further configured to, based on the user accounts bound to the first intelligent voice device and the at least one second intelligent voice device, when an account corresponds to a plurality of intelligent voice devices, determine the intelligent voice device among the plurality whose positional relationship with the user has the highest matching degree as the target intelligent voice device.
In the above technical solution, the processing module is further configured to identify the user corresponding to the voice signal based on the voiceprint features of the user's voice signal;
based on the user accounts bound to the first intelligent voice device and the at least one second intelligent voice device, when the user corresponding to an account is determined to be the user corresponding to the voice signal, determine the intelligent voice device corresponding to that account to be a wakeable intelligent voice device; and
determine, among the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device whose positional relationship with the user has the highest matching degree as the target intelligent voice device.
The embodiment of the invention provides intelligent voice equipment, which comprises the following components:
a memory for storing executable instructions;
and the processor is used for realizing the control method of the intelligent voice equipment when executing the executable instructions stored in the memory.
The embodiment of the invention provides a server for controlling intelligent voice equipment, which comprises the following components:
a memory for storing executable instructions;
and the processor is used for realizing the control method of the intelligent voice equipment when executing the executable instructions stored in the memory.
The embodiment of the invention provides a storage medium which stores executable instructions for realizing the control method of the intelligent voice equipment when being executed by a processor.
The embodiment of the invention has the following beneficial effects:
Determining, among the first intelligent voice device and the at least one second intelligent voice device and according to the positional relationship, the target intelligent voice device that meets the usage scenario of the space, and triggering that device to respond to the user's voice signal, prevents multiple intelligent voice devices in the same scene from all responding to the user's voice signal, thereby improving the user's experience.
Drawings
Fig. 1 is a schematic diagram of an optional application scenario 10 of a control method of an intelligent voice device according to an embodiment of the present invention;
fig. 2 is a schematic structural diagram of an electronic device 500 according to an embodiment of the present invention;
FIGS. 3A-3C are schematic flow diagrams of a control method of an intelligent voice device according to an embodiment of the present invention;
fig. 4 is a flow chart of a control method of an intelligent voice device according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of a user waking up an intelligent device according to an embodiment of the present invention;
fig. 6 is a schematic diagram of an application scenario of an intelligent voice device according to an embodiment of the present invention;
Fig. 7 is a schematic diagram of an application scenario in which an intelligent voice device interacts with a cloud end according to an embodiment of the present invention;
fig. 8 is a waveform diagram of voice data uploaded to a cloud by an intelligent voice device 1 according to an embodiment of the present invention;
fig. 9 is a spectrum diagram of voice data uploaded to a cloud by the intelligent voice device 1 according to the embodiment of the present invention;
fig. 10 is a waveform diagram of voice data uploaded to a cloud by the intelligent voice device 2 according to the embodiment of the present invention;
fig. 11 is a spectrum diagram of voice data uploaded to a cloud by the intelligent voice device 2 according to the embodiment of the present invention;
fig. 12 is a schematic diagram of another application scenario in which an intelligent voice device interacts with a cloud according to an embodiment of the present invention.
Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the present invention is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present invention, and all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of the present invention.
In the following description, the terms "first", "second", and the like are used merely to distinguish between similar objects and do not denote a particular order; where permitted, "first" and "second" may be interchanged, so that the embodiments of the invention described herein can be practiced in an order other than that illustrated or described.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this invention belongs. The terminology used herein is for the purpose of describing embodiments of the invention only and is not intended to be limiting of the invention.
Before the embodiments of the present invention are described in further detail, the terms involved in the embodiments are explained as follows.
1) Voice assistant: an intelligent terminal application that helps users solve various problems, chiefly problems of daily life, through intelligent dialogue and instant question-and-answer interaction.
2) Cloud: also called a cloud platform; a software platform that adopts application virtualization (Application Virtualization) technology and integrates functions such as software searching, downloading, use, management, and backup. Through the platform, common software can be packaged in an independent virtualized environment so that the application software is not coupled with the system, allowing the software to be used cleanly and portably.
3) Energy value: the greater the energy value of voice data, the more clearly the intelligent voice device receives the user's voice information, that is, the closer the intelligent voice device is to the user. The energy value can be read from a waveform diagram or a spectrogram: the larger the amplitude of the waveform in the waveform diagram, the greater the energy value of the voice data, i.e., the energy value is proportional to the waveform amplitude; and the more active the high-frequency region of the spectrogram, the greater the energy value of the voice data, i.e., the energy value is proportional to the activity of the high-frequency region.
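As a concrete (though assumed) definition consistent with the waveform-amplitude intuition above, the energy value of a frame of voice data is often computed as the sum of squared sample amplitudes:

```python
def frame_energy(samples):
    """Energy value of one frame of voice data, computed as the sum of the
    squared sample amplitudes -- larger waveform amplitudes yield a larger
    energy value. (One common definition; the patent does not fix a
    formula.)"""
    return sum(s * s for s in samples)
```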
4) Voice recognition: the process of having a machine convert a voice signal into corresponding text or a command through recognition and understanding.
To address at least the above technical problems of the related art, embodiments of the present invention provide a control method and apparatus for an intelligent voice device, an electronic device, and a storage medium, which can place a target intelligent voice device in an awake state to respond to a user's voice signal, preventing all the intelligent voice devices in the same scene from responding and thereby improving the user's experience. An exemplary application of the electronic device provided by the embodiment of the present invention is described below. The electronic device implementing the control scheme may be a server, for example one deployed in the cloud: according to the voice signals of the user provided by a first intelligent voice device and at least one second intelligent voice device in the same space, it determines the positional relationships between these devices and the user, determines among them, according to the positional relationships, the target intelligent voice device that meets the usage scenario of the space, and triggers the target intelligent voice device into an awake state to respond to the user's voice signal.
The electronic device implementing the control scheme of the intelligent voice device provided by the embodiment of the invention may also be a user terminal with intelligent voice functions (an intelligent voice device), such as a notebook computer, a tablet computer, a desktop computer, or a mobile device (for example, a mobile phone or a personal digital assistant). Taking the first intelligent voice device as a handheld terminal as an example, it determines the positional relationships between the intelligent voice devices and the user according to the voice signal of the user it receives and the voice signals of the user provided by at least one second intelligent voice device in the same space, determines, among the first intelligent voice device and the at least one second intelligent voice device and according to the positional relationships, the target intelligent voice device that meets the usage scenario of the space, and triggers the target intelligent voice device into an awake state to respond to the user's voice signal.
Referring to fig. 1, fig. 1 is a schematic diagram of an optional application scenario 10 of a control method of an intelligent voice device according to an embodiment of the present invention, where a terminal 200 (an intelligent voice device 200-1, an intelligent voice device 200-2, and an intelligent voice device 200-3 are exemplarily shown) is connected to a server 100 through a network 300, and the network 300 may be a wide area network or a local area network, or a combination of the two.
The terminal 200 may be used to receive a voice signal of a user, for example, the terminal automatically collects the voice signal of the user after the user emits the voice signal.
In some embodiments, the terminal 200 locally executes the control method of the intelligent voice device provided by the embodiment of the present invention: it performs sensing processing on the space according to the user's voice signal to determine the intelligent voice devices included in the space and the positional relationship between each intelligent voice device and the user, determines, among the first intelligent voice device and the at least one second intelligent voice device and according to the positional relationship, the target intelligent voice device that meets the usage scenario of the space, and triggers the target intelligent voice device into an awake state to respond to the user's voice signal. For example, a voice assistant is installed on the intelligent voice device 200-1 (the first intelligent voice device); after the user speaks, the device collects the user's voice signal and also receives the user's voice signals collected by the intelligent voice device 200-2 and the intelligent voice device 200-3 (second intelligent voice devices), determines the positional relationships between the intelligent voice devices (the first and second intelligent voice devices) and the user according to these voice signals, determines among them, according to the positional relationships, the target intelligent voice device that meets the usage scenario of the space, and triggers the voice assistant of the target intelligent voice device to respond to the voice signal.
The terminal 200 may also send the user's voice signal to the server 100 through the network 300 and invoke the intelligent voice device control function provided by the server 100, which performs control processing through the control method of the intelligent voice device provided by the embodiment of the present invention. For example, a voice assistant is installed on the terminal 200 (an intelligent voice device). After the user utters a voice signal, the terminal 200 collects it through the voice assistant and sends it to the server 100 through the network 300. The server 100 determines the intelligent voice devices and their positional relationships with the user based on the voice signal, determines the target intelligent voice device that satisfies the usage scenario of the space according to the positional relationships, and sends a control instruction to the target intelligent voice device, triggering it into an awake state so that it responds to the user's voice signal through its voice assistant.
Continuing with the structure of the electronic device that implements the intelligent voice device control scheme provided by the embodiment of the present invention, referring to fig. 2, fig. 2 is a schematic structural diagram of the electronic device 500 provided by the embodiment of the present invention. The electronic device 500 shown in fig. 2 includes: at least one processor 510, a memory 550, at least one network interface 520, and a user interface 530. The various components in the electronic device 500 are coupled together by a bus system 540, which is used to enable communication between these components. In addition to the data bus, the bus system 540 includes a power bus, a control bus, and a status signal bus. For clarity of illustration, the various buses are labeled as the bus system 540 in fig. 2.
The processor 510 may be an integrated circuit chip with signal processing capabilities, such as a general-purpose processor (for example, a microprocessor or any conventional processor), a digital signal processor (DSP, Digital Signal Processor), another programmable logic device, discrete gate or transistor logic, discrete hardware components, or the like.
The user interface 530 includes one or more output devices 531 that enable presentation of media content, including one or more speakers and/or one or more visual displays. The user interface 530 also includes one or more input devices 532, including user interface components that facilitate user input, such as a keyboard, mouse, microphone, touch screen display, camera, other input buttons and controls.
The memory 550 includes volatile memory or nonvolatile memory, and may include both. The nonvolatile memory may be a read-only memory (ROM, Read Only Memory), and the volatile memory may be a random access memory (RAM, Random Access Memory). The memory 550 described in the embodiments of the present invention is intended to include any suitable type of memory. The memory 550 may optionally include one or more storage devices physically located remote from the processor 510.
In some embodiments, memory 550 is capable of storing data to support various operations, examples of which include programs, modules and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 551 including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and handling hardware-based tasks;
The network communication module 552 is used to reach other computing devices via one or more (wired or wireless) network interfaces 520; exemplary network interfaces 520 include Bluetooth, Wireless Fidelity (Wi-Fi), universal serial bus (USB, Universal Serial Bus), etc.;
a display module 553 for enabling presentation of information (e.g., a user interface for operating peripheral devices and displaying content and information) via one or more output devices 531 (e.g., a display screen, speakers, etc.) associated with the user interface 530;
the input processing module 554 is configured to detect one or more user inputs or interactions from one of the one or more input devices 532 and translate the detected inputs or interactions.
In some embodiments, the control device of the intelligent voice device provided in the embodiments of the present invention may be implemented by combining software and hardware. As an example, it may be a processor in the form of a hardware decoding processor that is programmed to execute the control method of the intelligent voice device provided in the embodiments of the present invention; such a processor may use one or more application-specific integrated circuits (ASIC, Application Specific Integrated Circuit), DSPs, programmable logic devices (PLD, Programmable Logic Device), complex programmable logic devices (CPLD, Complex Programmable Logic Device), field-programmable gate arrays (FPGA, Field-Programmable Gate Array), or other electronic components.
In other embodiments, the control device of the intelligent voice device provided in the embodiments of the present invention may be implemented in software. Fig. 2 shows a control device 555 of the intelligent voice device stored in the memory 550, which may be software in the form of a program, a plug-in, or the like, and includes a series of modules: a receiving module 5551, a sensing module 5552, a processing module 5553, a triggering module 5554, a switching module 5555, and a response module 5556, which together implement the control method of the intelligent voice device provided by the embodiment of the present invention.
The following describes an exemplary application and implementation of the control method of the intelligent voice device according to the embodiment of the present invention, taking the first intelligent voice device as the execution body. Referring to fig. 3A, fig. 3A is a flowchart of a control method of an intelligent voice device according to an embodiment of the present invention; the method is described with reference to the steps shown in fig. 3A.
In step 101, a speech signal of a user in a space where a first intelligent speech device is located is received.
After the user utters a voice signal, for example the wake-up word "ABAB", the first intelligent voice device may collect it and transmit the collected voice signal through a local area network broadcast or another near-field communication mode. The first intelligent voice device may also receive the user's voice signals collected by the second intelligent voice devices in the space where it is located (that is, any intelligent voice devices in the space other than the first intelligent voice device; their number may be one or more).
In step 102, a perception process is performed on the space according to the voice signal to determine the intelligent voice device included in the space and the positional relationship with the user.
After the first intelligent voice device receives the user's voice signals, namely the voice signal it collected itself and the voice signals sent by the second intelligent voice devices, it performs sensing processing on the space according to these signals to determine the positional relationships between the user and the intelligent voice devices included in the space, i.e., the first intelligent voice device and the second intelligent voice devices.
Referring to fig. 3B, fig. 3B is an alternative flowchart provided by an embodiment of the present invention. In some embodiments, step 102 in fig. 3A may be implemented by steps 1021 through 1023 shown in fig. 3B.
Performing sensing processing on the space according to the voice signal to determine the intelligent voice devices included in the space and their positional relationships with the user includes performing the following processing for the user's voice signal received by any intelligent voice device in the space:
in step 1021, the voice signal of the user received by the intelligent voice device from the plurality of directions is analyzed, so as to obtain the energy value of the voice signal of the user received by the intelligent voice device from the plurality of directions.
In step 1022, the direction corresponding to the maximum energy value is determined as the direction of the user relative to the intelligent voice device.
In step 1023, according to the relationship by which the energy value of a voice signal attenuates with distance, and the attenuation of the maximum energy value relative to the energy value of the user's reference voice signal, the distance corresponding to that attenuation value is determined as the distance between the intelligent voice device and the user.
After the first intelligent voice device receives the users' voice signals sent by the other intelligent voice devices (each voice signal carrying a device identifier that uniquely identifies the sending device), it can perform direction and distance recognition on the user's voice signal received by any intelligent voice device (the first or a second intelligent voice device). An intelligent voice device can be provided with a multi-directional microphone array for receiving the user's voice signals from a plurality of directions, so the signals received from the plurality of directions can be analyzed to obtain the energy value of the voice signal received from each direction. The energy values differ across directions, and the energy value is larger in directions where the user is closer to the intelligent voice device; therefore, the direction corresponding to the maximum energy value of the voice signal is determined as the direction of the user relative to the intelligent voice device. After that direction is determined, the distance corresponding to the attenuation value is determined as the distance between the intelligent voice device and the user, according to the relationship by which the energy value of a voice signal attenuates with distance and the attenuation of the maximum energy value relative to the energy value of the user's reference voice signal. The energy value of the user's reference voice signal is either a fixed value or a standard energy value of the user's voice signal detected in real time by other, more accurate voice detection devices.
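The energy-based direction and distance estimation of steps 1021 through 1023 can be sketched in Python as follows. This is an illustrative sketch, not the patent's implementation: the function name, the direction labels, and the free-field inverse-square attenuation model are all assumptions introduced here for clarity.

```python
import math

def estimate_direction_and_distance(energies_by_direction, reference_energy, ref_distance=1.0):
    """Estimate the user's direction and distance from per-direction energies.

    energies_by_direction: mapping from a direction label (e.g. an azimuth in
    degrees) to the energy of the user's speech received from that direction
    by the microphone array.
    reference_energy: energy the user's speech would have at ref_distance
    (a fixed calibration value, or one measured by a separate, more accurate
    voice detector, as the text describes).
    Assumes a free-field inverse-square attenuation model (an assumption of
    this sketch; the patent only states that energy attenuates with distance).
    """
    # Step 1022: the direction with the maximum energy is the user's direction.
    direction = max(energies_by_direction, key=energies_by_direction.get)
    peak = energies_by_direction[direction]
    # Step 1023: invert the attenuation relationship.
    # E(d) = E_ref * (d_ref / d)^2  =>  d = d_ref * sqrt(E_ref / E)
    distance = ref_distance * math.sqrt(reference_energy / peak)
    return direction, distance
```

For example, with energies {0: 0.25, 90: 1.0, 180: 0.1} and a reference energy of 4.0 at one meter, the sketch reports the user at azimuth 90 and two meters away.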
In some embodiments, performing sensing processing on the space based on the voice signal to determine the positional relationships between the user and the intelligent voice devices included in the space includes performing the following processing for the user's voice signal received by any intelligent voice device in the space: analyzing the voice signal to obtain a first distance between the intelligent voice device and the user and a first direction of the user relative to the intelligent voice device; in response to the received voice signal, performing obstacle detection on the space to obtain a second distance between the intelligent voice device and the user; performing obstacle recognition on the space to obtain a second direction of the user relative to the intelligent voice device; and, when the difference between the first distance and the second distance exceeds a distance error threshold and/or the error between the first direction and the second direction exceeds a direction error threshold, determining the weighted value of the first and second distances as the distance between the intelligent voice device and the user, and the average of the first and second directions as the direction of the user relative to the intelligent voice device.
After the first intelligent voice device receives the users' voice signals sent by the other intelligent voice devices (each carrying a device identifier that uniquely identifies the sending device), it can perform direction and distance recognition on the user's voice signal received by any intelligent voice device (the first or a second intelligent voice device). Because determining the distance and direction from the voice signal alone may be inaccurate, the distance between the intelligent voice device and the user and the direction of the user relative to the intelligent voice device can also be determined in other ways; combining the two methods finally yields an accurate distance and direction.
First, the user's voice signals received by the intelligent voice device from a plurality of directions are analyzed to obtain the energy value of the signal received from each direction; the direction corresponding to the maximum energy value is determined as the first direction of the user relative to the intelligent voice device, and, according to the relationship by which the energy value of a voice signal attenuates with distance and the attenuation of the maximum energy value relative to the energy value of the user's reference voice signal, the distance corresponding to that attenuation value is determined as the first distance between the intelligent voice device and the user. Then, in response to the received voice signal, other devices may be triggered to perform obstacle detection on the space to obtain a second distance between the intelligent voice device and the user. These other devices, used to detect the distance between the intelligent voice device and the user, may be acoustic wave detection devices (such as ultrasonic detectors), image acquisition and analysis devices (such as cameras that identify the outline of a person), biological signal detection devices (such as infrared detectors), and the like.
For example, an acoustic wave detection device can emit sound waves, receive the waves reflected by an obstacle, and determine the distance between the intelligent voice device and the user from the round-trip time of the waves; an image acquisition and analysis device can capture images of obstacles in the current space, identify the user by an image recognition method, and determine the distance between the intelligent voice device and the user; a biological signal detection device can detect biological signals, for example detecting the user in the current space, and determine the distance based on the detected user. These other devices may be integrated into the intelligent voice device or may be independent devices that the intelligent voice device can perceive and use. While obstacle detection on the space yields the second distance, obstacle recognition on the space by other devices can also yield a second direction of the user relative to the intelligent voice device; again, these devices may be acoustic wave detection devices (such as ultrasonic detectors), image acquisition and analysis devices (such as cameras that identify the outline of a person), biological signal detection devices (such as infrared detectors), and the like.
For example, an acoustic wave detection device can emit sound waves, receive the waves reflected by an obstacle, and determine the direction of the user relative to the intelligent voice device from the direction of the returned waves; an image acquisition and analysis device can capture images of obstacles in the current space, identify the user by an image recognition method, and determine the user's direction relative to the intelligent voice device; a biological signal detection device can detect biological signals, for example detecting the user in the current space, and determine the direction based on the detected user.
After a first distance and a first direction are determined through voice signal analysis (the first method) and a second distance and a second direction are determined through other devices (the second method), when the difference between the first and second distances exceeds the distance error threshold and/or the error between the first and second directions exceeds the direction error threshold, the weighted value of the two distances is determined as the distance between the intelligent voice device and the user, and the average of the two directions as the direction of the user relative to the intelligent voice device. The two methods are thus fused, improving the accuracy of the positional relationship between the intelligent voice device and the user. The distance error threshold, the direction error threshold, and the weights are empirical values preset by the user. For example, the weights of the first and second distances can be set according to the specific situation: when the first method is favored, a higher weight can be set for the first distance and a lower weight for the second. If the weight of the first distance is 0.6 and that of the second is 0.4, then the distance between the intelligent voice device and the user = first distance × 0.6 + second distance × 0.4.
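The fusion rule above can be sketched as follows. The default thresholds, the 0.6/0.4 weights (taken from the example in the text), and the choice to keep the speech-based estimate when the two methods agree are assumptions of this sketch; the patent does not specify these values or that fallback.

```python
def fuse_position_estimates(d1, dir1, d2, dir2,
                            dist_err_threshold=0.5, dir_err_threshold=15.0,
                            w1=0.6, w2=0.4):
    """Fuse a speech-based estimate (d1, dir1) with a sensor-based one (d2, dir2).

    Directions are azimuths in degrees. Thresholds and weights are preset
    empirical values, as the text describes.
    """
    if abs(d1 - d2) > dist_err_threshold or abs(dir1 - dir2) > dir_err_threshold:
        # Estimates disagree: use the weighted distance and averaged direction.
        distance = d1 * w1 + d2 * w2
        direction = (dir1 + dir2) / 2.0
    else:
        # Estimates agree: keep the speech-based one (an assumption here).
        distance, direction = d1, dir1
    return distance, direction
```

With d1 = 3.0 m and d2 = 2.0 m (a 1.0 m discrepancy, above the threshold), the fused distance is 3.0 × 0.6 + 2.0 × 0.4 = 2.6 m, matching the worked example in the text.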
It should be noted that the other devices may be configured to sense continuously, so that they can respond to the received voice signal in real time; they can also be set to turn on or off periodically to save power.
Of course, only the second method may be used, without the first, to determine the distance between the intelligent voice device and the user and the direction of the user relative to the intelligent voice device: in response to the received voice signal, obstacle detection is performed on the space to obtain the second distance between the intelligent voice device and the user, and obstacle recognition is performed on the space to obtain the second direction of the user relative to the intelligent voice device.
In step 103, when the space further includes at least one second intelligent voice device, determining a target intelligent voice device meeting the usage scenario of the space in the first intelligent voice device and the at least one second intelligent voice device according to the location relationship.
After the positional relationships (direction and distance) between the intelligent voice devices and the user are determined: when only the first intelligent voice device is in the space, it is determined to be the target intelligent voice device; when at least one second intelligent voice device is also in the space, the target intelligent voice device that satisfies the usage scenario of the space is determined among the first intelligent voice device and the at least one second intelligent voice device according to the positional relationships, so that only the target intelligent voice device responds to the user's voice signal and multiple intelligent voice devices in the same space are prevented from responding simultaneously.
Referring to fig. 3B, in some embodiments, step 103 in fig. 3A may be implemented by steps 1031 to 1032 shown in fig. 3B.
Determining, according to the positional relationships, a target intelligent voice device that satisfies the usage scenario of the space among the first intelligent voice device and the at least one second intelligent voice device includes performing the following processing on each positional relationship obtained by sensing processing of the voice signal received by any intelligent voice device in the space:
in step 1031, when the time during which the positional relationship remains unchanged exceeds the time threshold, it is determined that the user is in a stationary state.
In step 1032, the intelligent voice device at the smallest distance from the user in the space is determined as the target intelligent voice device, according to the device-user distances included in the positional relationships.
After the first intelligent voice device determines the positional relationships between the intelligent voice devices (the first and second intelligent voice devices) and the user, it can determine the target intelligent voice device from the positional relationship between any intelligent voice device and the user. When the positional relationship between any intelligent voice device and the user remains unchanged for a preset time period, the user can be determined to be in a stationary state; at this point the target intelligent voice device can be determined by distance, i.e., the intelligent voice device at the smallest distance from the user in the space is determined as the target intelligent voice device.
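The stationary-user rule of steps 1031 and 1032 can be sketched as follows. The history format, the two-second default threshold, and the `None` return for a still-moving user are hypothetical choices of this sketch, not specified by the patent.

```python
def pick_target_when_stationary(position_history, time_threshold=2.0):
    """Apply steps 1031-1032: stationary user -> nearest device is the target.

    position_history: list of (timestamp, {device_id: distance}) samples,
    oldest first. If the positional relationships have remained unchanged
    longer than time_threshold seconds, the user is treated as stationary
    and the device with the smallest distance is returned; otherwise None
    (the moving-user rule in the text handles that case instead).
    """
    latest_t, latest = position_history[-1]
    # Walk backwards to find how long the relationships have been unchanged.
    unchanged_since = latest_t
    for t, snapshot in reversed(position_history[:-1]):
        if snapshot != latest:
            break
        unchanged_since = t
    if latest_t - unchanged_since < time_threshold:
        return None  # user still moving; not handled by this rule
    # Step 1032: smallest distance wins.
    return min(latest, key=latest.get)
```

For instance, if devices "a" and "b" have sat at 2.0 m and 1.0 m for three seconds, "b" becomes the target; if the distances changed within the threshold, the function defers.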
In some embodiments, determining, according to the positional relationships, a target intelligent voice device that satisfies the usage scenario of the space among the first intelligent voice device and the at least one second intelligent voice device includes performing the following processing on each positional relationship obtained by sensing processing of the voice signal received by any intelligent device in the space: when the positional relationship changes, determining that the user is in a motion state; determining the direction of change of the user's direction, included in the positional relationship, as the moving direction of the user relative to the intelligent voice device; multiplying the reciprocal of the distance included in the positional relationship by the moving-direction vector to obtain the matching degree of the positional relationship between the intelligent voice device and the user; and determining the intelligent voice device with the highest matching degree among the first intelligent voice device and the at least one second intelligent voice device as the target intelligent voice device. The moving-direction value is positive when the user's direction relative to the intelligent voice device changes toward the device, and negative when it changes away from the device.
After the first intelligent voice device determines the positional relationships between the intelligent voice devices (the first and second intelligent voice devices) and the user, it can determine the target intelligent voice device from the positional relationship between any intelligent voice device and the user. Typically, the user may be in motion while uttering the voice signal; for example, the user may be close to the first intelligent voice device before uttering it and farther away afterward. Thus, to determine the appropriate target intelligent voice device to respond to the user's voice signal, it may first be determined whether the user is in motion: when the positional relationship between the intelligent voice devices and the user changes, for example when the duration of the change exceeds a time threshold, the user is determined to be in a motion state, and the target intelligent voice device can then be determined from both the distance between each intelligent voice device and the user and the direction of the user relative to it.
The positional relationship includes the distance between the intelligent voice device and the user and the direction of the user relative to the device. The direction of change of that direction can be determined as the moving direction of the user relative to the device: for example, the moving direction takes a positive value when the user's direction changes toward the device and a negative value when it changes away from it; alternatively, the included angle of the direction change can be taken as the moving direction. Multiplying the reciprocal of the distance by the moving-direction vector yields the matching degree of the positional relationship between the intelligent voice device and the user; a device with a higher matching degree better satisfies the user's needs. Therefore, the intelligent voice device with the highest matching degree among the first intelligent voice device and the at least one second intelligent voice device is determined as the target intelligent voice device, as it best satisfies the user's needs.
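The matching-degree computation described above can be sketched as follows, using the simple ±1 moving-direction value from the text (the alternative, an included-angle vector, is not modeled here); the function names and input format are assumptions of this sketch.

```python
def matching_degree(distance, approaching):
    """Matching degree = (1 / distance) * moving-direction value, where the
    moving-direction value is +1 if the user is approaching the device and
    -1 if the user is moving away, as described in the text.
    """
    direction_value = 1.0 if approaching else -1.0
    return direction_value / distance

def pick_target_when_moving(devices):
    """devices: mapping of device_id -> (distance, approaching_bool).
    Returns the device with the highest matching degree."""
    return max(devices, key=lambda d: matching_degree(*devices[d]))
```

Note the effect of the sign: a nearby device the user is walking away from scores negatively, so a farther device the user is approaching can still win, which is the behavior the text is after.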
In some embodiments, determining, according to the positional relationships, a target intelligent voice device that satisfies the usage scenario of the space among the first intelligent voice device and the at least one second intelligent voice device includes: determining which of the first intelligent voice device and the at least one second intelligent voice device is in an awake state; and, when the distance between the awake intelligent voice device and the user does not exceed a critical distance, determining the awake intelligent voice device as the target intelligent voice device. The critical distance is the maximum distance at which the user and the intelligent voice device can still correctly perceive the voice signals sent by each other.
When it is determined that the space further includes at least one second intelligent voice device, the voice signal the first intelligent voice device receives from a second intelligent voice device can carry an awake-state identifier that indicates whether that second intelligent voice device is in an awake state; likewise, when the first intelligent voice device broadcasts a voice signal, it carries its own awake-state identifier. From these identifiers, the intelligent voice devices in the awake state among the first intelligent voice device and the at least one second intelligent voice device can be determined. When the distance between an awake intelligent voice device and the user does not exceed the critical distance, the awake device can perceive the voice signal sent by the user and the user can perceive the voice signal sent by the awake device. To improve the continuity of the user experience, the awake intelligent voice device can be determined as the target intelligent voice device even if it is not the one closest to the user, since its response to the user's voice signal still satisfies the user's experience.
For example, suppose two intelligent voice devices are relatively close to the user and the farther one is in the awake state. The voice signal output by that device, as perceived by the user, is not noticeably weakened, so the farther device can continue to respond to the user's voice signal; the intelligent voice device closest to the user is not woken up until the distance of the device currently in use exceeds the critical distance.
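This awake-state preference amounts to a hysteresis rule, which can be sketched as follows; the function name, the nearest-device fallback, and the input format are assumptions of this sketch.

```python
def pick_target_with_awake_state(devices, awake_id, critical_distance):
    """Hysteresis rule from the text: if a device is already awake and still
    within the critical distance (the maximum distance at which user and
    device can correctly perceive each other), keep it as the target even
    if another device is closer; otherwise fall back to the nearest device.

    devices: mapping of device_id -> distance to the user.
    awake_id: id of the currently awake device, or None if none is awake.
    """
    if awake_id is not None and devices.get(awake_id, float("inf")) <= critical_distance:
        return awake_id  # continuity: keep the awake device
    return min(devices, key=devices.get)  # otherwise, nearest device
```

So with device "a" awake at 2.5 m and device "b" at 1.0 m, "a" keeps responding while the critical distance is 3.0 m, and the target switches to "b" once the critical distance shrinks below 2.5 m.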
In some embodiments, determining, according to the positional relationships, a target intelligent voice device that satisfies the usage scenario of the space among the first intelligent voice device and the at least one second intelligent voice device includes: when, among the first intelligent voice device and the at least one second intelligent voice device, there is an intelligent voice device that is interacting with the user and whose distance from the user does not exceed the critical distance, determining that device as the target intelligent voice device.
When it is determined that the space further includes at least one second intelligent voice device, the voice signal the first intelligent voice device receives from a second intelligent voice device can carry an interaction-state identifier that indicates whether that second intelligent voice device is in a state of interacting with the user; likewise, when the first intelligent voice device broadcasts a voice signal, it carries its own interaction-state identifier. From these identifiers, the intelligent voice device that is interacting with the user among the first intelligent voice device and the at least one second intelligent voice device can be determined. When the distance between that device and the user does not exceed the critical distance, the device can perceive the voice signal sent by the user and the user can perceive the voice signal sent by the device. To improve the continuity of the user experience, even if the device interacting with the user is not the one closest to the user, it can continue the interaction as the target intelligent voice device. For example, if two intelligent voice devices are relatively close to the user and the farther one is interacting with the user, the farther device continues to respond to the user's voice signals until its distance exceeds the critical distance; the intelligent voice device closest to the user is not woken up before then, which avoids the delay of switching the response to the user's voice.
In some embodiments, determining, according to the positional relationship, a target intelligent voice device that satisfies the usage scenario of the space from among the first intelligent voice device and the at least one second intelligent voice device includes: determining a change trend of the positional relationship between the user and the intelligent voice device that was in the awake state before the voice signal was received; and when it is determined from the change trend that the intelligent voice device in the awake state will exceed the critical distance, determining, from among the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device whose positional relationship with the user has the highest matching degree as the target intelligent voice device.
After the first intelligent voice device determines the positional relationship between each intelligent voice device (the first intelligent voice device and the second intelligent voice devices) and the user, the target intelligent voice device can be determined from those positional relationships. First, the change trend of the positional relationship between the user and the intelligent voice device that was in the awake state before the voice signal was received is determined. When it is determined from the change trend that the awake device will not exceed the critical distance, the awake device is determined as the target intelligent voice device; when it is determined that the awake device will exceed the critical distance, the awake device can no longer meet the user's needs, so, according to the matching-degree determination method, the intelligent voice device whose positional relationship with the user has the highest matching degree is determined as the target intelligent voice device from among the first intelligent voice device and the at least one second intelligent voice device.
In some embodiments, determining, according to the positional relationship, a target intelligent voice device that satisfies the usage scenario of the space from among the first intelligent voice device and the at least one second intelligent voice device includes: determining a change trend of the positional relationship between the user and the intelligent voice device that was in the awake state before the voice signal was received; and when it is determined from the change trend that the intelligent voice device in the awake state will exceed the critical distance within a preset duration, determining, from among the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device whose positional relationship with the user has the highest matching degree as the target intelligent voice device.
After the first intelligent voice device determines the positional relationship between each intelligent voice device (the first intelligent voice device and the second intelligent voice devices) and the user, the target intelligent voice device can be determined from those positional relationships. First, the change trend of the positional relationship between the user and the intelligent voice device that was in the awake state before the voice signal was received is determined. When it is determined from the change trend that the awake device will exceed the critical distance within the preset duration, the awake device will no longer be able to meet the user's needs; therefore, according to the matching-degree determination method, the intelligent voice device whose positional relationship with the user has the highest matching degree is determined as the target intelligent voice device from among the first intelligent voice device and the at least one second intelligent voice device.
In some embodiments, determining, according to the positional relationship, a target intelligent voice device that satisfies the usage scenario of the space from among the first intelligent voice device and the at least one second intelligent voice device includes: determining a change trend of the positional relationship between the user and the intelligent voice device that was in the awake state before the voice signal was received; and when it is determined from the change trend that the intelligent voice device in the awake state will not exceed the critical distance, determining the intelligent voice device in the awake state as the target intelligent voice device.
When it is determined from the change trend of the positional relationship that the intelligent voice device in the awake state will not exceed the critical distance, or will not exceed the critical distance within the preset duration, the awake device can still meet the user's needs; to improve the continuity of the user experience, the intelligent voice device in the awake state can be determined as the target intelligent voice device.
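A minimal sketch of the trend check used in the embodiments above, assuming the change trend is estimated by linearly extrapolating recent (time, distance) samples of the awake device's positional relationship; the sampling scheme, constants, and function name are illustrative assumptions, not from the patent.

```python
CRITICAL_DISTANCE = 5.0  # metres; assumed
PRESET_DURATION = 3.0    # seconds; assumed

def will_exceed(samples: list[tuple[float, float]],
                horizon: float = PRESET_DURATION,
                critical: float = CRITICAL_DISTANCE) -> bool:
    """Linearly extrapolate (time, distance) samples of the awake device
    and report whether its distance to the user will exceed the critical
    distance within the horizon. If it will not, the awake device is
    kept as the target device."""
    if len(samples) < 2:
        return False  # no trend yet; keep the awake device
    (t0, d0), (t1, d1) = samples[-2], samples[-1]
    if t1 == t0:
        return False
    rate = (d1 - d0) / (t1 - t0)      # metres per second, + = moving away
    predicted = d1 + rate * horizon   # distance at the end of the horizon
    return predicted > critical
```

With `horizon=0` the same predicate answers the instantaneous question of the preceding embodiment (has the critical distance already been exceeded).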
In some embodiments, determining, according to the positional relationship, a target intelligent voice device that satisfies the usage scenario of the space from among the first intelligent voice device and the at least one second intelligent voice device includes: acquiring historical data of the intelligent voice device that was in the awake state before the voice signal was received, and predicting the expected use duration of that device through an artificial intelligence model by combining the change trend of its positional relationship with the user, its use duration, and its number of wake-ups; and determining, from among the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device whose positional relationship with the user has the highest matching degree as the target intelligent voice device.
Before the target intelligent voice device is determined, historical data of the intelligent voice device that was in the awake state before the voice signal was received is acquired, and the expected use duration of that device is predicted through an artificial intelligence model by combining the change trend of its positional relationship with the user, its use duration, and its number of wake-ups. The closer the user moves toward the awakened device according to the change trend, the longer its expected use duration; the longer the awakened device has been in use, the longer its expected use duration; and the more times the awakened device has been woken up, the longer its expected use duration. The intelligent voice device whose positional relationship with the user has the highest matching degree is then determined as the target intelligent voice device from among the first intelligent voice device and the at least one second intelligent voice device, so that when the expected use duration later elapses, the target intelligent voice device is triggered to be in the awake state and respond to the user's voice signal.
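The paragraph above leaves the artificial intelligence model unspecified. As a hedged illustration only, a toy linear stand-in that is monotonic in the three stated features might look like this; every weight and name here is invented for the sketch and is not part of the patent.

```python
def expected_use_duration(approach_rate: float,
                          past_use_duration: float,
                          wake_count: int) -> float:
    """Toy stand-in for the artificial intelligence model in the text:
    the faster the user approaches (positive approach_rate), the longer
    the historical use, and the more wake-ups, the longer the predicted
    use duration. Weights are illustrative only."""
    base = 30.0  # seconds; assumed baseline session length
    return max(0.0, base
               + 20.0 * approach_rate      # metres/second toward the device
               + 0.1 * past_use_duration   # seconds of historical use
               + 5.0 * wake_count)         # historical wake-up count
```

In practice the patent's model could be any learned regressor; the sketch only shows the claimed monotonic relationships between the three features and the prediction.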
In some embodiments, determining, according to the positional relationship, a target intelligent voice device that satisfies the usage scenario of the space from among the first intelligent voice device and the at least one second intelligent voice device includes: when it is determined, based on the user accounts bound to the first intelligent voice device and the at least one second intelligent voice device, that one account corresponds to multiple intelligent voice devices, determining, from among those multiple devices, the intelligent voice device whose positional relationship with the user has the highest matching degree as the target intelligent voice device.
After the first intelligent voice device determines the positional relationship between each intelligent voice device (the first intelligent voice device and the second intelligent voice devices) and the user, the target intelligent voice device can be determined from those positional relationships. When it is determined, based on the user accounts bound to the first intelligent voice device and the at least one second intelligent voice device, that one account corresponds to multiple intelligent voice devices, a target intelligent voice device is selected from those devices to respond to the user's voice signal. Accordingly, among the multiple intelligent voice devices, the device with the highest matching degree, computed according to the method for determining the matching degree of the positional relationship between device and user, is determined as the target intelligent voice device.
In some embodiments, determining, according to the positional relationship, a target intelligent voice device that satisfies the usage scenario of the space from among the first intelligent voice device and the at least one second intelligent voice device includes: identifying the user corresponding to the voice signal based on the voiceprint features of the voice signal; when it is determined, based on the user accounts bound to the first intelligent voice device and the at least one second intelligent voice device, that the user corresponding to an account is the user corresponding to the voice signal, determining the intelligent voice devices corresponding to that account as wakeable intelligent voice devices; and determining, from among the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device whose positional relationship with the user has the highest matching degree as the target intelligent voice device.
After the first intelligent voice device determines the positional relationship between each intelligent voice device (the first intelligent voice device and the second intelligent voice devices) and the user, the target intelligent voice device can be determined from those positional relationships. The user corresponding to the voice signal is identified based on the voiceprint features of the voice signal; when it is determined, based on the user accounts bound to the first intelligent voice device and the at least one second intelligent voice device, that the user corresponding to an account is the user corresponding to the voice signal, the intelligent voice devices corresponding to that account are determined as wakeable intelligent voice devices, and a target intelligent voice device is selected from the wakeable devices to respond to the user's voice signal. Accordingly, among the wakeable intelligent voice devices, the device with the highest matching degree, computed according to the method for determining the matching degree of the positional relationship between device and user, is determined as the target intelligent voice device.
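The voiceprint-based filtering step above can be sketched as follows, assuming the voiceprint recognizer has already produced a speaker identity; `wakeable_devices` and both mappings are hypothetical names introduced for illustration.

```python
def wakeable_devices(speaker_id: str,
                     device_accounts: dict[str, str],
                     account_users: dict[str, str]) -> list[str]:
    """Keep only the devices whose bound account belongs to the user
    identified from the voiceprint; the target device is then chosen
    from this list by matching degree.

    device_accounts: device id -> bound account id
    account_users:   account id -> user id owning that account
    """
    return [dev for dev, account in device_accounts.items()
            if account_users.get(account) == speaker_id]
```

Devices bound to other users' accounts are thereby excluded before any positional matching is done.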
In step 104, the target intelligent voice device is triggered to be in an awake state to respond to the user's voice signal.
After determining the target intelligent voice device, the first intelligent voice device may trigger the target intelligent voice device to be in the awake state and respond to the user's voice signal; the target intelligent voice device may also already be in the awake state before being triggered. The first intelligent voice device may trigger the target intelligent voice device by broadcasting over a LAN or through other near field communication means.
In some embodiments, when the change trend of the positional relationship between the user and the intelligent voice device that was in the awake state before the voice signal was received has been determined, and it is determined from that change trend that the awake device will exceed the critical distance, so that the intelligent voice device whose positional relationship with the user has the highest matching degree is determined as the target intelligent voice device from among the first intelligent voice device and the at least one second intelligent voice device, triggering the target intelligent voice device to be in the awake state includes: when the device with the highest matching degree is determined as the target intelligent voice device, triggering the device currently in the awake state to enter the standby state, and waking up the target intelligent voice device in real time.
When it is determined that the intelligent voice device in the awake state will exceed the critical distance, that device can be triggered to enter the standby state while the target intelligent voice device is woken up in real time. This prevents the previous device from remaining awake, which saves power, and also prevents multiple intelligent voice devices from being in the awake state at the same time, which would degrade the user experience.
In some embodiments, when the change trend of the positional relationship between the user and the intelligent voice device that was in the awake state before the voice signal was received has been determined, and it is determined from that change trend that the awake device will exceed the critical distance within the preset duration, so that the intelligent voice device whose positional relationship with the user has the highest matching degree is determined as the target intelligent voice device from among the first intelligent voice device and the at least one second intelligent voice device, triggering the target intelligent voice device to be in the awake state includes: waking up the target intelligent voice device in advance, before the device in the awake state exceeds the critical distance.
When it is determined that the intelligent voice device in the awake state will exceed the critical distance within the preset duration, the target intelligent voice device can be woken up in advance, before the awake device actually exceeds the critical distance. This achieves a seamless handover between intelligent voice devices and avoids the delay that would result from waking up the target device only after the awake device has exceeded the critical distance.
In some embodiments, when historical data of the intelligent voice device that was in the awake state before the voice signal was received has been acquired, and the expected use duration of that device has been predicted through an artificial intelligence model by combining the change trend of its positional relationship with the user, its use duration, and its number of wake-ups, so that the intelligent voice device whose positional relationship with the user has the highest matching degree is determined as the target intelligent voice device from among the first intelligent voice device and the at least one second intelligent voice device, triggering the target intelligent voice device to be in the awake state includes: waking up the target intelligent voice device in real time when the expected use duration elapses; or waking up the target intelligent voice device in advance, before the expected use duration elapses.
After the expected use duration of the device in the awake state is predicted through the artificial intelligence model, the target intelligent voice device can be woken up in real time when the expected use duration elapses, which saves power; alternatively, the target intelligent voice device can be woken up in advance, before the expected use duration elapses, which achieves a seamless handover between intelligent voice devices and avoids the delay of waking up the target device only once the expected use duration has already elapsed.
In some embodiments, after the target intelligent voice device is triggered to be in the awake state, the method further includes: triggering, in real time, any intelligent voice device other than the target intelligent voice device that is in the awake state to switch to the standby state; or waiting for a preset duration, determining the change trend of the positional relationship between the user and each intelligent voice device other than the target device that is in the awake state, and triggering such a device to switch to the standby state when it is determined that the device will exceed the critical distance within the preset duration.
To avoid multiple intelligent voice devices being in the awake state in the same space, the intelligent voice devices in the awake state other than the target intelligent voice device can be triggered to switch to the standby state in real time. Alternatively, to avoid devices switching back and forth between states, after the target intelligent voice device is triggered to be in the awake state, the method waits for a preset duration, determines during that duration the change trend of the positional relationship between the user and each device other than the target that is still in the awake state, and triggers such a device to switch to the standby state when it is determined that the device will exceed the critical distance within the preset duration.
Referring to fig. 3C, fig. 3C is a schematic flowchart of an alternative embodiment of the present invention. In some embodiments, as shown in fig. 3C, in step 105, when the target intelligent voice device is not the same device as the intelligent voice device that was in the awake state before the voice signal was received, and the distance between the previously awake device and the user exceeded the critical distance while that device was responding to the user's last voice signal, the target intelligent voice device is triggered to respond to the last voice signal again.
To avoid the user missing a response, consider the case where the target intelligent voice device is not the same device as the intelligent voice device that was in the awake state before the voice signal was received, and the distance between the previously awake device and the user exceeded the critical distance while that device was responding to the user's last voice signal. In that case the user may not have perceived the previous response, so the target intelligent voice device can be triggered to respond to the last voice signal again, ensuring that no response is missed.
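The replay condition of step 105 reduces to a simple predicate; this sketch uses hypothetical names and an assumed critical distance.

```python
def should_replay(target_id: str, previously_awake_id: str,
                  last_response_distance: float,
                  critical: float = 5.0) -> bool:
    """Replay the last response on the new target device when (a) the
    target differs from the previously awake device and (b) the user was
    beyond the critical distance during the previous response, so may
    not have perceived it."""
    return target_id != previously_awake_id and last_response_distance > critical
```

If the same device remains the target, or the user was within perception range of the previous response, no replay is needed.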
The following describes the control method of an intelligent voice device according to the embodiment of the present invention, taking a server as an example of the execution subject that implements the control scheme provided by the embodiment of the present invention. Referring to fig. 4, fig. 4 is a flowchart of a control method of an intelligent voice device according to an embodiment of the present invention, described with reference to the steps shown in fig. 4.
In step 201, the first intelligent voice device and the at least one second intelligent voice device receive a voice signal of the user.
After the user utters a voice signal, for example the wake-up word "ABAB", the first intelligent voice device and the at least one second intelligent voice device can collect the user's voice signal.
In step 202, the first intelligent voice device and the at least one second intelligent voice device send the user's voice signal to a server.
In step 203, the server receives the user's voice signal sent by the first intelligent voice device and the at least one second intelligent voice device.
In step 204, the server performs perception processing on the space according to the voice signal, to determine the intelligent voice devices included in the space and their positional relationships with the user.
After the server receives the user's voice signals from the first intelligent voice device and the at least one second intelligent voice device, it can perform perception processing on the space according to those signals, to determine the intelligent voice devices included in the space and the positional relationship between each device and the user.
In some embodiments, performing perception processing on the space according to the voice signal to determine the intelligent voice devices included in the space and their positional relationships with the user includes: performing the following processing for the user's voice signal received by any intelligent voice device in the space:
analyzing the user's voice signal as received by the intelligent voice device from multiple directions, to obtain the energy value of the signal in each direction; determining the direction corresponding to the maximum energy value as the direction of the user relative to the intelligent voice device; and, according to the relationship by which the energy value of a voice signal attenuates with distance and the attenuation of the maximum energy value relative to the energy value of the user's reference voice signal, determining the distance corresponding to that attenuation as the distance between the intelligent voice device and the user.
After the server receives the user's voice signals sent by the first and second intelligent voice devices (each signal carries a device identifier that uniquely identifies the sending device), it performs direction and distance recognition on the user's voice signal received by each intelligent voice device (the first intelligent voice device and the second intelligent voice devices). An intelligent voice device may be provided with a multidirectional microphone array for receiving the user's voice signal from multiple directions; the device thus obtains the energy values of the user's voice signal received from the multiple directions and sends them to the server, and the server performs the direction and distance recognition according to those energy values.
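The direction and distance recognition described above can be sketched as follows, assuming a free-field inverse-square attenuation model (the patent only states that energy attenuates with distance) and a hypothetical reference energy measured at 1 m; all names here are illustrative.

```python
import math

REFERENCE_ENERGY = 1.0  # energy of the user's reference voice at 1 m; assumed

def locate_user(direction_energies: dict[float, float]) -> tuple[float, float]:
    """Estimate the user's direction and distance from the per-direction
    energy values of the received voice signal.

    direction_energies: direction in degrees -> measured energy value.
    Direction = the direction with the maximum energy; distance follows
    from the inverse-square law E = E_ref / r**2, i.e. r = sqrt(E_ref / E).
    """
    direction = max(direction_energies, key=direction_energies.get)
    e_max = direction_energies[direction]
    distance = math.sqrt(REFERENCE_ENERGY / e_max)
    return direction, distance
```

Any monotonic attenuation model with a known reference energy would serve the same purpose; inverse-square is just the simplest free-field choice.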
In some embodiments, performing perception processing on the space according to the voice signal to determine the intelligent voice devices included in the space and their positional relationships with the user includes: performing the following processing for the user's voice signal received by any intelligent voice device in the space: analyzing the voice signal to obtain a first distance between the intelligent voice device and the user and a first direction of the user relative to the intelligent voice device; in response to the received voice signal, performing obstacle detection in the space to obtain a second distance between the intelligent voice device and the user, and performing obstacle recognition in the space to obtain a second direction of the user relative to the intelligent voice device; and when the difference between the first distance and the second distance is larger than a distance error threshold, and/or the error between the first direction and the second direction is larger than a direction error threshold, determining a weighted value of the first and second distances as the distance between the intelligent voice device and the user, and determining the average of the first and second directions as the direction of the user relative to the intelligent voice device.
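A sketch of the fusion rule in this embodiment. The thresholds, the equal weights, and the choice to keep the acoustic estimate when the two estimates already agree are illustrative assumptions; the patent specifies only the fusion applied when the estimates disagree.

```python
def fuse_estimates(d1: float, dir1: float,   # acoustic estimate
                   d2: float, dir2: float,   # obstacle-detection estimate
                   dist_err_threshold: float = 0.5,   # metres; assumed
                   dir_err_threshold: float = 15.0,   # degrees; assumed
                   w1: float = 0.5, w2: float = 0.5) -> tuple[float, float]:
    """When the acoustic (d1, dir1) and obstacle-detection (d2, dir2)
    estimates disagree beyond the thresholds, fuse them: a weighted
    distance and an averaged direction, as described in the text."""
    if abs(d1 - d2) > dist_err_threshold or abs(dir1 - dir2) > dir_err_threshold:
        return w1 * d1 + w2 * d2, (dir1 + dir2) / 2.0
    return d1, dir1  # estimates agree; keep the acoustic one (assumption)
```

The averaged direction assumes both directions are expressed on the same unwrapped scale; angles near the 0°/360° wrap-around would need circular averaging, which the sketch omits.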
In step 205, when the space further includes at least one second intelligent voice device, the server determines, according to the positional relationship, a target intelligent voice device that satisfies the usage scenario of the space from among the first intelligent voice device and the at least one second intelligent voice device.
After the server determines the positional relationship (direction and distance) between each intelligent voice device and the user, if only the first intelligent voice device is in the space, the first intelligent voice device is determined as the target intelligent voice device. When the space further includes at least one second intelligent voice device, the target intelligent voice device that satisfies the usage scenario of the space is determined, according to the positional relationship, from among the first intelligent voice device and the at least one second intelligent voice device, so that only the target device responds to the user's voice signal and multiple intelligent voice devices in the same space are prevented from responding simultaneously.
In some embodiments, determining, according to the positional relationship, a target intelligent voice device that satisfies the usage scenario of the space from among the first intelligent voice device and the at least one second intelligent voice device includes: performing the following processing for the positional relationship obtained by perception processing of the voice signal received by any intelligent voice device in the space: when the positional relationship remains unchanged for longer than a time threshold, determining that the user is in a static state; and determining, according to the distance between device and user included in the positional relationship, the intelligent voice device with the smallest distance to the user in the space as the target intelligent voice device.
After the server determines the positional relationship between each intelligent voice device (the first intelligent voice device and the second intelligent voice devices) and the user, the target intelligent voice device can be determined from those positional relationships.
In some embodiments, determining, according to the positional relationship, a target intelligent voice device that satisfies the usage scenario of the space from among the first intelligent voice device and the at least one second intelligent voice device includes: performing the following processing for the positional relationship obtained by perception processing of the voice signal received by any intelligent voice device in the space: when the positional relationship changes, determining that the user is in a motion state; determining the change in the direction of the user relative to the intelligent voice device, included in the positional relationship, as the user's movement direction relative to that device; multiplying the reciprocal of the distance between the user and the device, included in the positional relationship, by the movement-direction vector to obtain the matching degree of the positional relationship between the device and the user; and determining, from among the first intelligent voice device and the at least one second intelligent voice device, the device with the highest matching degree as the target intelligent voice device. The movement-direction value is positive when the user's direction change relative to the device is toward the device, and negative when it is away from the device.
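The matching degree defined above (reciprocal of the distance multiplied by a signed movement direction) can be sketched as follows; the function and variable names are illustrative.

```python
def matching_degree(distance: float, approaching: bool) -> float:
    """Matching degree per the text: the reciprocal of the distance
    multiplied by the movement-direction value, which is +1 when the
    user is approaching the device and -1 when moving away."""
    direction = 1.0 if approaching else -1.0
    return (1.0 / distance) * direction

def pick_by_matching(devices: dict[str, tuple[float, bool]]) -> str:
    """Return the id of the device with the highest matching degree.
    `devices` maps device id -> (distance, approaching)."""
    return max(devices, key=lambda d: matching_degree(*devices[d]))
```

Note the resulting ranking: any device the user is approaching outranks every device the user is leaving, and among approached devices the nearest wins.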
In some embodiments, determining, according to the positional relationship, a target intelligent voice device that satisfies the usage scenario of the space from among the first intelligent voice device and the at least one second intelligent voice device includes: determining which of the first intelligent voice device and the at least one second intelligent voice device is in the awake state; and when the distance between the device in the awake state and the user does not exceed the critical distance, determining the device in the awake state as the target intelligent voice device. The critical distance is the maximum distance at which the user and the intelligent voice device can correctly perceive the voice signals sent by each other.
In some embodiments, determining, according to the positional relationship, a target intelligent voice device that satisfies the usage scenario of the space from among the first intelligent voice device and the at least one second intelligent voice device includes: when an intelligent voice device that is interacting with the user exists among the first intelligent voice device and the at least one second intelligent voice device, and the distance between that device and the user does not exceed the critical distance, determining the intelligent voice device that is interacting with the user as the target intelligent voice device.
In some embodiments, determining, according to the positional relationship, a target intelligent voice device that satisfies the usage scenario of the space from among the first intelligent voice device and the at least one second intelligent voice device includes: determining a change trend of the positional relationship between the user and the intelligent voice device that was in the awake state before the voice signal was received; and when it is determined from the change trend that the intelligent voice device in the awake state will exceed the critical distance, determining, from among the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device whose positional relationship with the user has the highest matching degree as the target intelligent voice device.
In some embodiments, determining, according to the positional relationship, a target intelligent voice device that satisfies the usage scenario of the space from among the first intelligent voice device and the at least one second intelligent voice device includes: determining a change trend of the positional relationship between the user and the intelligent voice device that was in the awake state before the voice signal was received; and when it is determined from the change trend that the intelligent voice device in the awake state will exceed the critical distance within a preset duration, determining, from among the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device whose positional relationship with the user has the highest matching degree as the target intelligent voice device.
In some embodiments, determining, among the first intelligent voice device and the at least one second intelligent voice device, a target intelligent voice device that satisfies the usage scenario of the space according to the positional relationship includes: determining a change trend of the positional relationship between the user and the intelligent voice device that was in the awake state before the voice signal was received; and, when it is determined from the change trend of the positional relationship that the intelligent voice device in the awake state will not exceed the critical distance, determining the intelligent voice device in the awake state as the target intelligent voice device.
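The three trend-based variants above share one decision: extrapolate the awake device's distance to the user and either keep that device or hand over to the best-matching candidate. A minimal sketch of that decision follows; the linear extrapolation from the last two samples, the sample format, and the use of the smallest current distance as the matching criterion are assumptions for illustration, not the claimed implementation.

```python
def pick_target(devices, awake_id, critical_distance, horizon=5.0):
    """Choose the target device from distance trends.

    devices: dict mapping device id -> list of recent (time, distance) samples.
    awake_id: id of the device currently in the awake state.
    Returns the id of the device to wake (or keep awake).
    """
    def trend(samples):
        # slope of distance over time, from the last two samples
        (t0, d0), (t1, d1) = samples[-2], samples[-1]
        return (d1 - d0) / (t1 - t0)

    awake = devices[awake_id]
    # extrapolate the awake device's distance over the preset duration
    projected = awake[-1][1] + trend(awake) * horizon
    if projected <= critical_distance:
        # the awake device will stay within the critical distance: keep it
        return awake_id
    # otherwise pick the candidate whose positional relationship matches best,
    # approximated here as the smallest current distance to the user
    return min(devices, key=lambda did: devices[did][-1][1])
```

A device that is drifting away hands over to a nearer candidate, while a stationary awake device keeps the interaction.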
In some embodiments, determining, among the first intelligent voice device and the at least one second intelligent voice device, a target intelligent voice device that satisfies the usage scenario of the space according to the positional relationship includes: acquiring historical data of the intelligent voice device that was in the awake state before the voice signal was received, and predicting the expected usage duration of that device through an artificial intelligence model by combining the change trend of its positional relationship, its usage duration, and its number of wake-ups; and determining, among the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device whose positional relationship with the user has the highest matching degree as the target intelligent voice device.
In some embodiments, determining, among the first intelligent voice device and the at least one second intelligent voice device, a target intelligent voice device that satisfies the usage scenario of the space according to the positional relationship includes: when it is determined, based on the user accounts bound to the first intelligent voice device and the at least one second intelligent voice device, that one account corresponds to a plurality of intelligent voice devices, determining, among the plurality of intelligent voice devices, the intelligent voice device whose positional relationship with the user has the highest matching degree as the target intelligent voice device.
In some embodiments, determining, among the first intelligent voice device and the at least one second intelligent voice device, a target intelligent voice device that satisfies the usage scenario of the space according to the positional relationship includes: identifying the user corresponding to the voice signal based on the voiceprint features of the user's voice signal; based on the user accounts bound to the first intelligent voice device and the at least one second intelligent voice device, when the user corresponding to an account is determined to be the user corresponding to the voice signal, determining the intelligent voice device corresponding to that account as a wakeable intelligent voice device; and determining, among the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device whose positional relationship with the user has the highest matching degree as the target intelligent voice device.
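The account- and voiceprint-based filtering in the last two embodiments reduces to a lookup: keep only devices whose bound account resolves to the identified speaker. The sketch below assumes the voiceprint step has already produced a speaker id, and the dict shapes are illustrative.

```python
def wakeable_devices(speaker_id, device_accounts, account_users):
    """Restrict candidate devices to those bound to the speaking user's account.

    speaker_id:      user identified from the voiceprint of the voice signal.
    device_accounts: dict mapping device id -> bound account id.
    account_users:   dict mapping account id -> user id of that account.
    """
    return [dev for dev, acct in device_accounts.items()
            if account_users.get(acct) == speaker_id]
```

Only the devices returned here are then ranked by the matching degree of their positional relationship with the user.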
In step 206, the server triggers the target smart voice device to be in an awake state in response to the user's voice signal.
After the server determines the target intelligent voice device, it can send a wake-up instruction to the target intelligent voice device according to that device's address. The target intelligent voice device receives the wake-up instruction, enters the awake state, and responds to the user's voice signal; the target intelligent voice device is thereby triggered to be in the awake state to respond to the user's voice signal.
In some embodiments, when a change trend of the positional relationship between the user and the intelligent voice device that was in the awake state before the voice signal was received is determined, and, when it is determined from the change trend that the intelligent voice device in the awake state is going to exceed the critical distance, the intelligent voice device whose positional relationship with the user has the highest matching degree is determined, among the first intelligent voice device and the at least one second intelligent voice device, as the target intelligent voice device, triggering the target intelligent voice device to be in the awake state includes: when the intelligent voice device whose positional relationship with the user has the highest matching degree is determined as the target intelligent voice device, triggering the intelligent voice device in the awake state to enter the standby state, and waking up the target intelligent voice device in real time.
In some embodiments, when a change trend of the positional relationship between the user and the intelligent voice device that was in the awake state before the voice signal was received is determined, and, when it is determined from the change trend that the intelligent voice device in the awake state is going to exceed the critical distance within a preset duration, the intelligent voice device whose positional relationship with the user has the highest matching degree is determined, among the first intelligent voice device and the at least one second intelligent voice device, as the target intelligent voice device, triggering the target intelligent voice device to be in the awake state includes: waking up the target intelligent voice device in advance, before the intelligent voice device in the awake state exceeds the critical distance.
In some embodiments, when historical data of the intelligent voice device that was in the awake state before the voice signal was received is acquired, the expected usage duration of that device is predicted through an artificial intelligence model by combining the change trend of its positional relationship, its usage duration, and its number of wake-ups, and the intelligent voice device whose positional relationship with the user has the highest matching degree is determined, among the first intelligent voice device and the at least one second intelligent voice device, as the target intelligent voice device, triggering the target intelligent voice device to be in the awake state includes: waking up the target intelligent voice device in real time when the expected usage duration is reached; or waking up the target intelligent voice device in advance, before the expected usage duration is reached.
In some embodiments, after the target intelligent voice device is triggered to be in the awake state, the method further includes: triggering, in real time, any intelligent voice device other than the target intelligent voice device that is in the awake state to switch to the standby state; or, waiting for a preset duration, determining a change trend of the positional relationship between the user and the intelligent voice device that is in the awake state other than the target intelligent voice device, and triggering that intelligent voice device to switch to the standby state when it is determined that it exceeds the critical distance within the preset duration.
In some embodiments, after the target intelligent voice device is triggered to be in the awake state, the method further includes: when the target intelligent voice device and the intelligent voice device that was in the awake state before the voice signal was received are not the same device, and the distance between the latter device and the user exceeded the critical distance while it was responding to the user's last voice signal, triggering the target intelligent voice device to respond to that last voice signal again.
The control method of the intelligent voice device provided by the embodiments of the present invention has been described above with reference to exemplary applications and implementations in which the electronic device provided by the embodiments of the present invention is an intelligent voice device or a server. The scheme in which the modules of the control device 555 of the intelligent voice device cooperate to implement control of the intelligent voice device is further described below.
A receiving module 5551, configured to receive a voice signal of a user in a space where the first intelligent voice device is located;
a sensing module 5552, configured to perform sensing processing on the space according to the voice signal, so as to determine an intelligent voice device included in the space and a location relationship between the intelligent voice device and the user;
a processing module 5553, configured to determine, when the space further includes at least one second intelligent voice device, a target intelligent voice device that meets a usage scenario of the space in the first intelligent voice device and the at least one second intelligent voice device according to the location relationship, and
a triggering module 5554, configured to trigger the target intelligent voice device to be in an awake state to respond to the voice signal of the user.
In the above technical solution, the sensing module 5552 is further configured to perform the following processing for the user's voice signal received by any intelligent voice device in the space: analyzing the user's voice signal received by the intelligent voice device from a plurality of directions, to obtain the energy values of the signal in each of the plurality of directions; determining the direction corresponding to the maximum energy value as the direction of the user relative to the intelligent voice device; and, according to the relationship by which the energy value of a voice signal attenuates with distance, and the attenuation of the maximum energy value relative to the energy value of the user's reference voice signal, determining the distance corresponding to that attenuation value as the distance between the intelligent voice device and the user.
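The energy-based perception just described can be sketched as follows. Assumed for illustration: per-direction energies in dB, a known reference energy for the user's voice, and a simple linear attenuation model (real acoustic attenuation is roughly logarithmic with distance, so the mapping here is purely a placeholder for the stated energy-to-distance relationship).

```python
def locate_user(direction_energies, reference_energy, attenuation_db_per_m):
    """Estimate the user's direction and distance from per-direction energy.

    direction_energies:   dict mapping direction (degrees) -> received energy (dB).
    reference_energy:     energy (dB) of the user's reference voice signal.
    attenuation_db_per_m: assumed attenuation of the signal per metre.
    """
    # the direction with the maximum energy is taken as the user's direction
    direction = max(direction_energies, key=direction_energies.get)
    max_energy = direction_energies[direction]
    # attenuation relative to the reference maps to a distance
    attenuation = reference_energy - max_energy
    distance = attenuation / attenuation_db_per_m
    return direction, distance
```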
In the above technical solution, the sensing module 5552 is further configured to perform the following processing for the user's voice signal received by any intelligent voice device in the space: analyzing the voice signal to obtain a first distance between the intelligent voice device and the user and a first direction of the user relative to the intelligent voice device; in response to the received voice signal, detecting obstacles in the space to obtain a second distance between the intelligent voice device and the user, and identifying obstacles in the space to obtain a second direction of the user relative to the intelligent voice device; and, when the difference between the first distance and the second distance is greater than a distance error threshold and/or the error between the first direction and the second direction is greater than a direction error threshold, determining a weighted value of the first distance and the second distance as the distance between the intelligent voice device and the user, and determining the average of the first direction and the second direction as the direction of the user relative to the intelligent voice device.
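A sketch of the two-source fusion described above, combining the acoustic estimate with the obstacle-detection estimate; the concrete thresholds, the 50/50 weighting, and treating directions as plain angles in degrees are assumptions.

```python
def fuse_estimates(d_acoustic, d_obstacle, dir_acoustic, dir_obstacle,
                   dist_err=0.5, dir_err=15.0, w=0.5):
    """Fuse the acoustic and obstacle-detection estimates of the user's position.

    When the two estimates disagree by more than the error thresholds,
    the distance becomes a weighted value of the two and the direction
    their average, as in the embodiment above.
    """
    if abs(d_acoustic - d_obstacle) > dist_err or \
       abs(dir_acoustic - dir_obstacle) > dir_err:
        distance = w * d_acoustic + (1 - w) * d_obstacle
        direction = (dir_acoustic + dir_obstacle) / 2
    else:
        # estimates agree: keep the acoustic estimate
        distance, direction = d_acoustic, dir_acoustic
    return distance, direction
```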
In the above technical solution, the processing module 5553 is further configured to perform the following processing on the positional relationship obtained by sensing the voice signal received by any intelligent voice device in the space: when the time for which the positional relationship remains unchanged exceeds a time threshold, determining that the user is in a static state; and, according to the distance between the intelligent voice device and the user included in the positional relationship, determining the intelligent voice device in the space at the minimum distance from the user as the target intelligent voice device.
In the above technical solution, the processing module 5553 is further configured to perform the following processing on the positional relationship obtained by sensing the voice signal received by any intelligent voice device in the space: when the positional relationship changes, determining that the user is in a motion state; according to the direction of the user relative to the intelligent voice device included in the positional relationship, determining the direction of change as the moving direction of the user relative to the intelligent voice device; according to the distance between the user and the intelligent voice device included in the positional relationship, multiplying the reciprocal of the distance by the moving-direction value to obtain the matching degree of the positional relationship between the intelligent voice device and the user; and determining, among the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device with the highest matching degree as the target intelligent voice device; where the moving-direction value is positive when the change in the user's direction relative to the intelligent voice device approaches the device, and negative when it moves away from the device.
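The matching degree defined above (reciprocal of the distance multiplied by a signed moving-direction value) can be sketched directly; encoding the moving direction as ±1 is an assumption consistent with the sign convention stated above.

```python
def matching_degree(distance, approaching):
    """Matching degree of a device's positional relationship with the user.

    Per the embodiment above: the reciprocal of the distance multiplied by
    the moving-direction value, which is +1 when the user is approaching
    the device and -1 when the user is moving away from it.
    """
    direction_value = 1.0 if approaching else -1.0
    return direction_value / distance

def pick_by_matching(candidates):
    """candidates: dict mapping device id -> (distance, approaching)."""
    return max(candidates,
               key=lambda dev: matching_degree(*candidates[dev]))
```

A nearby device the user is walking away from thus loses to a farther device the user is walking toward.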
In the above technical solution, the processing module 5553 is further configured to determine the intelligent voice device in the awake state among the first intelligent voice device and the at least one second intelligent voice device; and, when the distance between the intelligent voice device in the awake state and the user does not exceed the critical distance, determine the intelligent voice device in the awake state as the target intelligent voice device; the critical distance being the maximum distance at which the user and the intelligent voice device can still correctly perceive the voice signals sent by each other.
In the above technical solution, the processing module 5553 is further configured to determine the intelligent voice device that is interacting with the user as the target intelligent voice device when such a device exists among the first intelligent voice device and the at least one second intelligent voice device and the distance between that device and the user does not exceed the critical distance.
In the above technical solution, the processing module 5553 is further configured to determine a change trend of the positional relationship between the user and the intelligent voice device that was in the awake state before the voice signal was received; and, when it is determined from the change trend of the positional relationship that the intelligent voice device in the awake state is going to exceed the critical distance, determine, among the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device whose positional relationship with the user has the highest matching degree as the target intelligent voice device;
The triggering module 5554 is further configured to trigger the intelligent voice device in the awake state to enter the standby state when the intelligent voice device whose positional relationship with the user has the highest matching degree is determined as the target intelligent voice device, and to wake up the target intelligent voice device in real time.
In the above technical solution, the processing module 5553 is further configured to determine a change trend of the positional relationship between the user and the intelligent voice device that was in the awake state before the voice signal was received; and, when it is determined from the change trend of the positional relationship that the intelligent voice device in the awake state is going to exceed the critical distance within a preset duration, determine, among the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device whose positional relationship with the user has the highest matching degree as the target intelligent voice device;
the triggering module 5554 is further configured to wake up the target intelligent voice device in advance, before the intelligent voice device in the awake state exceeds the critical distance.
In the above technical solution, the processing module 5553 is further configured to determine a change trend of the positional relationship between the user and the intelligent voice device that was in the awake state before the voice signal was received; and, when it is determined from the change trend of the positional relationship that the intelligent voice device in the awake state will not exceed the critical distance, determine the intelligent voice device in the awake state as the target intelligent voice device.
In the above technical solution, the processing module 5553 is further configured to acquire historical data of the intelligent voice device that was in the awake state before the voice signal was received, and to predict the expected usage duration of that device through an artificial intelligence model by combining the change trend of its positional relationship, its usage duration, and its number of wake-ups; and to determine, among the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device whose positional relationship with the user has the highest matching degree as the target intelligent voice device; the triggering module is further configured to wake up the target intelligent voice device in real time when the expected usage duration is reached, or to wake up the target intelligent voice device in advance, before the expected usage duration is reached.
In the above technical solution, the apparatus further includes:
a switching module 5555, configured to trigger, in real time, any intelligent voice device other than the target intelligent voice device that is in the awake state to switch to the standby state; or,
to wait for a preset duration, determine a change trend of the positional relationship between the user and the intelligent voice device that is in the awake state other than the target intelligent voice device, and trigger that intelligent voice device to switch to the standby state when it is determined that it exceeds the critical distance within the preset duration.
In the above technical solution, the control device 555 of the intelligent voice device further includes:
a response module 5556, configured to trigger the target intelligent voice device to respond again to the user's last voice signal when the target intelligent voice device and the intelligent voice device that was in the awake state before the voice signal was received are not the same device, and the distance between the latter device and the user exceeded the critical distance while it was responding to that last voice signal.
In the above technical solution, the processing module 5553 is further configured to determine, when it is determined, based on the user accounts bound to the first intelligent voice device and the at least one second intelligent voice device, that one account corresponds to a plurality of intelligent voice devices, the intelligent voice device whose positional relationship with the user has the highest matching degree, among the plurality of intelligent voice devices, as the target intelligent voice device.
In the above technical solution, the processing module 5553 is further configured to identify, based on the voiceprint features of the user's voice signal, the user corresponding to the voice signal; based on the user accounts bound to the first intelligent voice device and the at least one second intelligent voice device, when the user corresponding to an account is determined to be the user corresponding to the voice signal, determine the intelligent voice device corresponding to that account as a wakeable intelligent voice device; and determine, among the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device whose positional relationship with the user has the highest matching degree as the target intelligent voice device.
The embodiments of the present invention also provide a storage medium storing executable instructions which, when executed by a processor, cause the processor to perform the control method of the intelligent voice device provided by the embodiments of the present invention, for example the control method of the intelligent voice device shown in figs. 3A-3C, or the control method of the intelligent voice device shown in fig. 4.
In some embodiments, the storage medium may be an FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, optical disc, or CD-ROM, or may be any of various devices including one of, or any combination of, the above memories.
In some embodiments, the executable instructions may be in the form of programs, software modules, scripts, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and they may be deployed in any form, including as stand-alone programs or as modules, components, subroutines, or other units suitable for use in a computing environment.
As an example, the executable instructions may, but need not, correspond to files in a file system, may be stored as part of a file that holds other programs or data, for example, in one or more scripts in a hypertext markup language (HTML, hyper Text Markup Language) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
As an example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices located at one site or, alternatively, distributed across multiple sites and interconnected by a communication network.
In the following, an exemplary application of the embodiment of the present invention in a practical application scenario will be described.
Automatic speech recognition (ASR, Automatic Speech Recognition) technology can satisfy user requirements well in a single-intelligent-voice-device scenario, but the user experience is poor in a scenario where multiple intelligent voice devices coexist.
As intelligent voice devices become more and more common, situations arise in which a plurality of intelligent voice devices exist in the same scene (the same home or the same room). In this case, if the user wakes up the intelligent voice devices and initiates a voice request, the plurality of intelligent voice devices respond to and reply to the user's voice request simultaneously, which greatly degrades the user's experience.
To solve the above problem, the embodiments of the present invention provide a control method for an intelligent voice device, namely a single-device response method based on spatial perception (VSSP, Voice Service Spatial Perception), which can identify the device closest to the user in physical space according to the energy of the voice that the intelligent voice devices receive from the user side, comprehensive dimensions of the VS account system, and the like. In the above situation, even when the user initiates a voice request to the intelligent voice devices, only the intelligent voice device nearest to the user gives a response; the other intelligent voice devices farther from the user do not respond to the user's request and automatically enter the standby state to wait for the next wake-up. Voice confusion is thereby avoided, and the method is better suited to scenarios in which multiple intelligent voice devices coexist.
Fig. 5 is a schematic diagram of a user waking up intelligent voice devices according to an embodiment of the present invention. As shown in fig. 5, the intelligent voice device 1 and the intelligent voice device 2 are in the same environment, and the intelligent voice device 1 is physically closer to the user than the intelligent voice device 2. When the user speaks the wake-up word "ABAB", both the intelligent voice device 1 and the intelligent voice device 2 are woken up and wait to respond to the user's voice request.
Fig. 6 is a schematic diagram of an application scenario of intelligent voice devices according to an embodiment of the present invention. As shown in fig. 6, when the user initiates an actual voice request, for example "How is the weather today?", both the intelligent voice device 1 and the intelligent voice device 2 receive the user's voice request. At this time, since the intelligent voice device 1 is closer to the user than the intelligent voice device 2, only the intelligent voice device 1 (the target intelligent voice device) replies to the user's voice request and broadcasts a reply, for example "Shenzhen is sunny today, the temperature is ...".
Fig. 7 is a schematic diagram of an application scenario in which the intelligent voice devices interact with the cloud. As shown in fig. 7, in the application scenario of fig. 6, after receiving the user's voice request, the intelligent voice device 1 and the intelligent voice device 2 each send the corresponding voice request (voice data) to the cloud, and the cloud receives the request from the intelligent voice device 1 and the request from the intelligent voice device 2, both of which carry the same voice stream, "How is the weather today?". The cloud can then determine whether the intelligent voice device 1 and the intelligent voice device 2 use the same login account; if their login accounts are the same, the two devices are very likely to belong to the same user. Moreover, if the cloud receives the request of the intelligent voice device 1 and the request of the intelligent voice device 2 at similar times, the probability that the two devices are in the same environment is very high. The cloud can then perform VSSP processing: it compares the energy value of the voice data uploaded by the intelligent voice device 1 with that uploaded by the intelligent voice device 2, and determines from the energy values that the intelligent voice device 1 is closer to the user. The cloud therefore issues a broadcast instruction to the intelligent voice device 1 (the target intelligent voice device), and the intelligent voice device 1 broadcasts a reply according to the broadcast instruction, for example "Shenzhen is sunny today, the temperature is ...".
Meanwhile, the cloud can send a standby instruction to the intelligent voice device 2, so that the intelligent voice device 2 enters the standby state and waits for the user's next wake-up.
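The cloud-side arbitration just described (same login account, requests arriving close in time, loudest device responds, the rest stand by) might be sketched as follows; the request fields, the time window, and the dB energy values are illustrative assumptions rather than the actual cloud protocol.

```python
def arbitrate(requests, time_window=0.5):
    """Cloud-side VSSP arbitration over near-simultaneous voice requests.

    requests: list of dicts with keys 'device', 'account', 'energy', 't'.
    Devices on the same account whose requests arrive within time_window
    are treated as hearing the same utterance: the one with the highest
    energy responds, the others receive a standby instruction.
    Returns (devices_to_respond, devices_to_stand_by).
    """
    respond, standby = [], []
    by_account = {}
    for r in requests:
        by_account.setdefault(r['account'], []).append(r)
    for group in by_account.values():
        group.sort(key=lambda r: r['t'])
        # same account and close in time -> very likely the same environment
        if len(group) > 1 and group[-1]['t'] - group[0]['t'] <= time_window:
            winner = max(group, key=lambda r: r['energy'])
            respond.append(winner['device'])
            standby += [r['device'] for r in group if r is not winner]
        else:
            respond += [r['device'] for r in group]
    return respond, standby
```

With the three devices of fig. 12, this yields exactly the behaviour described there: the two same-account devices are arbitrated, while the device on a different account responds independently.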
Fig. 8 is a waveform diagram of the voice data uploaded to the cloud by the intelligent voice device 1 provided by the embodiment of the present invention, fig. 9 is a spectrogram of the voice data uploaded to the cloud by the intelligent voice device 1, fig. 10 is a waveform diagram of the voice data uploaded to the cloud by the intelligent voice device 2, and fig. 11 is a spectrogram of the voice data uploaded to the cloud by the intelligent voice device 2. As can be seen from the waveform diagrams in figs. 8 and 10, in the application scenario of fig. 7, after "How is the weather today?" is uploaded to the cloud, the cloud can determine through voice recognition that the waveform energy value in fig. 8 is much larger than that in fig. 10. As can be seen from the spectrograms in figs. 9 and 11, after the same utterance is uploaded to the cloud, the cloud can determine that over the same time the spectral energy value in fig. 9 is much larger than that in fig. 11; that is, as shown by the boxes in the spectrograms of figs. 9 and 11, the high-frequency region of the voice data in fig. 9 is more active than that in fig. 11. Therefore, as can be seen from figs. 8-11, if the energy value of the voice data received by the intelligent voice device 1 at the same time is greater than that of the intelligent voice device 2, the intelligent voice device 1 is closer to the user in physical space than the intelligent voice device 2; the VSSP method then needs to be triggered, and the cloud controls only the intelligent voice device 1 to respond to the user's voice request.
Fig. 12 is a schematic diagram of another application scenario of interaction between intelligent voice devices and the cloud. As shown in fig. 12, the intelligent voice device 1 and the intelligent voice device 2 use the same login account, the intelligent voice device 3 uses a different account from the intelligent voice device 1 and the intelligent voice device 2, and the distances from the user are, in descending order: intelligent voice device 3 > intelligent voice device 2 > intelligent voice device 1. In this case, the VSSP method takes effect only between the intelligent voice device 1 and the intelligent voice device 2: the intelligent voice device 1 (the target intelligent voice device) and the intelligent voice device 3 respond to the user's request simultaneously, while the intelligent voice device 2 automatically enters the standby state due to the VSSP method. This supports scenarios in which multiple intelligent voice devices coexist in the same space but belong to different users (such as offices and other public places), without devices farther away becoming unusable merely because they are in the same scene, thereby achieving the effect of sharing intelligent voice devices.
To verify the effect achieved by the embodiment of the present invention, an intelligent voice device adopting the VSSP method (the Ding Dang smart screen) was compared with existing intelligent voice devices (Xiaodu at Home and Tmall Genie). In the same scene, two devices of each product were placed and logged into the same account; the comparison results are shown in table 1:
TABLE 1

| Product model | Device 1, 1 meter from the user | Device 2, 3 meters from the user |
| --- | --- | --- |
| Xiaodu at Home | Responds | Responds |
| Tmall Genie | Responds | Responds |
| Ding Dang smart screen | Responds | Does not respond |
As can be seen from table 1, in the same application scenario, when two Xiaodu at Home devices or two Tmall Genie devices coexist, both wake up simultaneously and both reply to the user's voice request at the same time, causing overlapping, confused speech. With the DingDang smart screens running the VSSP method, both devices still wake up when two coexist, but only the nearest one (1 meter from the user) replies to the user's voice request, so the user's experience is better.
When multiple intelligent voice devices are in the same local area network and all receive the user's voice request (intelligent voice device 1, intelligent voice device 2, ...), the comparison can also be performed without the cloud: intelligent voice device 1 broadcasts its encrypted account information and the energy value of its voice data over the local area network, receives the encrypted account information and energy values sent by the other intelligent voice devices (intelligent voice device 2, ...), and compares them locally with its own. If the received account information is consistent with its own and the received energy value is greater than the energy value of its own voice data, intelligent voice device 1 automatically enters a standby state.
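A minimal sketch of this LAN-local decision, assuming each device already knows the values its peers broadcast (the function name and the numeric values are illustrative):

```python
def should_stand_by(own_account, own_energy, peer_account, peer_energy):
    """A device enters standby when a peer on the same LAN shares its
    account and heard the user more loudly (reported a higher energy)."""
    return peer_account == own_account and peer_energy > own_energy

# Device 1 hears the user at energy 0.9; a same-account peer reports 0.3:
assert not should_stand_by("acct_A", 0.9, "acct_A", 0.3)  # device 1 keeps responding
# Device 2 hears 0.3 and receives 0.9 from device 1: it stands by.
assert should_stand_by("acct_A", 0.3, "acct_A", 0.9)
# A peer on a different account never triggers standby.
assert not should_stand_by("acct_A", 0.3, "acct_B", 0.9)
```

Each device runs the same rule against every peer's broadcast, so exactly one same-account device (the loudest) keeps responding without any central coordinator.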
In addition, an intelligent voice device can judge which device in the physical space best meets the user's needs from other dimensions of the voice signal received at the user side. For example: 1) the positional relationship of the voice signal (i.e., at least one of direction and distance) can be considered. When the user is static, the distance dimension is considered first, and the target intelligent voice device that responds to the user's voice signal is determined by distance; when the user is moving, the matching degree of the moving direction and the distance can be considered together, for example by weighting them, to determine the target intelligent voice device. 2) Besides using the energy value of the voice signal to represent the distance between the user and the intelligent voice device, devices such as infrared sensors, ultrasonic devices and cameras can be used to perceive the positional relationship; these may be integrated into the intelligent voice device or be separate devices that the intelligent voice device senses and uses. 3) When multiple awakened intelligent voice devices exist, they can be filtered according to the user's identity information, i.e., restricted to the devices bound to the same user account. 4) If the user is currently interacting with an intelligent voice device (e.g., while walking around), the target intelligent voice device is determined on the premise that this interaction is preserved.
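Combining the two states in point 1) above — nearest-device selection for a static user, and the claim-1 matching degree (the reciprocal of distance multiplied by a signed moving-direction value, positive when approaching per claim 5) for a moving user — can be sketched as follows; the scalar ±1 direction value stands in for the claim's moving-direction vector, and all names are illustrative:

```python
def matching_degree(distance, approaching):
    """Claim-1 style score: reciprocal of distance times a signed
    direction value (+1 approaching, -1 moving away, per claim 5)."""
    direction = 1.0 if approaching else -1.0
    return direction / distance

def pick_target(devices, user_moving):
    """devices: list of (name, distance_m, approaching) tuples."""
    if user_moving:
        # Moving user: highest matching degree wins.
        return max(devices, key=lambda d: matching_degree(d[1], d[2]))[0]
    # Static user: the nearest device wins.
    return min(devices, key=lambda d: d[1])[0]

devices = [("device_1", 1.0, False), ("device_2", 3.0, True)]
print(pick_target(devices, user_moving=True))   # -> device_2 (user walking toward it)
print(pick_target(devices, user_moving=False))  # -> device_1 (nearest)
```

Note how the signed direction value lets a farther device that the user is approaching beat a nearer device the user is walking away from.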
In summary, in the embodiment of the present invention, the target intelligent voice device that satisfies the usage scenario of the space is determined among the first intelligent voice device and the at least one second intelligent voice device according to the positional relationship, which prevents multiple intelligent voice devices in the same scene from all responding to the user's voice signal and improves the user's experience.
The foregoing describes merely exemplary embodiments of the present invention and is not intended to limit its scope. Any modification, equivalent replacement, improvement, etc. made within the spirit and scope of the present invention is included in the protection scope of the present invention.
Claims (15)
1. A method for controlling an intelligent voice device, the method comprising:
receiving a voice signal of a user in a space where first intelligent voice equipment is located;
performing sensing processing on the space according to the voice signal to determine intelligent voice equipment included in the space and a position relation between the intelligent voice equipment and the user;
when the space further comprises at least one second intelligent voice device, determining, in the first intelligent voice device and the at least one second intelligent voice device according to the position relation, a target intelligent voice device that satisfies the usage scenario of the space, wherein the determining comprises: executing the following processing for the position relation obtained by performing sensing processing on the voice signal received by any intelligent voice device in the space: when the position relation changes, determining that the user is in a motion state; determining, according to the direction of the user relative to the intelligent voice device included in the position relation, the change direction of that direction as the moving direction of the user relative to the intelligent voice device; multiplying the reciprocal of the distance between the user and the intelligent voice device included in the position relation by the vector of the moving direction, to obtain the matching degree of the position relation between the intelligent voice device and the user; and determining, in the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device with the highest matching degree as the target intelligent voice device;
Triggering the target intelligent voice equipment to be in an awakening state to respond to the voice signal of the user.
2. The method of claim 1, wherein said performing perceptual processing on said space based on said speech signal to determine intelligent speech devices included in said space and a positional relationship with said user comprises:
for the user's voice signal received by any intelligent voice device in the space, performing the following processing:
analyzing the voice signals of the user received by the intelligent voice equipment from a plurality of directions to obtain energy values of the voice signals of the user received by the intelligent voice equipment from the plurality of directions;
determining the direction corresponding to the maximum energy value as the direction of the user relative to the intelligent voice equipment, and
and determining, according to the relation by which the energy value of the voice signal attenuates with distance and the attenuation value of the maximum energy value relative to the energy value of a reference voice signal of the user, that the distance corresponding to the attenuation value is the distance between the intelligent voice equipment and the user.
3. The method of claim 1, wherein said performing perceptual processing on said space based on said speech signal to determine intelligent speech devices included in said space and a positional relationship with said user comprises:
For the user's voice signal received by any intelligent voice device in the space, performing the following processing:
analyzing the voice signal to obtain a first distance between the intelligent voice equipment and the user and a first direction of the user relative to the intelligent voice equipment;
in response to the received voice signal, detecting an obstacle in the space to obtain a second distance between the intelligent voice equipment and the user;
identifying obstacles in the space to obtain a second direction of the user relative to the intelligent voice equipment;
and when the distance difference between the first distance and the second distance is greater than a distance error threshold value and/or the direction error between the first direction and the second direction is greater than a direction error threshold value, determining a weighted value of the first distance and the second distance as the distance between the intelligent voice equipment and the user, and determining an average value of the first direction and the second direction as the direction of the user relative to the intelligent voice equipment.
4. The method of claim 1, wherein the determining, from the first intelligent voice device and the at least one second intelligent voice device according to the positional relationship, a target intelligent voice device that satisfies a usage scenario of the space, further comprises:
The following processing is executed for the position relation obtained by performing sensing processing on the voice signal received by any intelligent device in the space:
when the time for keeping the position relationship unchanged exceeds a time threshold, determining that the user is in a static state;
and determining the intelligent voice equipment with the minimum distance from the user in the space as target intelligent voice equipment according to the distance between the intelligent voice equipment and the user included in the position relation.
5. The method of claim 1, wherein the movement direction value is positive when the change in direction of the user relative to the intelligent voice device is approaching the intelligent voice device, and wherein the movement direction value is negative when the change in direction of the user relative to the intelligent voice device is away from the intelligent voice device.
6. The method of claim 1, wherein the determining, from the first intelligent voice device and the at least one second intelligent voice device according to the positional relationship, a target intelligent voice device that satisfies a usage scenario of the space, further comprises:
determining intelligent voice devices in an awake state in the first intelligent voice device and the at least one second intelligent voice device;
When the distance between the intelligent voice equipment in the awakening state and the user does not exceed the critical distance, determining the intelligent voice equipment in the awakening state as target intelligent voice equipment;
wherein the critical distance is the maximum distance at which the user and the intelligent voice equipment can correctly perceive voice signals sent by each other.
7. The method of claim 1, wherein the determining, from the first intelligent voice device and the at least one second intelligent voice device according to the positional relationship, a target intelligent voice device that satisfies a usage scenario of the space, further comprises:
when there is a smart voice device in the first smart voice device and the at least one second smart voice device that is interacting with the user, and
and when the distance between the intelligent voice equipment and the user does not exceed the critical distance, determining the intelligent voice equipment which is interacting with the user as target intelligent voice equipment.
8. The method of claim 1, wherein the determining, from the first intelligent voice device and the at least one second intelligent voice device according to the positional relationship, a target intelligent voice device that satisfies a usage scenario of the space, further comprises:
Determining a change trend of a position relationship between the intelligent voice equipment in an awake state before receiving the voice signal and the user;
when it is determined, according to the change trend of the position relationship, that the intelligent voice equipment in the awake state exceeds a critical distance, determining, in the first intelligent voice equipment and the at least one second intelligent voice equipment, the intelligent voice equipment whose position relationship with the user has the highest matching degree as the target intelligent voice equipment;
the triggering the target intelligent voice equipment to be in an awake state comprises the following steps:
when the intelligent voice equipment whose position relationship with the user has the highest matching degree is determined as the target intelligent voice equipment, triggering the intelligent voice equipment in the awake state to enter the standby state, and waking up the target intelligent voice equipment in real time.
9. The method of claim 1, wherein the determining, from the first intelligent voice device and the at least one second intelligent voice device according to the positional relationship, a target intelligent voice device that satisfies a usage scenario of the space, further comprises:
determining a change trend of a position relationship between the intelligent voice equipment in an awake state before receiving the voice signal and the user;
when it is determined, according to the change trend of the position relationship, that the intelligent voice equipment in the awake state will exceed a critical distance within a preset duration, determining, in the first intelligent voice equipment and the at least one second intelligent voice equipment, the intelligent voice equipment whose position relationship with the user has the highest matching degree as the target intelligent voice equipment;
the triggering the target intelligent voice equipment to be in an awake state comprises the following steps:
and pre-waking the target intelligent voice device before the intelligent voice equipment in the awake state exceeds the critical distance.
10. The method of claim 1, wherein after the triggering the target intelligent voice device to be in the awake state, the method further comprises:
triggering intelligent voice equipment which is in an awake state and is outside the target intelligent voice equipment to switch to a standby state in real time; or,
and waiting for a preset time period, determining the change trend of the position relationship between each intelligent voice equipment that is in the awake state other than the target intelligent voice equipment and the user within the preset time period, and triggering that intelligent voice equipment to switch to the standby state when it is determined, according to the change trend, that it exceeds the critical distance within the preset time period.
11. The method according to any one of claims 1-10, wherein said determining, from said location relationship, among said first intelligent speech device and said at least one second intelligent speech device, a target intelligent speech device that satisfies a usage scenario of said space, further comprises:
and based on the user accounts bound to the first intelligent voice equipment and the at least one second intelligent voice equipment, when one account corresponds to a plurality of intelligent voice equipment, determining, among the plurality of intelligent voice equipment, the intelligent voice equipment whose position relationship with the user has the highest matching degree as the target intelligent voice equipment.
12. A control device for an intelligent speech device, the device comprising:
the receiving module is used for receiving the voice signal of the user in the space where the first intelligent voice equipment is located;
the sensing module is used for sensing the space according to the voice signals so as to determine intelligent voice equipment included in the space and the position relation between the intelligent voice equipment and the user;
the processing module is configured to: when the space further includes at least one second intelligent voice device, determine, in the first intelligent voice device and the at least one second intelligent voice device according to the position relation, a target intelligent voice device that satisfies the usage scenario of the space, wherein the determining comprises: executing the following processing for the position relation obtained by performing sensing processing on the voice signal received by any intelligent voice device in the space: when the position relation changes, determining that the user is in a motion state; determining, according to the direction of the user relative to the intelligent voice device included in the position relation, the change direction of that direction as the moving direction of the user relative to the intelligent voice device; multiplying the reciprocal of the distance between the user and the intelligent voice device included in the position relation by the vector of the moving direction, to obtain the matching degree of the position relation between the intelligent voice device and the user; and determining, in the first intelligent voice device and the at least one second intelligent voice device, the intelligent voice device with the highest matching degree as the target intelligent voice device;
And the triggering module is used for triggering the target intelligent voice equipment to be in an awakening state so as to respond to the voice signal of the user.
13. An intelligent speech device, comprising:
a memory for storing executable instructions;
a processor for implementing the control method of the intelligent speech device according to any one of claims 1 to 11 when executing the executable instructions stored in the memory.
14. A server for controlling an intelligent voice appliance, comprising:
a memory for storing executable instructions;
a processor for implementing the control method of the intelligent speech device according to any one of claims 1 to 11 when executing the executable instructions stored in the memory.
15. A storage medium storing executable instructions for causing a processor to implement the method of controlling a smart voice device of any one of claims 1 to 11 when executed.
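As a rough illustration of the perception step in claim 2, the direction can be taken as the arrival direction with the maximum received energy, and the distance recovered from an assumed attenuation model. The claims only state that energy decays with distance; the inverse-square model, the function names and the reference values below are illustrative assumptions:

```python
import math

def estimate_direction(energies_by_direction):
    """Direction with the maximum received energy (claim 2)."""
    return max(energies_by_direction, key=energies_by_direction.get)

def estimate_distance(max_energy, reference_energy, reference_distance=1.0):
    """Invert an assumed inverse-square attenuation model,
    energy(r) = reference_energy * (reference_distance / r) ** 2,
    where reference_energy is the user's voice energy at reference_distance."""
    return reference_distance * math.sqrt(reference_energy / max_energy)

energies = {"north": 0.04, "east": 0.16, "south": 0.01, "west": 0.02}
print(estimate_direction(energies))   # -> east
print(estimate_distance(0.16, 1.44))  # -> 3.0 (meters)
```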
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201911138882.7A CN110827818B (en) | 2019-11-20 | 2019-11-20 | Control method, device, equipment and storage medium of intelligent voice equipment |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110827818A CN110827818A (en) | 2020-02-21 |
CN110827818B true CN110827818B (en) | 2024-04-09 |
Family
ID=69557156
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201911138882.7A Active CN110827818B (en) | 2019-11-20 | 2019-11-20 | Control method, device, equipment and storage medium of intelligent voice equipment |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110827818B (en) |
Families Citing this family (14)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113381922B (en) * | 2020-03-09 | 2024-02-27 | 阿尔派株式会社 | Electronic device and information reproduction control method |
CN111538249B (en) * | 2020-04-26 | 2023-05-26 | 云知声智能科技股份有限公司 | Control method, device, equipment and storage medium of distributed terminal |
CN111857849A (en) * | 2020-07-20 | 2020-10-30 | 北京小米松果电子有限公司 | Wake-up processing method and device, electronic equipment and storage medium |
CN111933137B (en) * | 2020-08-19 | 2024-04-16 | Oppo广东移动通信有限公司 | Voice wake-up test method and device, computer readable medium and electronic equipment |
CN112201236B (en) * | 2020-09-22 | 2024-03-19 | 北京小米松果电子有限公司 | Terminal awakening method and device and computer readable storage medium |
CN112652304B (en) * | 2020-12-02 | 2022-02-01 | 北京百度网讯科技有限公司 | Voice interaction method and device of intelligent equipment and electronic equipment |
CN112750433B (en) * | 2020-12-09 | 2021-11-23 | 珠海格力电器股份有限公司 | Voice control method and device |
CN112885344A (en) * | 2021-01-08 | 2021-06-01 | 深圳市艾特智能科技有限公司 | Offline voice distributed control method, system, storage medium and equipment |
CN112992140B (en) * | 2021-02-18 | 2021-11-16 | 珠海格力电器股份有限公司 | Control method, device and equipment of intelligent equipment and storage medium |
CN115086094B (en) * | 2021-03-10 | 2024-01-12 | Oppo广东移动通信有限公司 | Equipment selection method and related device |
CN115086096A (en) * | 2021-03-15 | 2022-09-20 | Oppo广东移动通信有限公司 | Method, apparatus, device and storage medium for responding control voice |
CN113096656A (en) * | 2021-03-30 | 2021-07-09 | 深圳创维-Rgb电子有限公司 | Terminal device awakening method and device and computer device |
CN113362823A (en) * | 2021-06-08 | 2021-09-07 | 深圳市同行者科技有限公司 | Multi-terminal response method, device, equipment and storage medium of household appliance |
CN113608449B (en) * | 2021-08-18 | 2023-09-15 | 四川启睿克科技有限公司 | Speech equipment positioning system and automatic positioning method in smart home scene |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109215663A (en) * | 2018-10-11 | 2019-01-15 | 北京小米移动软件有限公司 | Equipment awakening method and device |
CN109729109A (en) * | 2017-10-27 | 2019-05-07 | 腾讯科技(深圳)有限公司 | Transmission method and device, storage medium, the electronic device of voice |
CN110060680A (en) * | 2019-04-25 | 2019-07-26 | Oppo广东移动通信有限公司 | Electronic equipment exchange method, device, electronic equipment and storage medium |
CN110364161A (en) * | 2019-08-22 | 2019-10-22 | 北京小米智能科技有限公司 | Method, electronic equipment, medium and the system of voice responsive signal |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR101761315B1 (en) * | 2009-11-24 | 2017-07-25 | 삼성전자주식회사 | Mobile device and control method thereof |
-
2019
- 2019-11-20 CN CN201911138882.7A patent/CN110827818B/en active Active
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
REG | Reference to a national code |
Ref country code: HK Ref legal event code: DE Ref document number: 40022521 Country of ref document: HK |
|
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||