CN116580711B - Audio control method and device, storage medium and electronic equipment - Google Patents

Audio control method and device, storage medium and electronic equipment

Info

Publication number
CN116580711B
CN116580711B (application CN202310846810.8A)
Authority
CN
China
Prior art keywords
equipment
intelligent
audio signal
audio
distributed network
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202310846810.8A
Other languages
Chinese (zh)
Other versions
CN116580711A (en)
Inventor
鲁勇
王献康
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Intengine Technology Co Ltd
Original Assignee
Beijing Intengine Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Intengine Technology Co Ltd filed Critical Beijing Intengine Technology Co Ltd
Priority to CN202310846810.8A priority Critical patent/CN116580711B/en
Publication of CN116580711A publication Critical patent/CN116580711A/en
Application granted granted Critical
Publication of CN116580711B publication Critical patent/CN116580711B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L 15/00 - Speech recognition
    • G10L 15/22 - Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L 2015/223 - Execution procedure of a spoken command
    • G10L 2015/225 - Feedback of the input speech
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 30/00 - Reducing energy consumption in communication networks
    • Y02D 30/70 - Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Selective Calling Equipment (AREA)

Abstract

The embodiment of the application discloses an audio control method and device, a storage medium and electronic equipment. The method comprises the following steps: receiving an audio signal through an intelligent device located in a distributed network, identifying a command word in the audio signal, determining the candidate devices associated with the command word, and, if the number of candidate devices is at least two, calculating the audio signal energy values received by different intelligent devices in the distributed network, determining a target device from the candidate devices according to the audio signal energy values, and waking up the target device to realize audio control. According to the scheme provided by the embodiment of the application, the target device can be determined from a plurality of candidate devices in the distributed network according to the audio signal, so that the target device is controlled, and the recognition accuracy of voice commands and the use efficiency of the intelligent devices are effectively improved.

Description

Audio control method and device, storage medium and electronic equipment
Technical Field
The application relates to the technical field of audio data processing, in particular to an audio control method, an audio control device, a storage medium and electronic equipment.
Background
In recent years, with the popularization of smart speakers, voice assistants and the like, voice recognition has become increasingly accepted, and the technology is being applied in more and more scenarios, such as controlling devices by voice and searching for content, and has become an important part of people's daily lives. The continuous development and maturation of voice recognition technology has greatly promoted the development and popularization of voice-based smart home control systems, and a large number of smart home control systems that use voice speakers or other voice collectors as control interfaces have appeared on the market, bringing great convenience to users' daily lives.
However, in current products, when a user controls a certain device through voice, the user needs to accurately speak the name of the device and the operation to be performed, which places high demands on the user. In addition, when there are several identical devices, voice control is prone to misoperation. For example, when both the living room and the bedroom of the user have an air conditioner, the user must accurately state which air conditioner is to be controlled before the subsequent operation can be performed; if the user only gives the instruction "turn on the air conditioner", the system may be unable to distinguish which device is to be controlled, resulting in misoperation and reduced use efficiency.
Disclosure of Invention
The embodiment of the application provides an audio control method, an audio control device, a storage medium and electronic equipment, which can determine target equipment from a plurality of candidate equipment in a distributed network according to audio signals, thereby controlling the target equipment and effectively improving the recognition accuracy of voice instructions and the use efficiency of intelligent equipment.
The embodiment of the application provides an audio control method, which comprises the following steps:
receiving an audio signal by an intelligent device located in a distributed network;
identifying a command word in the audio signal and determining a candidate device with which the command word is associated;
If the number of the candidate devices is at least two, calculating the energy values of the audio signals received by different intelligent devices in the distributed network;
and determining a target device from the candidate devices according to the audio signal energy value, and waking up the target device to realize audio control.
In an embodiment, the determining the candidate device to which the command word is associated includes:
judging whether the command word contains a device name;
if so, determining the associated candidate device according to the device name;
and if not, extracting the corresponding function information in the command word, and determining the candidate devices associated with the function information.
In one embodiment, the step of calculating the audio signal energy value includes:
filtering for each frame of audio signal;
and obtaining the energy value of each filtered frame of the audio signal, and calculating the average of the energy values of all frames as the average energy value of the audio signal.
In an embodiment, the determining a target device from the candidate devices according to the audio signal energy value includes:
determining the current area of the user according to the energy values of the audio signals received by different intelligent devices in the distributed network;
And searching target equipment located in the area from the candidate equipment.
In an embodiment, the waking up the target device to implement audio control includes:
generating a control instruction according to the command word and the target equipment;
broadcasting the control instruction to the distributed network to wake up the target equipment, and executing the corresponding instruction after the target equipment receives the control instruction.
In an embodiment, before receiving the audio signal by the smart device located in the distributed network, the method further comprises:
accessing at least one intelligent device to a public network;
controlling the at least one intelligent device to broadcast own device information and receiving broadcast information of other intelligent devices;
selecting a central device from the at least one intelligent device according to the number of broadcasts received by the intelligent device and the signal strength;
and establishing a distributed network based on the central equipment, and controlling other intelligent equipment to join the distributed network.
In an embodiment, the establishing a distributed network based on the central device and controlling other intelligent devices to join the distributed network includes:
Controlling the central equipment to generate a private network key and broadcasting the private network key to other intelligent equipment;
and controlling the other intelligent devices to exit the public network after receiving the private network key and enter a distributed network corresponding to the private network key.
The embodiment of the application also provides an audio control device, which comprises:
the receiving module is used for receiving the audio signal through intelligent equipment positioned in the distributed network;
the identification module is used for identifying command words in the audio signal and determining candidate devices associated with the command words;
a calculation module, configured to calculate audio signal energy values received by different intelligent devices in the distributed network when the number of the candidate devices is at least two;
and the wake-up module is used for determining target equipment from the candidate equipment according to the audio signal energy value, and waking up the target equipment to realize audio control.
Embodiments of the present application also provide a storage medium storing a computer program adapted to be loaded by a processor to perform the steps of the audio control method according to any of the embodiments above.
The embodiment of the application also provides an electronic device, which comprises a memory and a processor, wherein the memory stores a computer program, and the processor executes the steps in the audio control method according to any embodiment by calling the computer program stored in the memory.
According to the audio control method, the device, the storage medium and the electronic equipment, the intelligent equipment in the distributed network can receive the audio signals, the command words in the audio signals are identified, candidate equipment associated with the command words is determined, if the number of the candidate equipment is at least two, the energy values of the audio signals received by different intelligent equipment in the distributed network are calculated, the target equipment is determined from the candidate equipment according to the energy values of the audio signals, and the target equipment is awakened to realize audio control. According to the scheme provided by the embodiment of the application, the target equipment can be determined from the plurality of candidate equipment in the distributed network according to the audio signal, so that the target equipment is controlled, and the recognition accuracy of the voice command and the use efficiency of the intelligent equipment are effectively improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings that are needed in the description of the embodiments will be briefly described below, it being obvious that the drawings in the following description are only some embodiments of the present application, and that other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
Fig. 1 is a schematic system diagram of an audio control device according to an embodiment of the present application;
fig. 2 is a schematic flow chart of an audio control method according to an embodiment of the present application;
fig. 3 is a schematic flow chart of another audio control method according to an embodiment of the present application;
fig. 4 is a schematic structural diagram of an audio control device according to an embodiment of the present application;
fig. 5 is another schematic structural diagram of an audio control device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.
The embodiment of the application provides an audio control method, an audio control device, a storage medium and electronic equipment. Specifically, the audio control method of the embodiment of the application can be executed by an electronic device or a server, wherein the electronic device can be a terminal. The terminal can be a smart phone, a tablet computer, a notebook computer, a touch screen, a game machine, a personal computer (PC, Personal Computer), a personal digital assistant (Personal Digital Assistant, PDA), a smart home device and the like, and the terminal can also comprise a client, wherein the client can be a media playing client, an instant messaging client or the like.
For example, when the audio control method is operated on the electronic device, the electronic device may receive the audio signal through the intelligent devices located in the distributed network, identify the command word in the audio signal, determine the candidate device associated with the command word, if the number of the candidate devices is at least two, calculate the energy values of the audio signal received by different intelligent devices in the distributed network, determine the target device from the candidate devices according to the energy values of the audio signal, and wake up the target device to realize audio control. The electronic device may be any one of intelligent devices in a distributed network.
Referring to fig. 1, fig. 1 is a schematic system diagram of an audio control device according to an embodiment of the application. The system may include at least one smart device 1000, the at least one smart device 1000 may be connected through a distributed network. The electronic device 1000 may be a terminal device having computing hardware capable of supporting and executing software products corresponding to multimedia. The network may be a wireless network or a wired network, such as a Wireless Local Area Network (WLAN), a Local Area Network (LAN), a cellular network, a 2G network, a 3G network, a 4G network, a 5G network, etc. In addition, the different electronic devices 1000 may also be connected to other embedded platforms or to a server, a personal computer, etc. by using their own bluetooth network or hotspot network, where bluetooth may further include conventional bluetooth (Classic Bluetooth, BT for short) and bluetooth low energy (Bluetooth Low Energy, BLE for short). The server may be an independent physical server, a server cluster or a distributed system formed by a plurality of physical servers, or a cloud server providing cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, basic cloud computing services such as big data and artificial intelligent platforms.
The embodiment of the application provides an audio control method which can be used by electronic equipment. The embodiment of the application is described by taking an audio control method executed by electronic equipment as an example. The electronic equipment comprises a microphone, wherein the microphone is used for receiving voice sent by a user and converting the voice into an audio signal, so that subsequent equipment control is realized according to the audio signal.
Referring to fig. 2, the specific flow of the method may be as follows:
step 101, receiving an audio signal by a smart device located in a distributed network.
In an embodiment, the smart device may be a smart home device accessing a distributed network, such as a smart light, a smart television, a smart air conditioner, a smart curtain, a smart water heater, a smart washing machine, and the like. The smart device may include at least one microphone, for receiving voice uttered by a user and converting the voice into an audio signal.
In an embodiment, to further improve the recognition rate of the audio, a noise reduction operation may be performed after the audio signal is received, for example, separating the human voice and the environmental sound in the audio signal to obtain the human voice audio. In one implementation, the audio signal may be input into an existing voice separation model, which separates the human voice audio from the environmental audio to obtain the human voice audio, where the voice separation model may be a model based on a PIT (Permutation Invariant Training) deep neural network. In another implementation, a separation tool is used to separate the human voice audio from the environmental audio, for example by performing voice extraction processing according to the spectral characteristics or frequency characteristics of the audio data.
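As an illustration of the second, spectral-feature-based implementation, the following is a minimal sketch that simply emphasises the typical speech band before recognition. The 300-3400 Hz band, the filter order and the use of scipy are assumptions made for the example, not details taken from the patent.

```python
# Illustrative sketch only: emphasise the voice band before recognition, standing
# in for the "separation tool" mentioned above. Band limits and filter order are
# assumed values, not part of the described method.
import numpy as np
from scipy.signal import butter, sosfiltfilt

def extract_voice_band(audio: np.ndarray, sample_rate: int = 16000) -> np.ndarray:
    """Band-pass the signal to the typical speech band to suppress ambient noise."""
    sos = butter(4, [300, 3400], btype="bandpass", fs=sample_rate, output="sos")
    return sosfiltfilt(sos, audio)

if __name__ == "__main__":
    fs = 16000
    t = np.arange(fs) / fs
    # 1 kHz "voice-like" tone plus 50 Hz hum standing in for environmental noise
    mixed = np.sin(2 * np.pi * 1000 * t) + 0.5 * np.sin(2 * np.pi * 50 * t)
    voice = extract_voice_band(mixed, fs)
    print("peak before:", np.max(np.abs(mixed)), "peak after:", round(np.max(np.abs(voice)), 3))
```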
Step 102, identifying a command word in the audio signal and determining a candidate device with which the command word is associated.
In an embodiment, after receiving the audio signal, the intelligent device in the distributed network may further identify in advance whether the audio signal includes an activation word, and when it detects that the audio signal includes the activation word, it continues to identify the command word in the audio signal. The activation word may be preset for the intelligent device, or may be customized by the user, for example the wake-up words of common voice assistants, such as "Tmall Genie" or "Hey Siri", which is not further limited in the present application.
In an embodiment, the command word may be composed of a plurality of keywords, and the step of identifying the command word in the audio signal may include extracting at least one keyword from the audio signal and forming the command word from the keywords. The step of extracting at least one keyword from the audio signal further comprises: constructing an audio recognition model for each configured keyword, by first extracting features from a preset number of audio instructions and then training on the extracted feature data to generate the audio recognition model for that keyword; building a trigger keyword list from the audio recognition models of the different keywords; and, after the intelligent device receives an audio signal, extracting features from the acquired signal, matching them against the audio recognition models in the trigger keyword list, and outputting the instruction with the highest score in the trigger keyword list as the recognition result.
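The following is a minimal sketch of this trigger-keyword matching, assuming the per-keyword models can be represented as scoring callables over a feature vector; the feature format, the placeholder models and the 0.5 acceptance threshold are illustrative assumptions.

```python
# Minimal sketch of trigger-keyword matching. The per-keyword "models" here are
# placeholder scoring callables; in practice they would be the trained audio
# recognition models described in the text.
from typing import Callable, Dict, List, Optional

def recognize_keyword(features: List[float],
                      trigger_models: Dict[str, Callable[[List[float]], float]],
                      threshold: float = 0.5) -> Optional[str]:
    """Score the features against every keyword model and return the best match."""
    scores = {word: model(features) for word, model in trigger_models.items()}
    best_word, best_score = max(scores.items(), key=lambda kv: kv[1])
    return best_word if best_score >= threshold else None

if __name__ == "__main__":
    # Toy models that just compare feature means; real models would be trained.
    models = {
        "turn on": lambda f: 1.0 - abs(sum(f) / len(f) - 0.8),
        "turn off": lambda f: 1.0 - abs(sum(f) / len(f) - 0.2),
    }
    print(recognize_keyword([0.75, 0.85, 0.8], models))  # -> "turn on"
```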
In one embodiment, when keyword information is detected, the smart device may generate a command word from at least one keyword and determine candidate devices associated with the command word. In another embodiment, the step of generating the command word may be performed at the cloud end, for example, the intelligent device determines whether the current environment is in a network connection state after detecting the keyword information, if yes, uploads the preprocessed audio signal to the cloud voice recognition platform, reprocesses the submitted audio signal on the cloud voice recognition platform, and then converts the recognized text result into the command word for the candidate intelligent device.
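As a sketch of this local-versus-cloud decision, under the assumption that the cloud recognition service and the local recognizer can be represented as simple callables (both are placeholders here, not real APIs):

```python
# Hedged sketch of the cloud path described above: if the device is online after
# a keyword is detected, the pre-processed audio is handed to a cloud speech
# recognition service and the returned text becomes the command word; otherwise
# recognition stays local. `cloud_transcribe` and `local_recognize` are placeholders.
from typing import Callable

def recognize_command(audio: bytes,
                      is_online: Callable[[], bool],
                      cloud_transcribe: Callable[[bytes], str],
                      local_recognize: Callable[[bytes], str]) -> str:
    text = cloud_transcribe(audio) if is_online() else local_recognize(audio)
    return text.strip().lower()          # normalised text used as the command word

if __name__ == "__main__":
    cmd = recognize_command(b"\x00" * 320,
                            is_online=lambda: True,
                            cloud_transcribe=lambda a: "Turn on the light",
                            local_recognize=lambda a: "turn on")
    print(cmd)
```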
It should be noted that the keywords extracted from the voice information generally include three types: first, keywords representing the control object, namely the intelligent device, such as a smart lamp, a smart television or a smart air conditioner; second, keywords representing the control action, such as "on" and "off"; and third, keywords representing identification information, such as "living room" or "bedroom", which clearly identify the control object. When all three types of keywords are extracted from the audio signal, the obtained voice control instruction is a complete voice control instruction, for example, "turn on the lamp in the living room". In this case, the candidate device associated with the command word generated from the keywords is the lamp in the living room.
If there are several devices of the same type corresponding to the current audio signal, for example when the audio signal only contains keywords for the control object and the control action, the instruction is an incomplete voice control instruction, i.e. a fuzzy voice control instruction, such as "turn on the air conditioner" or "turn on the light", where it is not clear which air conditioner or which light is meant. If the audio signal also contains a keyword representing identification information, that is, a keyword identifying a specific control object, the current control instruction is a complete control instruction, for example "turn on the bedroom lamp" or "turn on the living room lamp", where "bedroom" and "living room" are identification information of the control object and specify the smart home device at a particular position. It will be appreciated that the identification information may also be information other than position, such as color or shape, for example "turn on the red light" or "turn on the round light", where "red" and "round" are the identification information.
Thus, if the current control command is an incomplete control command, such as "turn on a light", then multiple lights located in the distributed network may all be candidates, such as a bedroom light and a living room light.
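A minimal sketch of this keyword classification and candidate selection follows; the keyword vocabularies and the device registry are made-up examples used only to illustrate the complete-versus-fuzzy distinction described above.

```python
# Hedged sketch of the three keyword classes (control object, control action,
# identification information) and of how an instruction without identification
# information yields several candidate devices. Vocabularies and registry are
# illustrative, not data from the patent.
from dataclasses import dataclass
from typing import List, Optional

OBJECTS = {"light", "air conditioner", "television"}
ACTIONS = {"turn on", "turn off"}
IDENTIFIERS = {"bedroom", "living room", "red", "round"}

@dataclass
class ParsedCommand:
    obj: Optional[str]
    action: Optional[str]
    identifier: Optional[str]

    @property
    def is_complete(self) -> bool:
        return all([self.obj, self.action, self.identifier])

DEVICE_REGISTRY = {
    ("light", "bedroom"): "bedroom_light",
    ("light", "living room"): "living_room_light",
}

def parse(command: str) -> ParsedCommand:
    obj = next((o for o in OBJECTS if o in command), None)
    action = next((a for a in ACTIONS if a in command), None)
    ident = next((i for i in IDENTIFIERS if i in command), None)
    return ParsedCommand(obj, action, ident)

def candidate_devices(cmd: ParsedCommand) -> List[str]:
    if cmd.is_complete:
        return [DEVICE_REGISTRY[(cmd.obj, cmd.identifier)]]
    # Fuzzy instruction: every registered device of that type is a candidate.
    return [dev for (obj, _), dev in DEVICE_REGISTRY.items() if obj == cmd.obj]

if __name__ == "__main__":
    print(candidate_devices(parse("turn on the light")))          # both lights
    print(candidate_devices(parse("turn on the bedroom light")))  # bedroom only
```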
In an embodiment, when identifying the command word in the audio signal, the collected audio signal may also be analyzed to extract the environmental descriptors in it. The environmental descriptors may be compared against descriptors pre-stored in the system, so that the descriptors contained in the user's audio signal can be extracted. An environmental descriptor may be a word, phrase or sentence related to location or environment, such as "the living room is hot" or "the room is too dark".
If the current voice control instruction does not clearly specify the control object, it is determined to be an incomplete voice control instruction. In this case, the standard voice control instruction matching the current voice control instruction can be determined according to semantic association rules. A semantic association rule records the correspondence between an environmental descriptor and a standard voice control instruction; for example, the standard voice control instruction corresponding to "the living room is hot" is "turn on the living room air conditioner". The semantic association rules can be pre-stored in the distributed network system, and the user can edit and store them.
It can be understood that the distributed network system can collect and analyze the user's audio signals in real time, and record and analyze the operation data of each smart home device, so as to obtain the semantic association rules. The smart home system records the user's daily conversations and voice control instructions together with the running state of the smart home devices; for example, the user says "the room is very hot" in daily conversation and then gives the voice control instruction "turn on the air conditioner in the room", at which point the air conditioner in the room starts running. Through many such records and analyses, the semantic association rules, i.e. the relations between environmental descriptors and standard voice control instructions, are obtained, for example "the room is very hot" is associated with "turn on the air conditioner in the room", and "the room is too dark" is associated with "turn on the lamp in the room". Correspondingly, if the voice control instruction obtained through a semantic association rule does not contain a keyword representing identification information, all of the associated devices need to be taken as candidate devices.
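A semantic association rule table can be as simple as a mapping from descriptors to standard instructions; the sketch below assumes exactly that, with the rules hard-coded as examples even though in the scheme they would be learned from the recorded data or edited by the user.

```python
# Illustrative sketch of a semantic association rule table mapping environmental
# descriptors to standard voice control instructions. The rules shown are examples.
from typing import Optional

SEMANTIC_RULES = {
    "the room is very hot": "turn on the air conditioner in the room",
    "the room is too dark": "turn on the lamp in the room",
    "the living room is hot": "turn on the living room air conditioner",
}

def to_standard_instruction(descriptor: str) -> Optional[str]:
    """Map an environmental descriptor to its standard control instruction, if known."""
    return SEMANTIC_RULES.get(descriptor)

if __name__ == "__main__":
    print(to_standard_instruction("the living room is hot"))
```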
Step 103, if the number of candidate devices is at least two, calculating the energy value of the audio signal received by different intelligent devices in the distributed network.
In an embodiment, if the number of candidate devices is only one, for example, the user speaks "turn on bedroom lamp", and the voice control command includes a keyword indicating identification information, it may be determined that the candidate device is the bedroom lamp, and then the command is directly transmitted to the bedroom lamp through the distributed network and turned on without executing the subsequent steps, which is not further described in this embodiment.
When the number of candidate devices is at least two, for example a lamp in the living room and a lamp in the bedroom, it is necessary to further determine the target device from among the candidate devices and control it. In this embodiment, this is done by calculating the energy values of the above audio signal as received by the different intelligent devices in the distributed network. It can be understood that the larger the energy value, the closer the user is to the current device, and conversely, the smaller the energy value, the farther the user is from the current device. The several intelligent devices in the distributed network that receive the audio signal sent by the user at the same time can therefore each obtain its energy value, and the current position of the user can then be determined comprehensively. For example, if the user is close to the bedroom lamp, the bedroom lamp among the candidate devices can be determined to be the target device to be controlled, and if the user is close to the living room lamp, the living room lamp among the candidate devices can be determined to be the target device to be controlled.
The energy value of the audio signal received by the intelligent device can be represented by the dB value of the audio signal, and the dB value can be calculated in various ways, such as computing the audio energy data, computing a root-mean-square dB value, or using an improved root-mean-square algorithm.
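As one concrete possibility, the sketch below computes the per-frame root-mean-square level in dB (relative to full scale) and averages it over all frames; the frame length and the dB reference are assumptions made for illustration, not values prescribed by the method.

```python
# Minimal sketch of a root-mean-square dB computation averaged over frames.
import numpy as np

def average_rms_db(audio: np.ndarray, frame_len: int = 512) -> float:
    """Average RMS level (dB relative to full scale) over fixed-length frames."""
    n_frames = len(audio) // frame_len
    frames = audio[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    db = 20.0 * np.log10(np.maximum(rms, 1e-12))  # guard against log(0)
    return float(np.mean(db))

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    near = 0.5 * rng.standard_normal(16000)   # louder signal, user nearby
    far = 0.05 * rng.standard_normal(16000)   # quieter signal, user farther away
    print(average_rms_db(near), ">", average_rms_db(far))
```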
Step 104, determining a target device from the candidate devices according to the energy value of the audio signal, and waking up the target device to realize audio control.
In an embodiment, when determining the target device, considering that the audio signal energy value may be attenuated by an obstacle blocking the sound and thus lead to an inaccurate detection result, the current position of the user may be determined comprehensively by combining the camera on the intelligent device with the audio signal energy value. For example, when it is determined from the audio signal energy values that the user is close to the bedroom lamp and far from the living room lamp, and the camera on the bedroom lamp detects the user while the camera on the living room lamp does not, the target device is determined to be the bedroom lamp. At this point, the lamp located in the bedroom in the distributed network is woken up and the corresponding command word "turn on" is executed.
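A sketch of such a combined decision is given below, assuming each candidate reports its received energy and, where a camera is present, whether it currently detects the user; the data structures and the fallback behaviour are illustrative assumptions.

```python
# Hedged sketch: pick the candidate with the highest received energy, but only
# among candidates whose camera (if any) currently detects the user; fall back
# to energy alone if no camera sees the user.
from typing import Dict, List, Optional

def pick_target(candidates: List[str],
                energy_db: Dict[str, float],
                camera_sees_user: Dict[str, bool]) -> Optional[str]:
    visible = [d for d in candidates if camera_sees_user.get(d, True)]
    pool = visible or candidates          # fall back to energy alone
    return max(pool, key=lambda d: energy_db.get(d, float("-inf")), default=None)

if __name__ == "__main__":
    print(pick_target(["bedroom_light", "living_room_light"],
                      {"bedroom_light": -20.0, "living_room_light": -35.0},
                      {"bedroom_light": True, "living_room_light": False}))
```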
In an embodiment, after the target device is awakened and audio control is implemented according to the command word, a prompt message may be generated to remind the user. For example, after the air conditioner in the bedroom has been turned on, a voice prompt of "the air conditioner is turned on" may be played through the built-in speaker to remind the user.
As can be seen from the above, in the audio control method provided by the embodiment of the present application, the intelligent devices located in the distributed network may receive the audio signal, identify the command word in the audio signal, determine the candidate device associated with the command word, if the number of the candidate devices is at least two, calculate the energy values of the audio signal received by different intelligent devices in the distributed network, determine the target device from the candidate devices according to the energy values of the audio signal, and wake up the target device to implement audio control. According to the scheme provided by the embodiment of the application, the target equipment can be determined from the plurality of candidate equipment in the distributed network according to the audio signal, so that the target equipment is controlled, and the recognition accuracy of the voice command and the use efficiency of the intelligent equipment are effectively improved.
Fig. 3 is a schematic flow chart of an audio control method according to an embodiment of the application. The specific flow of the method can be as follows:
Step 201, at least one intelligent device is accessed to a public network, and the at least one intelligent device is controlled to broadcast its own device information and receive broadcast information of other intelligent devices.
In an embodiment, the condition for at least one intelligent device to access the same network is that the devices possess the same network key, so the purpose of network configuration is to make the devices possess the same network key. Therefore, in this embodiment, after the voice chip in the smart device recognizes the command word for starting networking, the at least one smart device may be set to a public network key (preset at the factory), and devices holding the public network key can communicate with each other.
The intelligent devices entering the public network can continuously broadcast their own device information (such as the MAC address) according to a preset device discovery protocol, and receive device discovery data packets from other intelligent devices and cache them in a device discovery list.
Step 202, selecting a central device from at least one intelligent device according to the number of broadcasts received by the intelligent device and the signal strength.
Specifically, the embodiment can comprehensively elect the intelligent device at the central position according to the number of the discovered intelligent devices and the accumulation of the receiving sensitivity (or signal strength, for example, RSSI) to be used as the optimal node of the networking.
Step 203, a distributed network is established based on the central device, and other intelligent devices are controlled to join the distributed network.
In one embodiment, the best node elected in step 202 generates a private network key according to a certain rule, and broadcasts the generated private network key to the other intelligent devices in the public network through radio frequency. It should be noted that an intelligent device that obtains the private network key may forward it at least once. Finally, the intelligent devices holding the private network key exit the public network and enter the private distributed network, thereby completing the automatic networking. That is, the step of establishing a distributed network based on the central device and controlling other intelligent devices to join the distributed network may include: controlling the central device to generate a private network key and broadcast it to the other intelligent devices, and controlling the other intelligent devices, after receiving the private network key, to exit the public network and enter the distributed network corresponding to the private network key.
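The following is a sketch of this networking flow under stated assumptions: each device counts the discovery packets it has cached and accumulates their signal strength, the device with the best combined figure is elected as the central node, and that node generates the private network key. The scoring rule and the key format are illustrative; the text only requires electing by the number of broadcasts received and the signal strength.

```python
# Sketch of steps 201-203 under assumptions: election of the central node from
# cached discovery data, then generation of a private network key by that node.
import secrets
from dataclasses import dataclass
from typing import List

@dataclass
class Discovery:
    device_id: str
    neighbours_found: int   # number of device-discovery packets cached
    rssi_sum: float         # accumulated signal strength of those packets (dBm)

def elect_central(devices: List[Discovery]) -> Discovery:
    # Prefer the device that hears the most neighbours; break ties by signal strength.
    return max(devices, key=lambda d: (d.neighbours_found, d.rssi_sum))

def generate_private_network_key() -> str:
    return secrets.token_hex(16)   # 128-bit key, assumed format

if __name__ == "__main__":
    devices = [
        Discovery("air_conditioner", 3, -180.0),
        Discovery("living_room_light", 4, -150.0),
        Discovery("bedroom_light", 4, -170.0),
    ]
    central = elect_central(devices)
    key = generate_private_network_key()
    # The central device would broadcast `key` over RF; the others leave the
    # public network and re-join the distributed network keyed by `key`.
    print(central.device_id, key)
```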
In step 204, an audio signal is received by a smart device located in a distributed network.
In an embodiment, the smart device may be a smart home device accessing a distributed network, such as a smart light, a smart television, a smart air conditioner, a smart curtain, a smart water heater, a smart washing machine, and the like. The smart device may include at least one microphone, for receiving voice uttered by a user and converting the voice into an audio signal.
Step 205, identifying a command word in the audio signal and determining a candidate device with which the command word is associated.
In an embodiment, if the intelligent device referred to by the command word of the current audio signal has several devices of the same type, for example when the audio signal only contains the control object and the keyword of the control action, the instruction is an incomplete voice control instruction, i.e. a fuzzy voice control instruction, such as "turn on the light" without specifying which light. In that case, all lamps in the distributed network that meet the turn-on condition may be taken as candidate devices, such as the bedroom lamp and the living room lamp.
If the audio signal contains a keyword representing identification information, that is, a keyword identifying a specific control object, the current control instruction is a complete control instruction, for example "turn on the bedroom lamp" or "turn on the living room lamp", where "bedroom" and "living room" are identification information of the control object and specify the smart home device at a particular position. That is, the step of determining the candidate device with which the command word is associated may include: judging whether the command word contains a device name; if so, determining the associated candidate device according to the device name; and if not, extracting the corresponding function information in the command word and determining the candidate devices associated with the function information.
In step 206, if the number of candidate devices is at least two, the energy values of the audio signals received by different intelligent devices in the distributed network are calculated.
In one embodiment, if the number of candidate devices is only one, for example the user says "turn on the bedroom lamp" and the voice control instruction contains a keyword representing identification information, it can be determined that the candidate device is the bedroom lamp, and no subsequent steps are required. When the number of candidate devices is at least two, for example a lamp in the living room and a lamp in the bedroom, it is necessary to further determine the target device from among the candidate devices and control it. In this embodiment, the energy values of the above audio signal as received by the different intelligent devices in the distributed network are calculated. Specifically, the step of calculating the audio signal energy value may include: filtering each frame of the audio signal, obtaining the energy value of each filtered frame, and calculating the average of the energy values of all frames as the average energy value of the audio signal.
Step 207, determining the current area of the user according to the energy values of the audio signals received by different intelligent devices in the distributed network.
Step 208, searching for the target device located in the area from the candidate devices, and generating a control instruction according to the command word and the target device.
Specifically, the audio signal sent by the user can be received by several intelligent devices in the distributed network at the same time, and the energy values they obtain are then used to comprehensively determine the area in which the user is currently located. For example, if the user is located in the bedroom area, the bedroom lamp among the candidate devices can be determined to be the target device to be controlled, and if the user is located in the living room area, the living room lamp among the candidate devices can be determined to be the target device to be controlled.
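A minimal sketch of steps 207 and 208 follows, assuming every device reports its received energy and is tagged with the room it is installed in; the room assignments and energy figures are made up for illustration.

```python
# Sketch: infer the user's current area from the energy values reported by all
# devices (not just the candidates), then pick the candidate located in that area.
from typing import Dict, List, Optional

def locate_user(energy_db: Dict[str, float], device_room: Dict[str, str]) -> str:
    loudest = max(energy_db, key=energy_db.get)
    return device_room[loudest]

def target_in_area(candidates: List[str], area: str,
                   device_room: Dict[str, str]) -> Optional[str]:
    return next((d for d in candidates if device_room[d] == area), None)

if __name__ == "__main__":
    rooms = {"bedroom_light": "bedroom", "bedroom_tv": "bedroom",
             "living_room_light": "living room"}
    energies = {"bedroom_light": -22.0, "bedroom_tv": -25.0, "living_room_light": -40.0}
    area = locate_user(energies, rooms)
    print(area, target_in_area(["bedroom_light", "living_room_light"], area, rooms))
```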
Step 209, broadcasting the control instruction to the distributed network to wake up the target device, and executing the corresponding instruction after the target device receives the control instruction.
After the target device to be controlled has been determined, the resulting control command can be broadcast to the distributed network, and the device to be controlled performs the corresponding action after receiving it. For example, in a traditional scheme, when a user wants to turn on the lamp in the bedroom, the user has to say "turn on the bedroom lamp" before the designated device can be controlled accurately; with the nearby-wake-up accurate control scheme added in this embodiment, the user only needs to say "turn on the light" while in the bedroom to turn on the bedroom lamp, and only needs to say "turn on the light" while in the living room to turn on the living room lamp.
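As a sketch of step 209 under assumptions, the instruction below is serialised, broadcast to every node in the distributed network, and executed only by the node whose identifier matches the target; the JSON message format and the handler functions are illustrative, not part of the patent.

```python
# Sketch of broadcasting a control instruction and executing it on the target node.
import json
from typing import Callable, Dict

def broadcast(nodes: Dict[str, Callable[[str], None]], instruction: dict) -> None:
    message = json.dumps(instruction)
    for handler in nodes.values():       # every node receives the broadcast
        handler(message)

def make_node(node_id: str) -> Callable[[str], None]:
    def handle(message: str) -> None:
        data = json.loads(message)
        if data["target"] == node_id:    # only the target wakes up and acts
            print(f"{node_id}: executing '{data['command']}'")
    return handle

if __name__ == "__main__":
    network = {nid: make_node(nid) for nid in ("bedroom_light", "living_room_light")}
    broadcast(network, {"target": "bedroom_light", "command": "turn on"})
```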
All the above technical solutions may be combined to form an optional embodiment of the present application, and will not be described in detail herein.
As can be seen from the foregoing, the audio control method provided by the embodiment of the present application may access at least one intelligent device to a public network, control the at least one intelligent device to broadcast its own device information and receive broadcast information of other intelligent devices, select a central device from the at least one intelligent device according to the number of broadcasts and signal strength received by the intelligent device, establish a distributed network based on the central device, control the other intelligent devices to join the distributed network, receive an audio signal through the intelligent devices located in the distributed network, identify a command word in the audio signal, determine candidate devices associated with the command word, if the number of candidate devices is at least two, calculate the energy values of the audio signal received by different intelligent devices in the distributed network, determine the current area of the user according to the energy values of the audio signal received by different intelligent devices in the distributed network, search a target device located in the area from the candidate devices, generate a control instruction according to the command word and the target device, broadcast the control instruction to the distributed network, and execute the corresponding instruction after the target device receives the control instruction. According to the scheme provided by the embodiment of the application, the target equipment can be determined from the plurality of candidate equipment in the distributed network according to the audio signal, so that the target equipment is controlled, and the recognition accuracy of the voice command and the use efficiency of the intelligent equipment are effectively improved.
In order to facilitate better implementation of the audio control method according to the embodiment of the present application, the embodiment of the present application further provides an audio control device. Referring to fig. 4, fig. 4 is a schematic structural diagram of an audio control device according to an embodiment of the application. The audio control apparatus may include:
a receiving module 301, configured to receive an audio signal through an intelligent device located in a distributed network;
an identification module 302, configured to identify a command word in the audio signal and determine a candidate device associated with the command word;
a calculating module 303, configured to calculate energy values of audio signals received by different intelligent devices in the distributed network when the number of the candidate devices is at least two;
and a wake-up module 304, configured to determine a target device from the candidate devices according to the audio signal energy value, and wake-up the target device to implement audio control.
In an embodiment, please further refer to fig. 5, fig. 5 is a schematic diagram of another structure of an audio control device according to an embodiment of the present application. Wherein the calculating module 303 may include:
a filtering sub-module 3031 for filtering each frame of audio signal;
a calculation submodule 3032 is configured to obtain an energy value of the filtered audio signal, and calculate an average value of energy values of all frame audio signals as an average energy value of the audio signal.
In an embodiment, the wake module 304 may include:
a determining submodule 3041, configured to determine an area where a user is currently located according to energy values of audio signals received by different intelligent devices in the distributed network;
a searching submodule 3042, configured to search the candidate device for a target device located in the area.
In an embodiment, the audio control device further comprises:
the networking module 305 is configured to access at least one intelligent device to a public network before the receiving module 301 receives the audio signal, control the at least one intelligent device to broadcast its own device information and receive broadcast information of other intelligent devices, select a central device from the at least one intelligent device according to the number of broadcasts received by the intelligent device and the signal strength, establish a distributed network based on the central device, and control other intelligent devices to join the distributed network.
All the above technical solutions may be combined to form an optional embodiment of the present application, and will not be described in detail herein.
As can be seen from the above, in the audio control apparatus provided by the embodiment of the present application, an intelligent device located in a distributed network receives an audio signal, identifies a command word in the audio signal, determines candidate devices associated with the command word, if the number of the candidate devices is at least two, calculates energy values of the audio signal received by different intelligent devices in the distributed network, determines a target device from the candidate devices according to the energy values of the audio signal, and wakes the target device to implement audio control. According to the scheme provided by the embodiment of the application, the target equipment can be determined from the plurality of candidate equipment in the distributed network according to the audio signal, so that the target equipment is controlled, and the recognition accuracy of the voice command and the use efficiency of the intelligent equipment are effectively improved.
Correspondingly, the embodiment of the application also provides electronic equipment which can be a terminal or a server, wherein the terminal can be terminal equipment such as a smart phone, a tablet personal computer, a notebook computer, a touch screen, a game machine, a personal computer (PC, personal Computer), a personal digital assistant (Personal Digital Assistant, PDA) and the like. Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present application, as shown in fig. 6. The electronic device 400 includes a processor 401 having one or more processing cores, a memory 402 having one or more storage media, and a computer program stored on the memory 402 and executable on the processor. The processor 401 is electrically connected to the memory 402. It will be appreciated by those skilled in the art that the electronic device structure shown in the figures is not limiting of the electronic device and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
The processor 401 is a control center of the electronic device 400, connects various parts of the entire electronic device 400 using various interfaces and lines, and performs various functions of the electronic device 400 and processes data by running or loading software programs and/or modules stored in the memory 402, and calling data stored in the memory 402, thereby performing overall monitoring of the electronic device 400.
In the embodiment of the present application, the processor 401 in the electronic device 400 loads the instructions corresponding to the processes of one or more application programs into the memory 402 according to the following steps, and the processor 401 executes the application programs stored in the memory 402, so as to implement various functions:
receiving an audio signal by an intelligent device located in a distributed network;
identifying a command word in the audio signal and determining a candidate device with which the command word is associated;
if the number of the candidate devices is at least two, calculating the energy values of the audio signals received by different intelligent devices in the distributed network;
and determining a target device from the candidate devices according to the audio signal energy value, and waking up the target device to realize audio control.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
Optionally, as shown in fig. 6, the electronic device 400 further includes: a touch display 403, a radio frequency circuit 404, an audio circuit 405, an input unit 406, and a power supply 407. The processor 401 is electrically connected to the touch display 403, the radio frequency circuit 404, the audio circuit 405, the input unit 406, and the power supply 407, respectively. It will be appreciated by those skilled in the art that the electronic device structure shown in fig. 6 is not limiting of the electronic device and may include more or fewer components than shown, or may combine certain components, or a different arrangement of components.
The touch display 403 may be used to display a graphical user interface and receive operation instructions generated by a user acting on the graphical user interface. The touch display screen 403 may include a display panel and a touch panel. The display panel may be used to display information entered by or provided to the user as well as various graphical user interfaces of the electronic device, which may be composed of graphics, text, icons, video, and any combination thereof. Alternatively, the display panel may be configured in the form of a liquid crystal display (LCD, Liquid Crystal Display), an organic light-emitting diode (OLED, Organic Light-Emitting Diode), or the like. The touch panel may be used to collect touch operations by the user on or near it (such as operations performed on or near the touch panel using any suitable object or accessory such as a finger or a stylus) and generate corresponding operation instructions, which in turn execute the corresponding programs. Alternatively, the touch panel may include two parts, a touch detection device and a touch controller. The touch detection device detects the position touched by the user, detects the signal produced by the touch operation and transmits the signal to the touch controller; the touch controller receives the touch information from the touch detection device, converts it into touch point coordinates, and sends the touch point coordinates to the processor 401, and can also receive and execute commands sent from the processor 401. The touch panel may overlay the display panel, and upon detection of a touch operation on or near it, the touch operation is passed to the processor 401 to determine the type of touch event, and the processor 401 then provides a corresponding visual output on the display panel according to the type of touch event. In the embodiment of the present application, the touch panel and the display panel may be integrated into the touch display screen 403 to realize the input and output functions. In some embodiments, however, the touch panel and the display panel may be implemented as two separate components to perform the input and output functions, in which case the touch display 403 may also implement an input function as part of the input unit 406.
In an embodiment of the present application, the graphical user interface is generated on the touch display 403 by the processor 401 executing an application program. The touch display 403 is used for presenting a graphical user interface and receiving an operation instruction generated by a user acting on the graphical user interface.
The radio frequency circuitry 404 may be used to transceive radio frequency signals to establish wireless communication with a network device or other electronic device via wireless communication.
The audio circuitry 405 may be used to provide an audio interface between the user and the electronic device through a speaker and a microphone. On the one hand, the audio circuit 405 may transmit the electrical signal converted from received audio data to the speaker, where it is converted into a sound signal for output; on the other hand, the microphone converts the collected sound signal into an electrical signal, which is received by the audio circuit 405 and converted into audio data, and the audio data is then processed by the processor 401 and sent via the radio frequency circuit 404 to, for example, another electronic device, or output to the memory 402 for further processing. The audio circuit 405 may also include an earphone jack to provide communication between peripheral headphones and the electronic device.
The input unit 406 may be used to receive input numbers, character information, or user characteristic information (e.g., fingerprint, iris, facial information, etc.), and to generate keyboard, mouse, joystick, optical, or trackball signal inputs related to user settings and function control.
The power supply 407 is used to power the various components of the electronic device 400. Alternatively, the power supply 407 may be logically connected to the processor 401 through a power management system, so as to implement functions of managing charging, discharging, and power consumption management through the power management system. The power supply 407 may also include one or more of any of a direct current or alternating current power supply, a recharging system, a power failure detection circuit, a power converter or inverter, a power status indicator, and the like.
Although not shown in fig. 6, the electronic device 400 may further include a camera, a sensor, a wireless fidelity module, a bluetooth module, etc., which are not described herein.
In the foregoing embodiments, the descriptions of the embodiments are emphasized, and for parts of one embodiment that are not described in detail, reference may be made to related descriptions of other embodiments.
As can be seen from the above, the electronic device provided in this embodiment may receive the audio signal through the intelligent devices located in the distributed network, identify the command word in the audio signal, determine the candidate device associated with the command word, if the number of the candidate devices is at least two, calculate the energy values of the audio signal received by different intelligent devices in the distributed network, determine the target device from the candidate devices according to the energy values of the audio signal, and wake up the target device to implement audio control. According to the scheme provided by the embodiment of the application, the target equipment can be determined from the plurality of candidate equipment in the distributed network according to the audio signal, so that the target equipment is controlled, and the recognition accuracy of the voice command and the use efficiency of the intelligent equipment are effectively improved.
Those of ordinary skill in the art will appreciate that all or a portion of the steps of the various methods of the above embodiments may be performed by instructions or by controlling associated hardware, which may be stored in a storage medium and loaded and executed by a processor.
To this end, an embodiment of the present application provides a storage medium in which a plurality of computer programs are stored, the computer programs being capable of being loaded by a processor to perform the steps of any of the audio control methods provided by the embodiments of the present application. For example, the computer program may perform the steps of:
receiving an audio signal by an intelligent device located in a distributed network;
identifying a command word in the audio signal and determining a candidate device with which the command word is associated;
if the number of the candidate devices is at least two, calculating the energy values of the audio signals received by different intelligent devices in the distributed network;
and determining a target device from the candidate devices according to the audio signal energy value, and waking up the target device to realize audio control.
The specific implementation of each operation above may be referred to the previous embodiments, and will not be described herein.
Wherein the storage medium may include: read Only Memory (ROM), random access Memory (RAM, random Access Memory), magnetic or optical disk, and the like.
The steps in any audio control method provided by the embodiment of the present application can be executed by the computer program stored in the storage medium, so that the beneficial effects that any audio control method provided by the embodiment of the present application can be achieved, and detailed descriptions of the foregoing embodiments are omitted.
The foregoing describes in detail an audio control method, apparatus, storage medium and electronic device provided in the embodiments of the present application, and specific examples are applied to illustrate the principles and embodiments of the present application, where the foregoing examples are only used to help understand the method and core idea of the present application; meanwhile, as those skilled in the art will have variations in the specific embodiments and application scope in light of the ideas of the present application, the present description should not be construed as limiting the present application.

Claims (8)

1. An audio control method, comprising:
setting at least one intelligent device to access a public network by using a public network key, controlling the at least one intelligent device to broadcast its own device information and to receive broadcast information of other intelligent devices, selecting a central device from the at least one intelligent device according to the number of broadcasts received by each intelligent device and the signal strength thereof, establishing a distributed network based on a private network key of the central device, and controlling the other intelligent devices to join the distributed network;
receiving an audio signal by an intelligent device located in the distributed network;
identifying a command word in the audio signal, and judging whether the command word contains a device name; if so, determining the associated candidate device according to the device name, and if not, extracting corresponding function information from the command word and determining the candidate device associated with the function information;
if the number of the candidate devices is at least two, calculating the energy values of the audio signals received by different intelligent devices in the distributed network;
and determining a target device from the candidate devices according to the audio signal energy value, and waking up the target device to realize audio control.
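Claim 1 leaves the election rule for the central device open, specifying only the number of received broadcasts and the signal strength. The sketch below shows one possible scoring rule; the DeviceReport structure and the weighting factor are assumptions for illustration only.

from dataclasses import dataclass, field
from typing import List

@dataclass
class DeviceReport:
    device_id: str
    received_rssi: List[float] = field(default_factory=list)  # signal strength (dBm) of each broadcast heard

def elect_centre(reports: List[DeviceReport], rssi_weight: float = 0.1) -> str:
    """Pick the device that heard the most peer broadcasts, biased toward stronger signals."""
    def score(r: DeviceReport) -> float:
        if not r.received_rssi:
            return float("-inf")
        mean_rssi = sum(r.received_rssi) / len(r.received_rssi)
        return len(r.received_rssi) + rssi_weight * mean_rssi
    return max(reports, key=score).device_id

print(elect_centre([DeviceReport("speaker", [-40.0, -42.0, -39.0]),
                    DeviceReport("light", [-60.0, -65.0])]))
# -> "speaker": it received more broadcasts at higher signal strength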
2. The audio control method of claim 1, wherein the calculating of the audio signal energy value includes:
filtering each frame of the audio signal;
and obtaining the energy value of each filtered frame of the audio signal, and calculating the average of the energy values of all frames as the average energy value of the audio signal.
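One possible reading of claim 2 is sketched below with NumPy: each frame is filtered (here with a simple first-order pre-emphasis filter, an arbitrary choice since the claim does not specify a filter), the per-frame energy is taken as the mean squared sample value of the filtered frame, and the average over all frames is returned as the average energy value.

import numpy as np

def average_energy(frames, pre_emphasis=0.97):
    """frames: iterable of 1-D NumPy arrays, one audio frame each."""
    energies = []
    for frame in frames:
        # Filtering step: first-order pre-emphasis (high-pass), an illustrative choice.
        filtered = np.append(frame[0], frame[1:] - pre_emphasis * frame[:-1])
        # Per-frame energy: mean of squared filtered samples.
        energies.append(float(np.mean(filtered ** 2)))
    # Average energy value of the whole audio signal.
    return float(np.mean(energies)) if energies else 0.0

frames = [np.random.randn(256) for _ in range(10)]   # dummy frames for demonstration
print(average_energy(frames))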
3. The audio control method of claim 1, wherein the determining a target device from the candidate devices according to the audio signal energy value comprises:
determining the current area of the user according to the energy values of the audio signals received by different intelligent devices in the distributed network;
and searching for the target device located in the area from among the candidate devices.
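A possible implementation of the area-based rule in claim 3 is sketched below, assuming a hypothetical DEVICE_AREAS map from device identifier to room or area: the device that reported the highest energy marks the area where the user currently is, and the candidate located in that area becomes the target.

# Hypothetical device-to-area map; not part of the application.
DEVICE_AREAS = {"speaker_livingroom": "livingroom", "speaker_bedroom": "bedroom",
                "light_livingroom": "livingroom", "light_bedroom": "bedroom"}

def target_by_area(candidates, energies):
    """energies: {device_id: average energy of the same utterance at that device}."""
    loudest = max(energies, key=energies.get)      # device that heard the user best
    user_area = DEVICE_AREAS.get(loudest)          # taken as the user's current area
    for device in candidates:
        if DEVICE_AREAS.get(device) == user_area:
            return device
    return None                                    # no candidate in the user's area

print(target_by_area(["light_livingroom", "light_bedroom"],
                     {"speaker_livingroom": 0.9, "speaker_bedroom": 0.2}))
# -> "light_livingroom"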
4. The audio control method of claim 1, wherein the waking up the target device to achieve audio control comprises:
generating a control instruction according to the command word and the target device;
and broadcasting the control instruction to the distributed network to wake up the target device, wherein the target device executes the corresponding instruction after receiving the control instruction.
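For claim 4, one way to realize the broadcast control instruction is a small JSON payload carrying the target device and the recognized command word; every node in the distributed network receives the broadcast, but only the addressed node executes the command. The payload format and function names below are assumptions, not a format defined by this application.

import json

def build_instruction(command_word: str, target_device: str) -> bytes:
    """Build the control instruction from the command word and the target device."""
    return json.dumps({"target": target_device, "command": command_word}).encode("utf-8")

def on_broadcast(payload: bytes, my_device_id: str, execute) -> None:
    """Handler run on every node that receives the broadcast."""
    msg = json.loads(payload)
    if msg["target"] == my_device_id:   # only the addressed target device wakes up
        execute(msg["command"])         # e.g. switch on, start playback, ...

instruction = build_instruction("turn on the light", "light_livingroom")
on_broadcast(instruction, "light_livingroom", execute=print)   # prints: turn on the light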
5. The audio control method according to claim 1, wherein the establishing a distributed network based on the private network key of the center device and controlling other intelligent devices to join the distributed network includes:
controlling the central device to generate a private network key and to broadcast the private network key to the other intelligent devices;
and controlling the other intelligent devices, after receiving the private network key, to exit the public network and enter the distributed network corresponding to the private network key.
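A toy, in-memory sketch of claim 5 follows; the Device class and the loop that delivers the key stand in for a real device stack and broadcast primitive, which this application does not specify.

import secrets

class Device:
    """Placeholder for an intelligent device that starts out on the public network."""
    def __init__(self, name):
        self.name = name
        self.network = "public"

    def on_private_key(self, key):
        # On receiving the private network key: exit the public network and
        # enter the distributed network identified by that key.
        self.network = f"private:{key[:8]}"

def form_private_network(other_devices):
    key = secrets.token_hex(16)        # central device generates the private network key
    for dev in other_devices:          # broadcast of the key, modelled here as a loop
        dev.on_private_key(key)
    return key

devices = [Device("speaker"), Device("light")]
form_private_network(devices)
print([d.network for d in devices])    # both devices have left the public network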
6. An audio control device, comprising:
the networking module is used for setting at least one intelligent device to access a public network by using a public network key, controlling the at least one intelligent device to broadcast its own device information and to receive broadcast information of other intelligent devices, selecting a central device from the at least one intelligent device according to the number of broadcasts received by each intelligent device and the signal strength thereof, establishing a distributed network based on a private network key of the central device, and controlling the other intelligent devices to join the distributed network;
the receiving module is used for receiving an audio signal through an intelligent device located in the distributed network;
the identification module is used for identifying a command word in the audio signal, and judging whether the command word contains a device name; if so, determining the associated candidate device according to the device name, and if not, extracting corresponding function information from the command word and determining the candidate device associated with the function information;
the calculation module is used for calculating the energy values of the audio signals received by different intelligent devices in the distributed network when the number of the candidate devices is at least two;
and the wake-up module is used for determining a target device from the candidate devices according to the audio signal energy values, and waking up the target device to realize audio control.
7. A storage medium storing a computer program adapted to be loaded by a processor to perform the steps of the audio control method according to any one of claims 1-5.
8. An electronic device comprising a memory in which a computer program is stored and a processor that performs the steps in the audio control method according to any one of claims 1-5 by calling the computer program stored in the memory.
CN202310846810.8A 2023-07-11 2023-07-11 Audio control method and device, storage medium and electronic equipment Active CN116580711B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310846810.8A CN116580711B (en) 2023-07-11 2023-07-11 Audio control method and device, storage medium and electronic equipment

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310846810.8A CN116580711B (en) 2023-07-11 2023-07-11 Audio control method and device, storage medium and electronic equipment

Publications (2)

Publication Number Publication Date
CN116580711A (en) 2023-08-11
CN116580711B (en) 2023-09-29

Family

ID=87536241

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310846810.8A Active CN116580711B (en) 2023-07-11 2023-07-11 Audio control method and device, storage medium and electronic equipment

Country Status (1)

Country Link
CN (1) CN116580711B (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110265012A (en) * 2019-06-19 2019-09-20 泉州师范学院 Interactive intelligent voice home control device and control method based on open source hardware
AU2020102042A4 (en) * 2020-08-28 2020-10-08 Bi, Taoran Mr A smart home system for the elderly based on speech recognition and FPGA
CN112151013A (en) * 2020-09-25 2020-12-29 海尔优家智能科技(北京)有限公司 Intelligent equipment interaction method
CN113470635A (en) * 2020-04-29 2021-10-01 海信集团有限公司 Intelligent sound box control method and device, central control device and storage medium
CN114172757A (en) * 2021-12-13 2022-03-11 海信视像科技股份有限公司 Server, intelligent home system and multi-device voice awakening method
CN115457958A (en) * 2022-09-05 2022-12-09 珠海格力电器股份有限公司 Voice control method based on BLE discovery and intelligent home system

Also Published As

Publication number Publication date
CN116580711A (en) 2023-08-11

Similar Documents

Publication Publication Date Title
CN111223497B (en) Nearby wake-up method and device for terminal, computing equipment and storage medium
CN109240107A (en) A kind of control method of electrical equipment, device, electrical equipment and medium
TW202025138A (en) Voice interaction method, device and system
CN110248021A (en) A kind of smart machine method for controlling volume and system
WO2020048431A1 (en) Voice processing method, electronic device and display device
CN116582382B (en) Intelligent device control method and device, storage medium and electronic device
CN112201246A (en) Intelligent control method and device based on voice, electronic equipment and storage medium
CN111192590B (en) Voice wake-up method, device, equipment and storage medium
CN110738994A (en) Control method, device, robot and system for smart homes
CN109712623A (en) Sound control method, device and computer readable storage medium
CN116580711B (en) Audio control method and device, storage medium and electronic equipment
CN113393838A (en) Voice processing method and device, computer readable storage medium and computer equipment
CN109377993A (en) Intelligent voice system and its voice awakening method and intelligent sound equipment
CN113160815A (en) Intelligent control method, device and equipment for voice awakening and storage medium
CN112420043A (en) Intelligent awakening method and device based on voice, electronic equipment and storage medium
WO2023246036A1 (en) Control method and apparatus for speech recognition device, and electronic device and storage medium
CN116582381B (en) Intelligent device control method and device, storage medium and intelligent device
CN116566760B (en) Smart home equipment control method and device, storage medium and electronic equipment
CN111240634A (en) Sound box working mode adjusting method and device
CN116129942A (en) Voice interaction device and voice interaction method
CN114999496A (en) Audio transmission method, control equipment and terminal equipment
CN111045641B (en) Electronic terminal and voice recognition method
CN116896488A (en) Voice control method and device, storage medium and electronic equipment
CN113241073B (en) Intelligent voice control method, device, electronic equipment and storage medium
CN117012202B (en) Voice channel recognition method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant