CN108962259B - Processing method and first electronic device - Google Patents

Publication number
CN108962259B
Authority
CN
China
Prior art keywords
condition
sound
server
voice control
electronic device
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810825087.4A
Other languages
Chinese (zh)
Other versions
CN108962259A (en
Inventor
付宏让
赵佩璐
钱泰良
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Lenovo Beijing Ltd
Original Assignee
Lenovo Beijing Ltd

Classifications

    • G: PHYSICS
    • G10: MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L: SPEECH ANALYSIS OR SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00: Speech recognition
    • G10L15/22: Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G10L2015/223: Execution procedure of a spoken command
    • G10L15/28: Constructional details of speech recognition systems
    • G10L15/30: Distributed recognition, e.g. in client-server systems, for mobile phones or network applications
    • G10L15/34: Adaptation of a single recogniser for parallel processing, e.g. by use of multiple processors or cloud computing

Abstract

The embodiments of the present application disclose a processing method and a first electronic device. Sound input is monitored, and a voice control function is started if a first sound is detected to satisfy a first condition among a plurality of preset conditions, where each preset condition is associated with at least one server for analyzing subsequently input speech and different preset conditions are associated with different servers. A second sound that is input after the first sound is collected, and an analysis result of the second sound is obtained from the first server associated with the first condition. In other words, the same first electronic device can access different servers depending on which preset condition a sound satisfies, so that one first electronic device can access a plurality of servers.

Description

Processing method and first electronic device
Technical Field
The present application relates to the field of data transmission technologies, and in particular, to a processing method and a first electronic device.
Background
Voice control is widely used; for example, electronic devices such as smart speakers have a voice control function.
At present, each electronic device with a voice control function can access only a single server. For example, when Amazon's Alexa smart speaker detects voice input, it sends the voice to Amazon's server; when Google's smart speaker detects voice input, it sends the voice to Google's server.
Disclosure of Invention
In view of the above, the present application provides a processing method and a first electronic device.
In order to achieve the above purpose, the present application provides the following technical solutions:
A processing method, applied to a first electronic device, comprises:
monitoring sound input;
starting a voice control function if a first sound is detected to satisfy a first condition among a plurality of preset conditions;
wherein each preset condition is associated with at least one server for analyzing subsequently input speech, and different preset conditions are associated with different servers;
collecting a second sound that is input after the first sound;
and obtaining an analysis result of the second sound from the first server associated with the first condition.
Starting the voice control function if the first sound is detected to satisfy a first condition among the plurality of preset conditions comprises:
detecting whether the first sound satisfies each of the plurality of preset conditions, to obtain a detection result for the first sound with respect to each preset condition;
and starting the voice control function if the detection results indicate that the first sound satisfies the first condition.
The method further comprises:
determining a first voice control mode from a plurality of voice control modes;
wherein each voice control mode represents the preset condition that a sound must satisfy to start the voice control function of the first electronic device, and different voice control modes represent different preset conditions;
and the first voice control mode indicates that the preset condition a sound must satisfy to start the voice control function of the first electronic device is the first condition.
In that case, starting the voice control function if the first sound is detected to satisfy the first condition comprises:
detecting whether the first sound satisfies the first condition;
and starting the voice control function if the first sound satisfies the first condition.
Starting the voice control function comprises:
determining that the server that analyzes subsequently input speech is the first server associated with the first condition.
A first electronic device, comprising:
a voice receiving device for monitoring sound input;
a processing chip for starting a voice control function if the voice receiving device detects that a first sound satisfies a first condition among a plurality of preset conditions;
wherein each preset condition is associated with at least one server for analyzing subsequently input speech, and different preset conditions are associated with different servers;
the voice receiving device is further configured to collect a second sound that is input after the first sound;
the processing chip is further configured to obtain an analysis result of the second sound from the first server associated with the first condition;
and an output device for outputting the analysis result.
The first electronic device further comprises:
a transmission device for signal transmission with a second electronic device;
when starting the voice control function upon detecting that the first sound satisfies a first condition among the plurality of preset conditions, the processing chip is specifically configured to:
detect whether the first sound satisfies each of the plurality of preset conditions, to obtain a detection result for the first sound with respect to each preset condition;
and start the voice control function if the detection results indicate that the first sound satisfies the first condition.
The processing chip is further configured to:
determine a first voice control mode from a plurality of voice control modes;
wherein each voice control mode represents the preset condition that a sound must satisfy to start the voice control function of the first electronic device, and different voice control modes represent different preset conditions;
and the first voice control mode indicates that the preset condition a sound must satisfy to start the voice control function of the first electronic device is the first condition.
In that case, when starting the voice control function upon detecting that the first sound satisfies the first condition, the processing chip is specifically configured to:
detect whether the first sound satisfies the first condition;
and start the voice control function if the first sound satisfies the first condition.
A first electronic device, comprising:
a memory for storing a program;
a processor configured to execute the program, the program specifically configured to:
monitoring the sound input;
if the first sound is detected to meet a first condition in a plurality of preset conditions, starting a voice control function;
wherein, one preset condition is at least associated with a server for analyzing the subsequent input voice, and different preset conditions are associated with different servers;
collecting a second sound input after the first sound input;
and obtaining the analysis result of the first server associated with the first condition for the second sound.
As can be seen from the above technical solution, compared with the prior art, the processing method first monitors sound input and starts a voice control function if a first sound is detected to satisfy a first condition among a plurality of preset conditions, where each preset condition is associated with at least one server for analyzing subsequently input speech and different preset conditions are associated with different servers. A second sound that is input after the first sound is then collected, and an analysis result of the second sound is obtained from the first server associated with the first condition. In other words, the same first electronic device can access different servers depending on which preset condition a sound satisfies, so that one first electronic device can access a plurality of servers.
Drawings
To illustrate the embodiments of the present application or the technical solutions in the prior art more clearly, the drawings used in the description of the embodiments or the prior art are briefly introduced below. The drawings described below show only embodiments of the present application; a person skilled in the art can derive other drawings from them without creative effort.
FIG. 1 is a block diagram of one implementation of a processing system provided in an embodiment of the present application;
FIG. 2 is a block diagram of another implementation of a processing system provided in an embodiment of the present application;
FIG. 3 is a signaling diagram of one implementation of a processing method provided in an embodiment of the present application;
FIG. 4 is a schematic diagram of one implementation of a voice control mode selection page provided in an embodiment of the present application;
FIG. 5 is a block diagram of one implementation of a first electronic device provided in an embodiment of the present application;
FIG. 6 is a block diagram of another implementation of a first electronic device provided in an embodiment of the present application;
FIG. 7 is a block diagram of another implementation of a first electronic device provided in an embodiment of the present application;
FIG. 8 is a block diagram of another implementation of a processing system provided in an embodiment of the present application;
FIG. 9 is a block diagram of still another implementation of a first electronic device provided in an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only a part of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The processing method provided in the embodiments of the present application may be applied to a processing system. FIG. 1 is a block diagram of one implementation of such a processing system.
The processing system comprises: a first electronic device 11, and a plurality of servers 12.
The first electronic device 11 may be a PAD, a smartphone, a laptop, a desktop computer, a smart speaker, or a smart home device.
Each server 12 may be a background server, or a server cluster consisting of several servers, or a cloud computing service center.
In an alternative embodiment, different servers correspond to different providers, and the service functions implemented by the servers of different providers may differ. For example, the server of the provider Amazon may implement a shopping service function, that is, it may analyze a sound that expresses a shopping request; as another example, the server of the provider Google may implement a music service function, that is, it may respond to a sound that requests music playback.
Alternatively, the service functions that can be implemented by the servers corresponding to different providers may be the same.
Optionally, in this embodiment of the present application, the servers may be divided according to providers, and different servers correspond to different providers.
At present, an electronic device (taking a smart speaker as an example) can access only one server. For example, a smart speaker that can access only Amazon's server cannot respond to a sound intended for Google's server or Baidu's server, because it cannot access either of those servers.
In the embodiments of the present application, the first electronic device 11 can access multiple servers, so that it can respond to voice control commands for different types of servers.
FIG. 2 is a block diagram of another implementation of the processing system according to the embodiments of the present application.
The processing system comprises: a first electronic device 21, a second electronic device 22, and a plurality of servers 12.
The first electronic device 21 may be a Docking Station (Docking Station) or the like.
The second electronic device 22 may be a PAD or a smartphone or a laptop or desktop device.
The first electronic device 21 and the second electronic device 22 may be connected wirelessly or by wire.
Each of the first electronic device 21 and the second electronic device 22 may include a wireless data transmission device, and be wirelessly connected by the wireless data transmission device.
The wireless data transmission device may be a low-speed long-distance transmission device or a high-speed short-distance transmission device. High-speed short-distance transmission devices include a data transmission device based on super-high-frequency (SHF) wireless signals and an NFC (Near Field Communication) device; low-speed long-distance transmission devices include a Wi-Fi (Wireless Fidelity) device and a Bluetooth device.
The data transmission rate of the SHF-based data transmission device can reach 6 GB/s.
The frequency range of the SHF wireless signal is 3 GHz to 30 GHz.
In an alternative embodiment, the first electronic device 21 may be as shown in fig. 2, and the first electronic device 21 may include a carrying device, and the carrying device may carry the second electronic device 22. In an alternative embodiment, the first electronic device 21 may not include a carrying device, that is, the first electronic device 21 does not carry the second electronic device. The first electronic device 21 and the second electronic device 22 may not be attached to each other, and may have a certain distance.
Since the first electronic device 21 and the second electronic device 22 may have a certain distance, the wireless data transmission device may be a wifi device or a bluetooth device. In practical applications, if the distance between the first electronic device 21 and the second electronic device 22 is smaller than a preset distance, for example, 10cm, the wireless data transmission device may be a high-speed short-distance transmission device.
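The distance-based choice described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Transport`, `PRESET_DISTANCE_CM`, and `select_transport` are hypothetical, and the 10 cm threshold is only the example value given in the text.

```python
from enum import Enum


class Transport(Enum):
    """The two classes of wireless data transmission device named in the text."""
    HIGH_SPEED_SHORT_RANGE = "high-speed short-distance"
    LOW_SPEED_LONG_RANGE = "low-speed long-distance"


# Hypothetical threshold; the text gives 10 cm only as an example.
PRESET_DISTANCE_CM = 10.0


def select_transport(distance_cm: float) -> Transport:
    """Pick a transmission device class from the distance between the two devices."""
    if distance_cm < 0:
        raise ValueError("distance must be non-negative")
    if distance_cm < PRESET_DISTANCE_CM:
        # Devices are close together, so a short-range link may be used.
        return Transport.HIGH_SPEED_SHORT_RANGE
    # Otherwise fall back to a long-range link such as Wi-Fi or Bluetooth.
    return Transport.LOW_SPEED_LONG_RANGE
```

For instance, `select_transport(5)` selects the short-range class, while any distance of 10 cm or more selects the long-range class.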
Each server 12 may be a server, or a server cluster consisting of several servers, or a cloud computing service center. Specifically, reference may be made to the description of the server 12 in fig. 1, which is not described herein again.
In the embodiments of the present application, the first electronic device 21 may control the second electronic device 22 to access multiple servers, so that voice control commands for different types of servers can be responded to.
The processing method provided in the embodiments of the present application is described below with reference to FIG. 1 or FIG. 2. FIG. 3 is a signaling diagram of one implementation of the processing method, which includes:
Step S301: the first electronic device 31 monitors the sound input.
The first electronic device 31 may be the first electronic device 11 described in fig. 1 or the first electronic device 21 described in fig. 2.
In an alternative embodiment, the voice monitoring function of the first electronic device 31 may be always on. Namely, the first electronic device can monitor the external sound in real time.
In an alternative embodiment, the first electronic device 31 may process the first sound, for example, perform a noise reduction process and/or an encoding process on the first sound; or, the first sound is subjected to noise reduction processing and/or decoding processing.
Step S302: the first electronic device 31 starts a voice control function if it detects that the first sound satisfies a first condition of a plurality of preset conditions.
Wherein, a preset condition is at least associated with a server for analyzing the subsequent input voice, and different preset conditions are associated with different servers.
In an alternative embodiment, one predetermined condition may be: the first sound comprises a wake-up word, or the first sound is a control instruction for controlling and starting a voice control function.
The first condition is any one of a plurality of preset conditions.
Different preset conditions correspond to different wake-up words or control instructions. For example, suppose the preset conditions are preset condition 1, preset condition 2, and preset condition 3. Preset condition 1 may be: the first sound includes wake-up word 1, or the first sound is a first control instruction for starting the voice control function. Preset condition 2 may be: the first sound includes wake-up word 2, or the first sound is a second control instruction for starting the voice control function. Preset condition 3 may be: the first sound includes wake-up word 3, or the first sound is a third control instruction for starting the voice control function.
The first, second, and third control instructions are different control instructions; wake-up words 1, 2, and 3 are different wake-up words.
If the first sound satisfies any one of the plurality of preset conditions, the voice control function of the first electronic device is started. In the embodiments of the present application, after detecting input of a first sound that satisfies the first condition, the first electronic device enters a state of waiting for subsequent sound input; once the subsequent sound is received, it can be sent to the server corresponding to the first condition so that the server performs speech recognition on it. The function that the first electronic device can perform after detecting a first sound that satisfies the first condition is referred to here as the voice control function.
In an optional embodiment, before detecting a first sound that satisfies the first condition, the first electronic device may pick up many sounds that satisfy none of the preset conditions. Suppose a sound satisfies a preset condition when it contains any one of several wake-up words, namely wake-up word 1, wake-up word 2, and wake-up word 3. Before detecting a first sound that contains wake-up word 1, 2, or 3, the first electronic device may also pick up sounds such as "I have eaten" or "this tastes really good", none of which can start its voice control function. As long as the voice control function has not been started, the first electronic device keeps checking whether the current sound input satisfies any one of the preset conditions; that is, it keeps looking for a first sound that satisfies any one of the preset conditions.
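Step S302 can be illustrated with a small sketch. It assumes, purely for illustration, that each sound is already available as a text transcript and that a preset condition is satisfied when the transcript contains a wake-up word; the wake-up words, server identifiers, and the names `PresetCondition`, `match_condition`, and `monitor` are all hypothetical.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class PresetCondition:
    name: str
    wake_word: str  # hypothetical wake-up word
    server: str     # hypothetical identifier of the associated server


# Each preset condition is associated with a different server.
PRESET_CONDITIONS = (
    PresetCondition("preset condition 1", "wake word one", "server-1"),
    PresetCondition("preset condition 2", "wake word two", "server-2"),
    PresetCondition("preset condition 3", "wake word three", "server-3"),
)


def match_condition(transcript: str) -> Optional[PresetCondition]:
    """Return the preset condition the sound satisfies, or None if it satisfies none."""
    text = transcript.lower()
    for condition in PRESET_CONDITIONS:
        if condition.wake_word in text:
            return condition
    return None


def monitor(transcripts: Iterable[str]) -> Optional[PresetCondition]:
    """Keep monitoring until some sound satisfies a preset condition."""
    for transcript in transcripts:
        condition = match_condition(transcript)
        if condition is not None:
            return condition  # the voice control function would start here
    return None
```

Sounds such as "I have eaten" match no condition and are skipped; the first sound containing a wake-up word determines which server handles what follows.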
Step S303: the first electronic device 31 captures a second sound input after the first sound input.
Step S304: the first electronic device 31 sends the second sound to the first server 12 corresponding to the first condition.
Different preset conditions correspond to different servers, so the first server 12 differs depending on which condition the first sound satisfies; optionally, the provider corresponding to the first server differs as well.
The first electronic device may send the second sound to the first server, and the first server performs analysis processing on the second sound and feeds back an analysis result to the first electronic device.
In an alternative embodiment, the first electronic device 31 may process the second sound, for example, perform noise reduction processing and/or encoding processing on the second sound; alternatively, the second sound may be subjected to noise reduction processing and/or decoding processing, and the processed sound may be transmitted to the first server 12.
In an alternative embodiment, the first electronic device 31 may send the second sound to the first server 12 directly without processing the second sound.
Step S305: the first server 12 analyzes the second sound, obtains an analysis result, and feeds the analysis result back to the first electronic device 31.
In an alternative embodiment, every server may always be in a speech recognition state: it analyzes the second sound when it receives one from the first electronic device, and otherwise waits to receive a sound.
In an optional embodiment, every server may initially be in a non-speech-recognition state. When the first sound is detected to satisfy a first condition among the plurality of preset conditions, the first server corresponding to the first condition enters the speech recognition state; because the first sound does not satisfy any preset condition other than the first condition, the servers other than the first server remain in the non-speech-recognition state.
Step S306: the first electronic device 31 outputs the analysis result.
In an alternative embodiment, the first electronic device 31 may output the analysis result by voice or display the analysis result through a display screen.
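Steps S301 to S306 can be put together in one hedged sketch. The remote servers are replaced by local stand-in functions, sounds are modelled as text, and the wake-up words "hello shop" and "hello music" as well as all function names are assumptions made only for this example.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional

# Hypothetical stand-in for a remote server: takes the second sound, returns an analysis result.
Server = Callable[[str], str]


def shopping_server(sound: str) -> str:
    return f"shopping result for: {sound}"


def music_server(sound: str) -> str:
    return f"music result for: {sound}"


# Hypothetical wake-up words, each associated with a different server.
SERVERS_BY_WAKE_WORD: Dict[str, Server] = {
    "hello shop": shopping_server,
    "hello music": music_server,
}


def run_once(sounds: Iterable[str]) -> Optional[str]:
    """Run one pass of S301-S306 over a stream of sounds and return the analysis result."""
    stream: Iterator[str] = iter(sounds)
    for first_sound in stream:                       # S301: monitor the sound input
        for wake_word, server in SERVERS_BY_WAKE_WORD.items():
            if wake_word in first_sound.lower():     # S302: a preset condition is satisfied
                second_sound = next(stream, None)    # S303: collect the second sound
                if second_sound is None:
                    return None
                return server(second_sound)          # S304/S305: send, analyze, feed back
    return None                                      # no sound satisfied any condition


if __name__ == "__main__":
    # S306: output the analysis result (printed here; a device might speak or display it).
    print(run_once(["background noise", "hello music", "play some jazz"]))
```

The same device reaches a different server depending only on which wake-up word preceded the request.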
The present application discloses a processing method that first monitors sound input and starts a voice control function if a first sound is detected to satisfy a first condition among a plurality of preset conditions, where each preset condition is associated with at least one server for analyzing subsequently input speech and different preset conditions are associated with different servers. A second sound that is input after the first sound is then collected, and an analysis result of the second sound is obtained from the first server associated with the first condition. In other words, the same first electronic device can access different servers depending on which preset condition a sound satisfies, so that one first electronic device can access a plurality of servers.
In an alternative embodiment, "activating a voice control function" may include: determining a server that analyzes subsequently input speech as the first server associated with the first condition.
With reference to FIG. 1, implementations of "determining a server that analyzes subsequently input speech as the first server associated with the first condition" are described below; the embodiments of the present application provide, but are not limited to, the following modes.
In a first mode, a communication connection between the first electronic device and the first server is established.
In an alternative embodiment, the servers respectively associated with the plurality of preset conditions are not connected to the first electronic device until the first sound input meeting the first condition of the plurality of preset conditions is received. Optionally, when it is detected that the first sound satisfies a first condition of a plurality of preset conditions, a communication connection of a first server associated with the first condition is established. Optionally, other servers not associated with the first condition remain unconnected to the first electronic device.
Optionally, in the first manner, after the first electronic device establishes a communication connection with the first server, the first server may automatically enter a speech recognition state.
In the second mode, the first server is controlled to enter a voice recognition state.
In an alternative embodiment, the servers respectively associated with the plurality of preset conditions are connected with the first electronic device before the first sound input meeting the first condition of the plurality of preset conditions is received. But multiple servers may not enter the speech recognition state. Optionally, when a first sound input meeting a first condition of the multiple preset conditions is detected, the first server associated with the first condition is controlled to enter a speech recognition state. Optionally, other servers not associated with the first condition remain not entering the speech recognition state.
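The two modes described for FIG. 1 differ only in what state the servers are in before the first sound arrives. The following sketch models that difference with a hypothetical `ServerLink` record; real connection setup and server signaling are outside its scope.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class ServerLink:
    """Hypothetical record of the device's relationship to one server."""
    name: str
    connected: bool = False
    recognizing: bool = False  # True while the server is in the speech recognition state


def activate_mode_one(links: Dict[str, ServerLink], first: str) -> ServerLink:
    """Mode one: no server is connected beforehand; connect only the first server."""
    link = links[first]
    link.connected = True
    # Optional behavior from the text: the server enters the speech
    # recognition state automatically once the connection exists.
    link.recognizing = True
    return link


def activate_mode_two(links: Dict[str, ServerLink], first: str) -> ServerLink:
    """Mode two: all servers are already connected; switch only the first one on."""
    link = links[first]
    if not link.connected:
        raise RuntimeError("mode two assumes a pre-established connection")
    link.recognizing = True
    return link
```

In both modes the servers not associated with the first condition are left untouched: unconnected in mode one, connected but not recognizing in mode two.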
With reference to FIG. 2, implementations of "determining a server that analyzes subsequently input speech as the first server associated with the first condition" are described below; the embodiments of the present application provide, but are not limited to, the following modes.
In a first mode, a first instruction is generated, and the first instruction is used for instructing a second electronic device to establish a communication connection with a first server.
In an alternative embodiment, the servers respectively associated with the plurality of predetermined conditions are not connected to the second electronic device 22 until the first sound input satisfying the first condition of the plurality of predetermined conditions is received. Optionally, when it is detected that the first sound satisfies a first condition of the plurality of preset conditions, a first instruction may be generated to instruct the second electronic device 22 to establish a communication connection with a first server associated with the first condition. Optionally, other servers not associated with the first condition remain unconnected to the second electronic device 22.
Optionally, in the first mode, after the second electronic device establishes a communication connection with the first server, the first server may automatically enter the speech recognition state.
In a second mode, a second instruction is generated, and the second instruction is used for instructing the second electronic device to trigger the first server associated with the first condition to enter the voice recognition state.
In an alternative embodiment, the servers respectively associated with the plurality of preset conditions are connected with the second electronic device before the first sound input meeting the first condition of the plurality of preset conditions is received. But multiple servers may not enter the speech recognition state. Optionally, when detecting the first sound input meeting the first condition of the multiple preset conditions, a second instruction is generated to instruct the second electronic device 22 to trigger the first server associated with the first condition to enter the speech recognition state. Optionally, other servers not associated with the first condition remain not entering the speech recognition state.
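In the FIG. 2 arrangement the first electronic device does not contact the server itself; it generates an instruction for the second electronic device. A minimal sketch of that hand-off follows. The message format, the `SecondDevice` class, and the rule for choosing between the two instructions are illustrative assumptions, not details given in the text.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Iterable


class InstructionKind(Enum):
    ESTABLISH_CONNECTION = "first instruction"   # mode one
    ENTER_RECOGNITION = "second instruction"     # mode two


@dataclass(frozen=True)
class Instruction:
    kind: InstructionKind
    server: str  # hypothetical identifier of the first server


def build_instruction(server: str, already_connected: bool) -> Instruction:
    """Generated by the first electronic device once the first condition is met."""
    kind = (InstructionKind.ENTER_RECOGNITION if already_connected
            else InstructionKind.ESTABLISH_CONNECTION)
    return Instruction(kind, server)


class SecondDevice:
    """Hypothetical model of the second electronic device executing instructions."""

    def __init__(self, connected: Iterable[str] = ()) -> None:
        self.connected = set(connected)
        self.recognizing = set()

    def handle(self, instruction: Instruction) -> None:
        if instruction.kind is InstructionKind.ESTABLISH_CONNECTION:
            self.connected.add(instruction.server)
            # Optional behavior: the server enters recognition once connected.
            self.recognizing.add(instruction.server)
        else:
            if instruction.server not in self.connected:
                raise RuntimeError("server must already be connected")
            self.recognizing.add(instruction.server)
```

Either way, only the server associated with the first condition ends up in the speech recognition state.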
There are various ways of "detecting that the first sound satisfies the first condition of the preset conditions, and starting the voice control function", and the embodiments of the present application provide but are not limited to the following:
The first implementation: detecting whether the first sound satisfies each of the plurality of preset conditions, to obtain a detection result for the first sound with respect to each preset condition; and starting the voice control function if the detection results indicate that the first sound satisfies the first condition.
Checking the first sound against every preset condition increases the power consumption of the first electronic device and, if the device is not continuously connected to a power supply, also shortens its battery life.
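The first implementation can be sketched as producing one detection result per preset condition. Sounds are again modelled as text, and the wake-up words and function names are hypothetical.

```python
from typing import Dict

# Hypothetical wake-up words, one per preset condition.
WAKE_WORDS: Dict[str, str] = {
    "condition 1": "wake word one",
    "condition 2": "wake word two",
    "condition 3": "wake word three",
}


def detect_all(first_sound: str) -> Dict[str, bool]:
    """Check the sound against every preset condition and record each result."""
    text = first_sound.lower()
    return {name: word in text for name, word in WAKE_WORDS.items()}


def should_start_voice_control(results: Dict[str, bool], first_condition: str) -> bool:
    """Start the voice control function only if the first condition is among the hits."""
    return results.get(first_condition, False)
```

Every incoming sound costs one check per preset condition, which is the overhead the text attributes to this implementation.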
In order to reduce the power consumption of the first electronic device, the embodiment of the present application further provides a second implementation manner.
The second implementation:
It will be understood that, over a period of time (for example, one or more weeks), a user may only ever utter first sounds that satisfy the first condition among the plurality of preset conditions. For example, the user may use the first electronic device only to play music during that period, with the music provided through the server of the provider Google; the first condition may then be that the first sound contains Google's wake-up word.
With the first implementation, after the user utters the first sound "hi, Google", the first electronic device still checks the first sound against every preset condition, which slows processing and increases the power consumption of the first electronic device.
The second implementation comprises the following steps:
determining a first voice control mode from a plurality of voice control modes;
one voice control mode represents a preset condition that the sound capable of starting the voice control function of the first electronic equipment meets, and different voice control modes represent different preset conditions;
the first voice control mode represents that a preset condition met by a sound capable of starting a voice control function of the first electronic equipment is the first condition.
Correspondingly, "if it is detected that the first sound satisfies the first condition, starting the voice control function" comprises the following steps:
detecting whether the first sound satisfies a first condition;
and if the first sound meets the first condition, starting a voice control function.
In summary, after the first sound is received, it is only necessary to detect whether the first sound satisfies the first condition; there is no need to detect whether the first sound satisfies any condition other than the first condition, which improves the processing speed and reduces the power consumption of the first electronic device.
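The second implementation manner can be sketched in the same simplified setting (text instead of audio, exact wake-up word matching). The class `ModeDetector`, its `comparisons` counter, and the provider names other than Google are hypothetical and serve only to make the saving visible.

```python
class ModeDetector:
    """Checks a sound only against the condition of the selected voice control mode."""

    def __init__(self, wake_words: dict):
        self.wake_words = wake_words   # voice control mode -> wake-up word
        self.mode = None               # no voice control mode selected yet
        self.comparisons = 0           # counts how many conditions were checked

    def select_mode(self, mode: str) -> None:
        if mode not in self.wake_words:
            raise ValueError(f"unknown voice control mode: {mode}")
        self.mode = mode

    def detect(self, sound: str) -> bool:
        if self.mode is None:
            raise RuntimeError("no voice control mode has been determined")
        self.comparisons += 1          # exactly one check per received sound
        return sound.strip().lower() == self.wake_words[self.mode]


detector = ModeDetector(
    {"google": "hi, google", "provider_b": "hello, b", "provider_c": "hey, c"}
)
detector.select_mode("google")        # the first voice control mode
started = detector.detect("hi, Google")
```

Although three preset conditions are stored, only one comparison is performed for the received sound, in contrast to the first implementation manner.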
There are various implementations of "determining the first voice control mode from the plurality of voice control modes", and the embodiments of the present application provide, but are not limited to, the following. First, at least one key is arranged on the first electronic device, and the user can change the voice control mode of the first electronic device through the at least one key. Second, the first electronic device may display a voice control mode selection page on which the plurality of voice control modes are displayed, and the first voice control mode is determined from the plurality of voice control modes displayed.
In an alternative embodiment, the voice control mode selection page may be as shown in fig. 4. Optionally, a voice control mode may be identified by the name of the provider; the voice control mode selection page in fig. 4 may include: Google, Baidu, Amazon, Xiaomi, etc.
The method has been described in detail in the embodiments disclosed above. The method of the present application can be implemented by various types of apparatus, so the present application also discloses an apparatus, specific embodiments of which are described in detail below.
Fig. 5 is a structural diagram of an implementation manner of a first electronic device provided in an embodiment of the present application. The first electronic device may include:
A voice receiving device 51 for monitoring the sound input.
Optionally, the voice receiving device 51 may be a microphone.
A processing chip 52 configured to start the voice control function if the voice receiving device 51 detects that the first sound satisfies a first condition of a plurality of preset conditions.
Here, one preset condition is associated with at least one server for analyzing subsequently input voice, and different preset conditions are associated with different servers.
Optionally, the processing chip 52 may be a CPU (central processing unit) or a Bluetooth chip.
The voice receiving device 51 is further configured to collect a second sound input after the first sound input.
The processing chip 52 is further configured to obtain the analysis result of the first server associated with the first condition for the second sound.
An output device 53 for outputting the analysis result.
Optionally, the output device may be a voice playing device or a display, and the voice playing device may be a speaker. Optionally, the voice receiving device and the voice playing device are in a working state at the same time. That is, the first electronic device may monitor the input of sound while playing voice, or may play voice while monitoring the input of sound.
Optionally, the first electronic device may further include a noise processing device for processing the sound collected by the voice receiving device to reduce the noise contained in the sound.
Optionally, the first electronic device further includes a transmission device for signal transmission with the second electronic device;
when starting the voice control function if it is detected that the first sound satisfies a first condition of the plurality of preset conditions, the processing chip is specifically configured to perform:
detecting whether the first sound meets a plurality of preset conditions respectively to obtain detection results corresponding to the first sound and the preset conditions respectively;
and if the detection result comprises that the first sound meets the first condition, starting a voice control function.
The transmission device may be a wireless data transmission device or a wired transmission device.
Optionally, the processing chip is further configured to perform:
determining a first voice control mode from a plurality of voice control modes;
one voice control mode represents a preset condition that the sound capable of starting the voice control function of the first electronic equipment meets, and different voice control modes represent different preset conditions;
the first voice control mode represents that a preset condition met by a sound capable of starting a voice control function of the first electronic equipment is the first condition.
Optionally, when starting the voice control function if it is detected that the first sound satisfies a first condition of the plurality of preset conditions, the processing chip is specifically configured to perform:
detecting whether the first sound satisfies a first condition;
and if the first sound meets the first condition, starting a voice control function.
Optionally, when starting the voice control function, the processing chip is specifically configured to perform:
determining a server that analyzes subsequently input speech as the first server associated with the first condition.
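The association between preset conditions and servers that the processing chip relies on when determining the first server can be sketched as a small lookup table. This is an assumption about one possible data structure, not something specified by the application; `ConditionRegistry` and all condition and server names are hypothetical.

```python
class ConditionRegistry:
    """Maps each preset condition to the server that analyzes subsequent speech."""

    def __init__(self):
        self._server_for = {}

    def associate(self, condition: str, server: str) -> None:
        # Different preset conditions must be associated with different servers.
        for other_condition, other_server in self._server_for.items():
            if other_server == server and other_condition != condition:
                raise ValueError(f"{server} is already associated with {other_condition}")
        self._server_for[condition] = server

    def server_for(self, condition: str) -> str:
        """Return the server that analyzes speech input after this condition is met."""
        return self._server_for[condition]


registry = ConditionRegistry()
registry.associate("first_condition", "first_server")
registry.associate("second_condition", "second_server")
chosen = registry.server_for("first_condition")
```

Rejecting a second condition that points at an already used server is a design choice of this sketch; it simply makes the "different conditions, different servers" rule explicit.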
There are various specific implementations of the first electronic device shown in fig. 5, and the embodiments of the present application provide, but are not limited to, the following.
A first implementation manner is described with reference to fig. 1. Fig. 6 is a block diagram of another implementation manner of a first electronic device according to an embodiment of the present application.
The first electronic device 11 comprises at least one microphone 51, at least one loudspeaker 53 and a processing chip 52; optionally, a codec 61 and/or at least one amplifier 62 may also be included. The connection relationship of the above components can be as shown in fig. 6.
Optionally, the codec 61 may further include a DSP (digital signal processor) 63, which processes the sound collected by the at least one microphone 51 to reduce the noise contained in the sound. The DSP 63 may be independent of the codec 61 or integrated in the codec 61.
Optionally, the processing chip 52 may further integrate a storage unit, such as a DRAM (dynamic random access memory) and/or a FLASH memory.
The memory unit may store wake-up words or control commands corresponding to a plurality of preset conditions, respectively.
The operation of the first electronic device 11 shown in fig. 6 is explained below.
Any one of the at least one microphone 51 monitors sound input and sends the detected first sound to the codec 61. Optionally, the DSP 63 in the codec 61 processes the first sound to obtain a noise-reduced first sound. The codec 61 encodes the noise-reduced first sound and sends the encoded first sound to the processing chip 52. The processing chip 52 detects whether the first sound satisfies any one of the plurality of preset conditions and, if it detects that the first sound satisfies the first condition of the plurality of preset conditions, establishes a communication connection with the first server 12 and/or controls the first server 12 to enter a speech recognition state.
Any one of the at least one microphone 51 continues to collect the second sound and sends the detected second sound to the codec 61. Optionally, the DSP 63 in the codec 61 processes the second sound to obtain a noise-reduced second sound. The codec 61 encodes the noise-reduced second sound and sends the encoded second sound to the processing chip 52, and the processing chip 52 sends the second sound to the first server 12 associated with the first condition.
The first server 12 performs speech analysis processing on the second sound to obtain an analysis result and sends the analysis result to the processing chip 52. The processing chip 52 sends the analysis result to the codec 61; optionally, the DSP 63 in the codec 61 performs noise reduction processing on the analysis result. The codec 61 decodes the noise-reduced analysis result to obtain voice data corresponding to the analysis result. The voice data is sent to the speaker 53 through the amplifier 62, so that the first electronic device plays the analysis result as voice.
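The control flow of the processing chip 52 in the operation just described can be sketched as a two-state machine: idle until a first sound satisfies a preset condition, then forwarding the second sound to the associated server. The sketch deliberately omits the codec, DSP noise reduction, amplifier and speaker, represents sounds as text, and stands in for each server with a plain function; `ProcessingChip`, `on_sound` and the sample server are hypothetical.

```python
from typing import Callable, Dict, Optional


class ProcessingChip:
    """Hypothetical model of the wake-then-forward behavior of processing chip 52."""

    def __init__(self, wake_words: Dict[str, str], servers: Dict[str, Callable[[str], str]]):
        self.wake_words = wake_words      # preset condition -> wake-up word
        self.servers = servers            # preset condition -> server stand-in
        self.active: Optional[str] = None  # condition satisfied by the first sound

    def on_sound(self, sound: str) -> Optional[str]:
        if self.active is None:
            # First sound: look for a preset condition that it satisfies.
            text = sound.strip().lower()
            for condition, word in self.wake_words.items():
                if text == word:
                    self.active = condition
                    break
            return None
        # Second sound: send it to the server associated with the satisfied condition.
        server = self.servers[self.active]
        self.active = None
        return server(sound)


chip = ProcessingChip(
    {"first_condition": "hi, google"},
    {"first_condition": lambda sound: "analysis of: " + sound},
)
first = chip.on_sound("Hi, Google")   # wakes the device, nothing to play yet
second = chip.on_sound("play music")  # analysis result returned by the server stand-in
```

Returning to the idle state after one analysis result is an assumption of this sketch; the description does not state how long the voice control function remains active.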
Optionally, the first electronic device 11 may have a display screen, which may display the analysis result.
In an alternative embodiment, if the storage capacity of the storage unit integrated in the processing chip 52 is small, a storage unit (e.g., DRAM and/or FLASH) may be connected externally to the processing chip 52, and the wake-up words or control commands corresponding to the plurality of preset conditions may be stored in the external storage unit 71. Fig. 7 is a block diagram of another implementation manner of the first electronic device according to the embodiment of the present application.
Fig. 7 differs from fig. 6 in that the storage unit storing the wake-up words or control commands corresponding to the plurality of preset conditions is external to the processing chip 52 in fig. 7, whereas in fig. 6 it is integrated in the processing chip 52.
A second implementation manner is described with reference to fig. 2. Fig. 8 is a block diagram of another implementation manner of the processing system according to the embodiment of the present application.
The first electronic device 21 shown in fig. 8 includes: a microphone 51, a processing chip 52; optionally, a DSP 81, an amplifier 82 and a speaker 84 are also included. Fig. 8 illustrates an example in which the processing chip 52 is a bluetooth chip.
Optionally, the Bluetooth chip 52 includes a wireless data transmission device 83 and a processing device 84; optionally, the wireless data transmission device 83 includes a wireless data receiving device 831 and a wireless data transmitting device 832.
Optionally, a storage unit (e.g., a DRAM and/or a FLASH) may be integrated in or externally connected to the Bluetooth chip 52, and the wake-up words or control commands corresponding to the plurality of preset conditions are stored in that storage unit.
The operation principle of the structure shown in fig. 8 is explained below.
The microphone 51 monitors sound input and sends the detected first sound to the DSP 81. The DSP 81 performs noise reduction processing on the first sound to obtain a processed first sound and sends the processed first sound to the Bluetooth chip 52. The processing device 84 in the Bluetooth chip 52 detects whether the first sound satisfies any one of the plurality of preset conditions; if it detects that the first sound satisfies the first condition of the plurality of preset conditions, it generates a control instruction (which may be the first instruction or the second instruction) and sends the control instruction to the second electronic device 22 through the wireless data transmitting device 832 in the wireless data transmission device 83.
The second electronic device 22 establishes a communication connection with the first server 12 based on the control instruction and/or controls the first server 12 to enter a speech recognition state.
The microphone 51 continues to collect the second sound and sends it to the DSP 81. The DSP 81 performs noise reduction processing on the second sound to obtain a processed second sound and sends the processed second sound to the Bluetooth chip 52. The processing device 84 in the Bluetooth chip 52 controls the wireless data transmitting device 832 in the wireless data transmission device 83 to transmit the second sound to the second electronic device 22.
The second electronic device 22 sends the second sound to the first server 12.
The first server 12 performs speech analysis processing on the second sound to obtain an analysis result and sends the analysis result to the second electronic device 22; the second electronic device 22 transmits the analysis result to the first electronic device 21.
The first electronic device 21 receives the analysis result through the wireless data receiving device 831 in the wireless data transmission device 83. The processing device 84 in the first electronic device 21 controls the wireless data transmitting device 832 to transmit the voice data corresponding to the analysis result to the speaker 84 through the amplifier 82, so that the voice data corresponding to the analysis result is played.
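The relay role of the second electronic device 22 in this flow can be sketched with two message kinds, one for the control instruction and one for the second sound. This is an illustrative model only: the wireless link, noise reduction and playback are omitted, the server is a plain function, and the names `Message`, `RelayDevice` and `first_server` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass
class Message:
    kind: str      # "control" (first or second instruction) or "audio" (second sound)
    payload: str   # server name for "control", sound content for "audio"


class RelayDevice:
    """Hypothetical stand-in for the second electronic device 22."""

    def __init__(self, servers: Dict[str, Callable[[str], str]]):
        self.servers = servers
        self.selected: Optional[str] = None

    def receive(self, message: Message) -> Optional[str]:
        if message.kind == "control":
            # Connect to the named server and/or put it into the recognition state.
            if message.payload not in self.servers:
                raise KeyError(f"unknown server: {message.payload}")
            self.selected = message.payload
            return None
        if message.kind == "audio":
            if self.selected is None:
                raise RuntimeError("no server selected yet")
            # Forward the second sound and pass the analysis result back.
            return self.servers[self.selected](message.payload)
        raise ValueError(f"unknown message kind: {message.kind}")


relay = RelayDevice({"first_server": lambda sound: "analysis of " + sound})
ack = relay.receive(Message("control", "first_server"))   # sent after the first sound
result = relay.receive(Message("audio", "play music"))    # second sound and its result
```

The first electronic device only needs to know which server the control instruction should name; the connection to the server itself is held by the relay.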
Optionally, the first electronic device 21 may have a display screen, which may display the analysis result.
Fig. 9 is a structural diagram of a further implementation manner of a first electronic device provided in an embodiment of the present application. The first electronic device includes:
a memory 91 for storing a program;
a processor 92 configured to execute the program, the program being specifically configured to:
monitoring the sound input;
if the first sound is detected to meet a first condition in a plurality of preset conditions, starting a voice control function;
wherein, one preset condition is at least associated with a server for analyzing the subsequent input voice, and different preset conditions are associated with different servers;
collecting a second sound input after the first sound input;
and obtaining the analysis result of the first server associated with the first condition for the second sound.
The memory 91 may comprise a high-speed RAM memory, and may also include a non-volatile memory (non-volatile memory), such as at least one disk memory.
The processor 92 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits configured to implement embodiments of the present application.
Optionally, the electronic device may further include a communication bus 93 and a communication interface 94, wherein the memory 91, the processor 92, and the communication interface 94 communicate with one another through the communication bus 93;
optionally, the communication interface 94 may be an interface of a communication module, such as an interface of a GSM module.
An embodiment of the present application further provides a readable storage medium, on which a computer program is stored, where the computer program is executed by a processor to implement the steps included in any one of the processing methods described above.
It should be noted that the embodiments in the present specification are described in a progressive manner: each embodiment focuses on its differences from the other embodiments, and the same or similar parts of the embodiments may be referred to one another. Since the device and system embodiments are basically similar to the method embodiments, they are described relatively briefly; for the relevant points, refer to the corresponding description of the method embodiments.
It is further noted that, herein, relational terms such as first and second may be used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Also, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in a process, method, article, or apparatus that comprises the element.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in Random Access Memory (RAM), memory, Read Only Memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The previous description of the disclosed embodiments is provided to enable any person skilled in the art to make or use the present application. Various modifications to these embodiments will be readily apparent to those skilled in the art, and the generic principles defined herein may be applied to other embodiments without departing from the spirit or scope of the application. Thus, the present application is not intended to be limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (6)

1. A processing method applied to a first electronic device, the processing method comprising:
monitoring the sound input;
if the first sound is detected to meet a first condition in a plurality of preset conditions, starting a voice control function;
wherein, one preset condition is at least associated with a server for analyzing the subsequent input voice, and different preset conditions are associated with different servers;
collecting a second sound input after the first sound input;
obtaining an analysis result of the first server associated with the first condition for the second sound;
further comprising:
determining a first voice control mode from a plurality of voice control modes;
one voice control mode represents a preset condition that the sound capable of starting the voice control function of the first electronic equipment meets, and different voice control modes represent different preset conditions;
the first voice control mode represents that a preset condition met by sound capable of starting a voice control function of the first electronic equipment is the first condition;
the initiating voice control function comprises: determining that a server analyzing subsequently input speech is the first server associated with the first condition;
the determining that the server analyzing the subsequently input speech is the first server associated with the first condition comprises:
establishing a communication connection between the first electronic equipment and the first server; wherein, when detecting that the first sound meets a first condition of a plurality of preset conditions, establishing a communication connection of the first server associated with the first condition;
or, controlling the first server to enter a speech recognition state, wherein when a first sound input meeting a first condition of a plurality of preset conditions is detected, the first server associated with the first condition is controlled to enter the speech recognition state;
or generating a first instruction, wherein the first instruction is used for instructing a second electronic device to establish a communication connection with the first server, and when detecting that a first sound meets a first condition in a plurality of preset conditions, generating a first instruction for instructing the second electronic device to establish a communication connection with the first server associated with the first condition;
or generating a second instruction, wherein the second instruction is used for instructing a second electronic device to trigger a first server associated with a first condition to enter a voice recognition state, and when a first sound input meeting a first condition in a plurality of preset conditions is detected, generating a second instruction instructing the second electronic device to trigger the first server associated with the first condition to enter the voice recognition state.
2. The processing method according to claim 1, wherein if it is detected that the first sound satisfies the first condition, the starting of the voice control function comprises:
detecting whether the first sound satisfies a first condition;
and if the first sound meets the first condition, starting a voice control function.
3. A first electronic device, comprising:
a voice receiving means for monitoring voice input;
the processing chip is used for starting a voice control function if the voice receiving device detects that the first sound meets a first condition in a plurality of preset conditions;
wherein, one preset condition is at least associated with a server for analyzing the subsequent input voice, and different preset conditions are associated with different servers;
the voice receiving device is further configured to: collecting a second sound input after the first sound input;
the processor chip is further to: obtaining an analysis result of the first server associated with the first condition for the second sound;
output means for outputting the analysis result;
further comprising:
the processor chip is further to: determining a first voice control mode from a plurality of voice control modes;
one voice control mode represents a preset condition that the sound capable of starting the voice control function of the first electronic equipment meets, and different voice control modes represent different preset conditions;
the first voice control mode represents that a preset condition met by sound capable of starting a voice control function of the first electronic equipment is the first condition;
the initiating voice control function comprises: determining that a server analyzing subsequently input speech is the first server associated with the first condition;
the determining that the server analyzing the subsequently input speech is the first server associated with the first condition comprises:
establishing a communication connection between the first electronic equipment and the first server; wherein, when detecting that the first sound meets a first condition of a plurality of preset conditions, establishing a communication connection of the first server associated with the first condition;
or, controlling the first server to enter a speech recognition state, wherein when a first sound input meeting a first condition of a plurality of preset conditions is detected, the first server associated with the first condition is controlled to enter the speech recognition state;
or generating a first instruction, wherein the first instruction is used for instructing a second electronic device to establish a communication connection with the first server, and when detecting that a first sound meets a first condition in a plurality of preset conditions, generating a first instruction for instructing the second electronic device to establish a communication connection with the first server associated with the first condition;
or generating a second instruction, wherein the second instruction is used for instructing a second electronic device to trigger a first server associated with a first condition to enter a voice recognition state, and when a first sound input meeting a first condition in a plurality of preset conditions is detected, generating a second instruction instructing the second electronic device to trigger the first server associated with the first condition to enter the voice recognition state.
4. The first electronic device of claim 3, further comprising:
and the transmission device is used for carrying out signal transmission with the second electronic equipment.
5. The first electronic device of claim 3, wherein the processing chip, when executing the voice control function that is started if it is detected that the first sound satisfies a first condition of a plurality of preset conditions, is specifically configured to:
detecting whether the first sound satisfies a first condition;
and if the first sound meets the first condition, starting a voice control function.
6. A first electronic device, comprising:
a memory for storing a program;
a processor configured to execute the program, the program specifically configured to:
monitoring the sound input;
if the first sound is detected to meet a first condition in a plurality of preset conditions, starting a voice control function;
wherein, one preset condition is at least associated with a server for analyzing the subsequent input voice, and different preset conditions are associated with different servers;
collecting a second sound input after the first sound input;
obtaining an analysis result of the first server associated with the first condition for the second sound;
the program is specifically further configured to:
determining a first voice control mode from a plurality of voice control modes;
one voice control mode represents a preset condition that the sound capable of starting the voice control function of the first electronic equipment meets, and different voice control modes represent different preset conditions;
the first voice control mode represents that a preset condition met by sound capable of starting a voice control function of the first electronic equipment is the first condition;
the initiating voice control function comprises: determining that a server analyzing subsequently input speech is the first server associated with the first condition;
the determining that the server analyzing the subsequently input speech is the first server associated with the first condition comprises:
establishing a communication connection between the first electronic equipment and the first server; wherein, when detecting that the first sound meets a first condition of a plurality of preset conditions, establishing a communication connection of the first server associated with the first condition;
or, controlling the first server to enter a speech recognition state, wherein when a first sound input meeting a first condition of a plurality of preset conditions is detected, the first server associated with the first condition is controlled to enter the speech recognition state;
or generating a first instruction, wherein the first instruction is used for instructing a second electronic device to establish a communication connection with the first server, and when detecting that a first sound meets a first condition in a plurality of preset conditions, generating a first instruction for instructing the second electronic device to establish a communication connection with the first server associated with the first condition;
or generating a second instruction, wherein the second instruction is used for instructing a second electronic device to trigger a first server associated with a first condition to enter a voice recognition state, and when a first sound input meeting a first condition in a plurality of preset conditions is detected, generating a second instruction instructing the second electronic device to trigger the first server associated with the first condition to enter the voice recognition state.
CN201810825087.4A 2018-07-25 2018-07-25 Processing method and first electronic device Active CN108962259B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810825087.4A CN108962259B (en) 2018-07-25 2018-07-25 Processing method and first electronic device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810825087.4A CN108962259B (en) 2018-07-25 2018-07-25 Processing method and first electronic device

Publications (2)

Publication Number Publication Date
CN108962259A CN108962259A (en) 2018-12-07
CN108962259B true CN108962259B (en) 2021-06-15

Family

ID=64464137

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810825087.4A Active CN108962259B (en) 2018-07-25 2018-07-25 Processing method and first electronic device

Country Status (1)

Country Link
CN (1) CN108962259B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111096680B (en) * 2019-12-31 2022-02-01 广东美的厨房电器制造有限公司 Cooking equipment, electronic equipment, voice server, voice control method and device
CN112104949B (en) * 2020-09-02 2022-05-27 北京字节跳动网络技术有限公司 Method and device for detecting pickup assembly and electronic equipment

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106960667A (en) * 2017-03-08 2017-07-18 杭州联络互动信息科技股份有限公司 Position reminding methods, devices and systems
CN107704275A (en) * 2017-09-04 2018-02-16 百度在线网络技术(北京)有限公司 Smart machine awakening method, device, server and smart machine
US9934777B1 (en) * 2016-07-01 2018-04-03 Amazon Technologies, Inc. Customized speech processing language models

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10115400B2 (en) * 2016-08-05 2018-10-30 Sonos, Inc. Multiple voice services

Also Published As

Publication number Publication date
CN108962259A (en) 2018-12-07

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant