WO2021040201A1

WO2021040201A1 - Electronic device and method for controlling same

Info

Publication number: WO2021040201A1
Application number: PCT/KR2020/007257
Authority: WO
Inventors: 이재철 (Lee Jae-cheol); 김해종 (Kim Hae-jong)
Original assignee: 삼성전자주식회사 (Samsung Electronics Co., Ltd.)
Priority date: 2019-08-27
Filing date: 2020-06-04
Publication date: 2021-03-04
Also published as: KR20210025348A

Abstract

Disclosed is an electronic device. The electronic device comprises: an acoustic amplification unit; a speaker; a microphone; a microphone acoustic amplification unit; and a processor which amplifies an audio signal through the acoustic amplification unit, outputs the amplified audio signal through the speaker, and when the audio signal output through the speaker and a user's voice signal are input to the microphone, amplifies the signal input to the microphone through the microphone acoustic amplification unit, acquires a voice signal by performing acoustic echo cancellation on the amplified signal, and performs voice recognition preprocessing of the acquired voice signal. The processor determines the gain of the microphone acoustic amplification unit on the basis of the gain of the acoustic amplification unit, and amplifies the signal input through the microphone acoustic amplification unit on the basis of the determined gain.

Description

Electronic device and control method thereof

The present disclosure relates to an electronic device and a method for controlling the same, and more particularly, to an electronic device for providing a response to a user's voice, and a method for controlling the same.

In recent years, with the development of electronic technology, electronic devices have come to receive a user's voice and provide a response to it. For example, in response to a user's voice asking about the weather, the electronic device may provide the user with information about the current weather.

Meanwhile, in an electronic device with a built-in speaker, acoustic echo cancellation (AEC) is performed by comparing the audio signal output from the speaker with the audio signal input to the microphone, in order to improve the speech recognition rate. If acoustic echo cancellation cannot be performed effectively, the recognition rate for the user's voice is lowered. The recognition rate is also lowered when the user's voice input to the microphone is too loud or too quiet.

Accordingly, a method for more effective speech recognition is needed.

The present disclosure has been devised to solve the above-described problem, and an object of the present disclosure is to provide an electronic device that amplifies the signal input to a microphone for use in acoustic echo cancellation and controls the gain used for that amplification, and a method for controlling the same.

An electronic device according to an embodiment of the present disclosure includes a sound amplifying unit, a speaker, a microphone, a microphone sound amplifying unit, and a processor. The processor is configured to amplify an audio signal through the sound amplifying unit, output the amplified audio signal through the speaker, and, when the audio signal output through the speaker and a user's voice signal are input to the microphone, amplify the signal input to the microphone through the microphone sound amplifying unit, obtain a voice signal by performing acoustic echo cancellation (AEC) on the amplified signal, and perform preprocessing for speech recognition on the obtained voice signal. The processor determines a gain of the microphone sound amplifying unit based on a gain of the sound amplifying unit, and may amplify the input signal through the microphone sound amplifying unit based on the determined gain.

Here, the processor may adjust the gain of the microphone sound amplifying unit so that it is inversely proportional to the gain of the sound amplifying unit.

In addition, when a user command for adjusting the volume of the electronic device is received, the processor may adjust the gain of the sound amplifying unit based on the user command, and adjust the gain of the microphone sound amplifying unit based on the adjusted gain of the sound amplifying unit.

Here, when the gain of the sound amplifying unit is increased, the processor may decrease the gain of the microphone sound amplifying unit.

In addition, the processor may determine whether clipping has occurred in the acquired voice signal, and adjust a gain of the microphone sound amplifying unit based on whether the clipping has occurred.

Here, the processor may reduce the gain of the microphone sound amplifying unit when clipping occurs in the acquired voice signal, and maintain the gain of the microphone sound amplifying unit when clipping does not occur in the acquired voice signal.

In addition, the processor may determine the average level of the acquired voice signal and, when the average level is less than or equal to a preset level, increase the gain of the microphone sound amplifying unit so that the average level of the acquired voice signal becomes higher than the preset level.

Meanwhile, a method for controlling an electronic device according to an embodiment of the present disclosure includes: amplifying an audio signal based on a first gain and outputting the amplified audio signal; when the output audio signal and a user's voice signal are input, amplifying the input signal based on a second gain and performing acoustic echo cancellation (AEC) on the amplified signal to obtain a voice signal; and performing preprocessing for speech recognition on the voice signal. The amplifying of the input signal may include determining the second gain based on the first gain and amplifying the input signal based on the determined second gain.

Here, in the amplifying of the input signal, the second gain may be adjusted to be inversely proportional to the first gain.

In addition, the amplifying of the audio signal may include, when a user command for adjusting the volume of the electronic device is received, adjusting the first gain based on the user command, and the amplifying of the input signal may include adjusting the second gain based on the adjusted first gain.

Here, in the amplifying of the input signal, when the first gain is increased, the second gain may be decreased.

In addition, the control method according to an embodiment of the present disclosure may further include determining whether clipping has occurred in the acquired voice signal, and adjusting the second gain based on whether the clipping has occurred.

Here, in the adjusting, the second gain may be reduced when clipping occurs in the acquired voice signal, and may be maintained when clipping does not occur in the acquired voice signal.

In addition, the control method according to an embodiment of the present disclosure may further include determining the average level of the acquired voice signal and, when the average level is less than or equal to a preset level, increasing the second gain so that the average level of the acquired voice signal becomes higher than the preset level.

According to various embodiments of the present disclosure, the audio signal that is output from the speaker and input to the microphone, as well as the user's voice signal, is prevented from being clipped, thereby improving speech recognition performance.

FIG. 1 is a diagram illustrating the use of an artificial intelligence agent system according to an embodiment of the present disclosure;

FIG. 2 is a block diagram illustrating a configuration of an electronic device according to an embodiment of the present disclosure;

FIG. 3 is a block diagram illustrating components that perform preprocessing for speech recognition according to an embodiment of the present disclosure;

FIG. 4 is a flowchart illustrating a method of adjusting the gain of a microphone sound amplifying unit according to an embodiment of the present disclosure;

FIG. 5 is a block diagram illustrating a detailed configuration of an electronic device according to an embodiment of the present disclosure; and

FIG. 6 is a flowchart illustrating a method of controlling an electronic device according to an embodiment of the present disclosure.


Hereinafter, various embodiments of the present disclosure will be described with reference to the accompanying drawings. However, this is not intended to limit the technology described in the present disclosure to specific embodiments, and it should be understood to include various modifications, equivalents, and/or alternatives of the embodiments of the present disclosure. In connection with the description of the drawings, similar reference numerals may be used for similar elements.

In the present disclosure, expressions such as "have," "may have," "include," or "may include" indicate the presence of a corresponding feature (e.g., a numerical value, function, operation, or component such as a part) and do not exclude the presence of additional features.

In the present disclosure, expressions such as "A or B," "at least one of A and/or B," or "one or more of A and/or B" may include all possible combinations of the items listed together. For example, "A or B," "at least one of A and B," or "at least one of A or B" may refer to any of the cases (1) including at least one A, (2) including at least one B, or (3) including both at least one A and at least one B.

Expressions such as "first" and "second" used in the present disclosure may modify various elements regardless of order and/or importance; they are used only to distinguish one component from another and do not limit the components.

When a component (e.g., a first component) is referred to as being "(functionally or communicatively) coupled with/to" or "connected to" another component (e.g., a second component), it should be understood that the component may be connected to the other component directly or through yet another component (e.g., a third component). On the other hand, when a component (e.g., a first component) is referred to as being "directly coupled with/to" or "directly connected to" another component (e.g., a second component), it may be understood that no other component (e.g., a third component) exists between them.

The expression "configured to" used in the present disclosure may be used interchangeably with, for example, "suitable for," "having the capacity to," "designed to," "adapted to," "made to," or "capable of," depending on the situation. The term "configured (or set) to" does not necessarily mean only "specifically designed to" in hardware. Instead, in some situations, the expression "a device configured to" may mean that the device "is capable of" an operation together with other devices or components. For example, the phrase "a processor configured (or set) to perform A, B, and C" may mean a dedicated processor (e.g., an embedded processor) for performing the corresponding operations, or a general-purpose processor (e.g., a CPU or an application processor) capable of performing the corresponding operations by executing one or more software programs stored in a memory device.

Hereinafter, the present invention will be described in detail with reference to the drawings.

The artificial intelligence agent system may include an electronic device 100 and a response providing server 10 as shown in FIG. 1. Meanwhile, in FIG. 1, the electronic device 100 is shown to be a speaker-type device, but this is only an example.

Electronic devices according to various embodiments of the present disclosure may include, for example, at least one of a television, a smartphone, a tablet PC, a mobile phone, a video phone, an e-book reader, a desktop PC, a laptop PC, a netbook computer, a workstation, a server, a PDA, a portable multimedia player (PMP), an MP3 player, a medical device, a camera, or a wearable device. Wearable devices may include at least one of an accessory type (e.g., a watch, ring, bracelet, anklet, necklace, glasses, contact lenses, or head-mounted device (HMD)), a fabric- or clothing-integrated type (e.g., electronic clothing), a body-attached type (e.g., a skin pad or tattoo), or a bio-implantable circuit. In some embodiments, the electronic device may include at least one of, for example, a digital video disk (DVD) player, an audio device, a refrigerator, an air conditioner, a vacuum cleaner, an oven, a microwave oven, a washing machine, an air purifier, a set-top box, a home automation control panel, a security control panel, a media box, a game console, an electronic dictionary, an electronic key, a camcorder, or an electronic picture frame.

In another embodiment, the electronic device may include at least one of various medical devices (e.g., portable medical measuring devices such as a blood glucose meter, heart rate meter, blood pressure meter, or body temperature meter, magnetic resonance angiography (MRA), magnetic resonance imaging (MRI), computed tomography (CT), a camera, or an ultrasound device), a navigation device, a global navigation satellite system (GNSS), an event data recorder (EDR), a flight data recorder (FDR), an automobile infotainment device, marine electronic equipment (e.g., a marine navigation device or a gyro compass), avionics, a security device, a vehicle head unit, an industrial or home robot, a drone, an automated teller machine (ATM) of a financial institution, a point of sale (POS) terminal in a store, or an Internet of Things (IoT) device (e.g., a light bulb, various sensors, a sprinkler device, a fire alarm, a thermostat, a street light, a toaster, exercise equipment, a hot water tank, a heater, or a boiler).

As such, an electronic device according to various embodiments of the present disclosure may be implemented as various types of electronic devices.

Meanwhile, the electronic device 100 may provide a response to a user's voice to a user using an artificial intelligence agent program.

In this case, before the user's voice is input, the electronic device 100 may receive a user voice including a trigger word for activating the artificial intelligence agent program. For example, the electronic device 100 may receive a user voice including a trigger word such as "Bixby". When a user voice including the trigger word is input, the electronic device 100 may execute or activate the artificial intelligence agent program and wait for the user's voice to be input. The artificial intelligence agent program may include a conversation system capable of processing user voices and responses in natural language. Besides the trigger word, the artificial intelligence agent program may also be activated by selecting a specific button provided on the electronic device 100, after which the user's voice may be input.

Thereafter, the electronic device 100 may receive a user's voice. For example, as shown in FIG. 1, the electronic device 100 may receive the user's voice "how is the weather today". In this case, the electronic device 100 may determine keywords such as "today" and "weather" from "how is the weather today" and provide the keywords to the response providing server 10.
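As a rough illustration of the keyword step, the sketch below lowercases the recognized text, splits it into words, and drops words found in a small stop-word list. The stop-word list and the function name `extract_keywords` are hypothetical; the disclosure does not specify how keywords are determined, and a real system would rely on its natural language understanding result instead.

```python
import re
from typing import List

# Hypothetical stop-word list; a real assistant would use its NLU model instead.
STOPWORDS = {"how", "is", "the", "what", "a", "an", "please", "tell", "me", "about"}


def extract_keywords(utterance: str) -> List[str]:
    """Return the non-stop-word tokens of a recognized utterance, in order."""
    tokens = re.findall(r"[a-z0-9']+", utterance.lower())
    return [token for token in tokens if token not in STOPWORDS]


print(extract_keywords("How is the weather today?"))  # ['weather', 'today']
```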

The response providing server 10 may provide a response to the user's voice based on the keywords received from the electronic device 100. For example, the response providing server 10 may provide a response of "temperature 22°C" to the electronic device 100. In this case, the response providing server 10 may provide the response as text, but this is only an example, and it may also provide the response in natural-language form.

The electronic device 100 may output a response. In this case, the electronic device 100 may process and output the response in natural language using a conversation system. For example, the electronic device 100 may provide a natural language response of “the temperature today is 22°C”.

Meanwhile, the electronic device 100 may use an artificial intelligence agent to provide a response to the user's voice as described above. Here, the artificial intelligence agent is a dedicated program for providing AI (artificial intelligence)-based services (e.g., a voice recognition service, secretary service, translation service, or search service), and may be executed by an existing general-purpose processor (e.g., a CPU) or a separate AI-dedicated processor (e.g., a GPU). In particular, the artificial intelligence agent may control various modules (e.g., a conversation system).

As described above, according to an embodiment of the present disclosure, the electronic device 100 may receive a user's voice and provide a response thereto.

Meanwhile, the electronic device 100 may include a microphone for receiving a user's voice. In this case, the audio signal input through the microphone may be amplified through the microphone sound amplifying unit of the electronic device 100.

In this case, the microphone sound amplifying unit amplifies the audio signal according to its gain. When the level of the amplified audio signal falls outside a certain level range, the portion outside that range may be clipped, and the clipped audio signal is output. When clipping occurs in this way, the clipped portion differs from the original audio signal, and as a result the user's voice may not be accurately recognized.
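To make the clipping effect concrete, the sketch below models the microphone sound amplifying unit as a linear gain followed by hard clipping at a full-scale limit. The normalized range of [-1.0, 1.0] and the function name `amplify_with_clipping` are assumptions for illustration, not details from the disclosure.

```python
from typing import List, Tuple


def amplify_with_clipping(samples: List[float], gain: float,
                          full_scale: float = 1.0) -> Tuple[List[float], int]:
    """Multiply each sample by `gain`, then hard-clip to [-full_scale, full_scale].

    Returns the amplified samples and how many of them were clipped.
    """
    out: List[float] = []
    clipped = 0
    for x in samples:
        y = x * gain
        if abs(y) > full_scale:
            y = full_scale if y > 0 else -full_scale  # the peak is flattened at full scale
            clipped += 1
        out.append(y)
    return out, clipped


voice = [0.0, 0.2, 0.5, -0.5, -0.2]
amplified, n_clipped = amplify_with_clipping(voice, gain=3.0)
print(n_clipped)                   # 2
print(amplified[2], amplified[3])  # 1.0 -1.0 (ideally 1.5 and -1.5)
```

The two samples at ±0.5 should become ±1.5 but are held at ±1.0, so the waveform's peaks are flattened and the downstream processing sees a signal that differs from what was actually spoken.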

In addition, when the user's voice input through the microphone is too quiet, the user's voice may not be accurately recognized even after the audio signal is amplified.

Accordingly, in order to solve this problem, the electronic device 100 according to an embodiment of the present disclosure may adjust the gain of the microphone sound amplifying unit, which will be described in more detail below.

FIG. 2 is a block diagram illustrating a configuration of an electronic device according to an embodiment of the present disclosure.

Referring to FIG. 2, the electronic device 100 may include a sound amplifying unit 110, a speaker 120, a microphone 130, a microphone sound amplifying unit 140, and a processor 150.

The sound amplifying unit 110 may amplify an audio signal. Specifically, the sound amplifying unit 110 may amplify an audio signal to be output through the speaker 120 and transmit the amplified audio signal to the speaker 120.

In this case, the degree to which the audio signal is amplified may be determined according to the gain of the sound amplifying unit 110.

Meanwhile, when the electronic device 100 provides audio content, the audio signal may be an audio signal of the audio content. In addition, when the electronic device 100 provides a response to the user's voice as speech, the audio signal may be an audio signal of that speech.

The speaker 120 may output an audio signal. Specifically, the speaker 120 may output an audio signal input from the sound amplifying unit 110.

The microphone 130 may receive an audio signal. Specifically, the microphone 130 may receive the audio signal output through the speaker 120 and the user's voice signal.

The microphone sound amplifying unit 140 may amplify a signal input to the microphone 130. The degree to which the signal is amplified may be determined according to the gain of the microphone sound amplifying unit 140. This gain is initially set to an initial value and may then be adjusted by the processor 150.

The processor 150 may control overall operations and functions of the electronic device 100.

Specifically, the processor 150 may amplify the audio signal through the sound amplifying unit 110 and output the amplified audio signal through the speaker 120. When the audio signal output through the speaker 120 and the user's voice signal are input to the microphone 130, the processor 150 may amplify the signal input to the microphone 130 through the microphone sound amplifying unit 140.

Further, the processor 150 may acquire a voice signal by performing acoustic echo cancellation (AEC) on the signal amplified by the microphone sound amplifying unit 140. Here, the voice signal may be a voice signal amplified by the microphone sound amplifying unit 140.

Specifically, when the audio signal output through the speaker 120 is input to the microphone 130 as an echo signal, the echo signal may lower the recognition rate for the user's voice. Accordingly, to prevent this performance degradation, the processor 150 may perform acoustic echo cancellation on the signal amplified by the microphone sound amplifying unit 140, thereby removing the echo signal and obtaining the voice signal.
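The disclosure does not specify the canceller's internals at this point. A common way to implement AEC is a normalized least-mean-squares (NLMS) adaptive filter, which learns the speaker-to-microphone echo path from the reference (speaker) signal and subtracts its echo estimate from the microphone signal. The sketch below shows that generic technique with assumed parameters (`taps`, `mu`); it is not presented as the device's actual canceller.

```python
import random
from typing import List


def nlms_echo_cancel(mic: List[float], ref: List[float],
                     taps: int = 8, mu: float = 0.5, eps: float = 1e-8) -> List[float]:
    """Subtract an adaptively estimated echo of `ref` from `mic` (NLMS filter).

    Returns the error signal, i.e. the microphone signal with the echo removed.
    """
    weights = [0.0] * taps   # estimate of the speaker-to-microphone echo path
    history = [0.0] * taps   # most recent reference samples, newest first
    cleaned: List[float] = []
    for m, r in zip(mic, ref):
        history = [r] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = m - echo_estimate
        power = eps + sum(x * x for x in history)
        # Normalized LMS update: the step is scaled by the reference power.
        weights = [w + mu * error * x / power for w, x in zip(weights, history)]
        cleaned.append(error)
    return cleaned


# Usage: a synthetic echo path (0.6, 0.3) and no near-end speech.
rng = random.Random(0)
ref = [rng.uniform(-1.0, 1.0) for _ in range(2000)]
mic = [0.6 * ref[n] + (0.3 * ref[n - 1] if n > 0 else 0.0) for n in range(len(ref))]
cleaned = nlms_echo_cancel(mic, ref)
residual = sum(e * e for e in cleaned[-200:]) / sum(m * m for m in mic[-200:])
print(f"residual echo energy ratio: {residual:.2e}")
```

The NLMS model assumes the microphone path is linear. Once the amplified microphone signal clips, the echo is no longer a filtered copy of the reference and cannot be fully cancelled, which is why the gain of the microphone sound amplifying unit 140 matters for AEC.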

Meanwhile, the processor 150 may determine the gain of the microphone sound amplifying unit 140 based on the gain of the sound amplifying unit 110, and amplify the signal input to the microphone 130 through the microphone sound amplifying unit 140 based on the determined gain.

In this case, the processor 150 may adjust the gain of the microphone sound amplifying unit 140 to be in inverse proportion to the gain of the sound amplifying unit 110.

Specifically, the audio signal output through the speaker 120 may be input to the microphone 130. A high gain of the sound amplifying unit 110 means that the level of the audio signal output through the speaker 120 is high. When such a high-level audio signal is input to the microphone 130 and amplified, the audio signal is highly likely to be clipped.

Accordingly, to prevent the audio signal output through the speaker 120 from being clipped when it is input to the microphone 130, the processor 150 may adjust the gain of the microphone sound amplifying unit 140 in inverse proportion to the gain of the sound amplifying unit 110.

For example, when a user command for adjusting the volume of the electronic device 100 is received, the processor 150 may adjust the gain of the sound amplifying unit 110 based on the user command. Here, the user command may include a command input by selecting a volume control button provided on the electronic device 100 or on a remote control for controlling the electronic device 100. The user command may also include a user's voice requesting a volume adjustment.

Specifically, when a user command for increasing the volume of the electronic device 100 is received, the processor 150 may increase the gain of the sound amplifying unit 110, and when a user command for decreasing the volume is received, the processor 150 may decrease the gain of the sound amplifying unit 110.

Accordingly, the processor 150 may output the audio signal through the speaker 120 at a volume corresponding to the user command.
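A minimal sketch of how volume commands might map onto the gain of the sound amplifying unit 110. The 2 dB step, the gain limits, the command strings, and the class name `SpeakerGain` are hypothetical values chosen for illustration; the disclosure does not give them.

```python
class SpeakerGain:
    """Hypothetical model of the sound amplifying unit's gain, in dB."""

    def __init__(self, gain_db: float = 0.0, step_db: float = 2.0,
                 min_db: float = -30.0, max_db: float = 12.0) -> None:
        self.gain_db = gain_db
        self.step_db = step_db
        self.min_db = min_db
        self.max_db = max_db

    def on_volume_command(self, command: str) -> float:
        """Raise or lower the gain by one step, clamped to [min_db, max_db]."""
        if command == "volume_up":
            self.gain_db = min(self.max_db, self.gain_db + self.step_db)
        elif command == "volume_down":
            self.gain_db = max(self.min_db, self.gain_db - self.step_db)
        else:
            raise ValueError(f"unknown command: {command!r}")
        return self.gain_db


speaker = SpeakerGain(gain_db=10.0)
print(speaker.on_volume_command("volume_up"))    # 12.0
print(speaker.on_volume_command("volume_up"))    # 12.0 (already at the maximum)
print(speaker.on_volume_command("volume_down"))  # 10.0
```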

Meanwhile, the processor 150 may adjust the gain of the microphone sound amplifying unit 140 based on the gain of the sound amplifying unit 110.

For example, when the gain of the sound amplifying unit 110 is increased, the processor 150 may decrease the gain of the microphone sound amplifying unit 140.

Specifically, when the gain of the sound amplifying unit 110 is greater than a preset value, the processor 150 may reduce the gain of the microphone sound amplifying unit 140 from its initial value, in inverse proportion to the degree by which the gain of the sound amplifying unit 110 exceeds the preset value.

That is, the further the gain of the sound amplifying unit 110 rises above the preset value, the smaller the value, relative to the initial value, to which the processor 150 may set the gain of the microphone sound amplifying unit 140.

Here, the preset value may be the maximum gain of the sound amplifying unit 110 (or a value close to it) at which no clipping occurs when the audio signal output through the speaker 120 is input to the microphone 130 and amplified with the gain of the microphone sound amplifying unit 140 set to its initial value.

Meanwhile, when the gain of the sound amplifying unit 110 is smaller than a preset value, the processor 150 may set the gain of the microphone sound amplifying unit 140 to an initial value.
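The rule in the preceding paragraphs can be sketched as a small function. The disclosure specifies only "inversely proportional", so this sketch assumes linear (not dB) gains and scales the initial microphone gain by `preset / speaker_gain` once the speaker gain exceeds the preset; the default values of `preset` and `initial` are hypothetical.

```python
def mic_gain_for(speaker_gain: float, preset: float = 2.0, initial: float = 4.0) -> float:
    """Gain of the microphone sound amplifying unit for a given speaker gain (linear).

    At or below `preset`, the microphone gain stays at `initial`. Above it, the
    microphone gain shrinks in inverse proportion to how far the speaker gain
    exceeds the preset, so speaker_gain * mic_gain stays at preset * initial.
    """
    if speaker_gain <= preset:
        return initial
    return initial * preset / speaker_gain


for g in (1.0, 2.0, 4.0, 8.0):
    print(g, mic_gain_for(g))
# 1.0 4.0
# 2.0 4.0
# 4.0 2.0
# 8.0 1.0
```

Assuming a linear acoustic path, the echo level at the output of the microphone sound amplifying unit scales with the product of the two gains. Holding that product at the value known to be clip-free keeps the echo at the level for which the preset was chosen.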

In this way, the processor 150 may adjust the gain of the microphone sound amplifying unit 140 according to the gain of the sound amplifying unit 110.

Accordingly, according to an embodiment of the present disclosure, the audio signal output through the speaker 120 is prevented from being clipped when it is input to the microphone 130, so that acoustic echo cancellation can be performed optimally.

That is, if the audio signal input to the microphone 130 is clipped, it will differ from the audio signal output through the speaker 120, and as a result the echo signal cannot be removed effectively.

Therefore, in an embodiment of the present disclosure, the gain of the microphone sound amplifying unit 140 may be adjusted according to the gain of the sound amplifying unit 110 to prevent the audio signal input through the microphone 130 from being clipped. As a result, the acoustic echo can be removed effectively and speech recognition performance can be improved.

Meanwhile, the processor 150 may determine whether clipping has occurred in the acquired voice signal, and adjust a gain of the microphone sound amplifying unit 140 based on whether clipping has occurred.

Specifically, when clipping occurs in the acquired voice signal, the processor 150 may reduce the gain of the microphone sound amplifying unit 140.

For example, when the user speaks too loudly or too close to the electronic device 100, a high-level voice signal may be input to the microphone 130. In this case, clipping may occur when the voice signal is amplified.

Accordingly, the processor 150 may determine whether clipping has occurred in the voice signal and, if so, reduce the gain of the microphone sound amplifying unit 140. That is, when the voice signal is clipped, the processor 150 may set the gain of the microphone sound amplifying unit 140 to a value smaller than the currently set value. In this case, the processor 150 may reduce the gain until clipping no longer occurs in the signal output from the microphone sound amplifying unit 140.

However, the processor 150 may maintain the gain of the microphone sound amplifying unit 140 when clipping does not occur in the acquired voice signal. That is, when the audio signal is not clipped, the processor 150 may maintain the gain of the microphone sound amplifying unit 140 at the currently set value.

In this case, the processor 150 may continuously monitor whether the audio signal is clipped, and perform the above-described operation according to whether or not the audio signal is clipped.
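The clipping check and gain reduction can be sketched as a per-frame update that repeats while monitoring continues. The full-scale value of 1.0, the `min_count` threshold, the 0.75 reduction factor, and the lower gain limit are assumptions; the disclosure only says the gain is reduced until clipping stops.

```python
import math
from typing import List


def is_clipped(frame: List[float], full_scale: float = 1.0, min_count: int = 2) -> bool:
    """Treat a frame as clipped when at least `min_count` samples sit at full scale."""
    return sum(1 for x in frame if abs(x) >= full_scale) >= min_count


def update_gain_on_clipping(gain: float, frame: List[float],
                            factor: float = 0.75, min_gain: float = 0.1) -> float:
    """Reduce the microphone gain if the frame clipped; otherwise keep it."""
    if is_clipped(frame):
        return max(min_gain, gain * factor)
    return gain


# Usage: a loud talker (peak 0.5 before amplification) over several frames.
raw = [0.5 * math.sin(2 * math.pi * k / 16) for k in range(64)]
gain = 4.0
for _ in range(10):
    frame = [max(-1.0, min(1.0, gain * x)) for x in raw]  # amplify, then hard-clip
    gain = update_gain_on_clipping(gain, frame)
print(gain)  # 1.6875: reduced 4.0 -> 3.0 -> 2.25 -> 1.6875, then held
```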

Meanwhile, the processor 150 may determine the average level of the acquired voice signal and, when the determined average level is less than or equal to a preset level, increase the gain of the microphone sound amplifying unit 140 so that the average level of the acquired voice signal becomes higher than the preset level.

In this case, when the voice signal is not clipped, the processor 150 may determine whether the average level of the voice signal is less than or equal to a preset level.

Accordingly, when the average level of the voice signal is higher than the preset level, the processor 150 may maintain the gain of the microphone sound amplifying unit 140 at the currently set value.

However, when the average level of the voice signal is less than or equal to the preset level, the processor 150 may increase the gain of the microphone sound amplifying unit 140 so that the average level of the signal output from the microphone sound amplifying unit 140 becomes higher than the preset level.

That is, when the user speaks too quietly or too far from the electronic device 100, the level of the audio signal input to the microphone 130 is low, so effective speech recognition may not be possible even after the signal is amplified by the microphone sound amplifying unit 140.

Accordingly, when the average level of the voice signal is less than or equal to a preset level, the processor 150 may increase the gain of the microphone sound amplifying unit 140 so that more accurate voice recognition can be performed.

In this case, the processor 150 may continuously monitor the average level of the voice signal and perform the above-described operation by comparing the average level with a preset level.
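The average-level check can likewise be sketched as a per-frame update. This sketch uses mean absolute amplitude as the "average level" and assumes a preset level of 0.1 (on a full scale of 1.0), a 1.5x increase per frame, and an upper gain limit; none of these values come from the disclosure. The `clipped` flag reflects that the level is compared only when the frame is not clipped.

```python
import math
from typing import List


def average_level(frame: List[float]) -> float:
    """Mean absolute amplitude of a frame (one possible measure of average level)."""
    return sum(abs(x) for x in frame) / len(frame)


def update_gain_on_level(gain: float, frame: List[float], clipped: bool,
                         preset_level: float = 0.1, factor: float = 1.5,
                         max_gain: float = 32.0) -> float:
    """Raise the microphone gain while an unclipped frame is at or below the preset level."""
    if clipped:
        return gain  # clipped frames are handled by the gain-reduction path instead
    if average_level(frame) <= preset_level:
        return min(max_gain, gain * factor)
    return gain


# Usage: a quiet or distant talker (peak 0.02 before amplification).
raw = [0.02 * math.sin(2 * math.pi * k / 16) for k in range(64)]
gain = 1.0
for _ in range(10):
    frame = [max(-1.0, min(1.0, gain * x)) for x in raw]
    clipped = any(abs(x) >= 1.0 for x in frame)
    gain = update_gain_on_level(gain, frame, clipped)
print(gain)  # 11.390625 (= 1.5 ** 6), after which the level exceeds 0.1
```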

As described above, the processor 150 may perform acoustic echo cancellation on the signal amplified by the microphone acoustic amplification unit 140. In addition, the processor 150 may perform pre-processing for speech recognition on the speech signal.

To this end, various modules may be stored in the memory of the electronic device 100. For example, as shown in FIG. 3, the memory may store an acoustic echo cancellation (AEC) module 31, a sound source localization (SSL) module 32, a beamforming (BF) module 33, a source separation (SS) module 34, and a noise suppression and speech enhancement (NS/SE) module 35. Here, each component shown in FIG. 3 may be implemented as software that performs the function of that component.

In this case, the processor 150 may perform acoustic echo cancellation and pre-processing through these modules.

First, the acoustic echo cancellation module 31 is a module for performing acoustic echo cancellation. Specifically, the acoustic echo cancellation module 31 may set the audio signal to be output through the speaker 120 as an echo reference, use frequency analysis to identify, among the audio signals input through the microphone 130, a signal whose frequency characteristics are similar to those of the reference as the echo signal, and remove or attenuate that signal.
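The frequency-analysis approach described above can be sketched as a per-frame, per-frequency-bin attenuation: transform the microphone frame and the echo reference, estimate how much of each bin's magnitude is echo, and scale that bin down accordingly. This is a simplified suppression scheme, not necessarily the module's actual algorithm. It assumes the echo path's magnitude response is a known constant `coupling` (a practical canceller estimates it adaptively) and uses a naive DFT to stay dependency-free.

```python
import cmath
import math
from typing import List


def dft(x: List[float]) -> List[complex]:
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum: List[complex]) -> List[float]:
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def suppress_echo(mic: List[float], ref: List[float], coupling: float = 0.5,
                  eps: float = 1e-12) -> List[float]:
    """Attenuate each frequency bin of `mic` by the share explained by the echo reference."""
    mic_spec, ref_spec = dft(mic), dft(ref)
    out: List[complex] = []
    for m, r in zip(mic_spec, ref_spec):
        if abs(m) < eps:
            out.append(0j)
            continue
        # Estimated echo magnitude is coupling * |R|; keep only the remainder of |M|.
        gain = max(0.0, 1.0 - coupling * abs(r) / abs(m))
        out.append(gain * m)
    return idft(out)


# Usage: an echo of the speaker output (bin 4) mixed with the user's voice (bin 10).
n = 32
ref = [math.cos(2 * math.pi * 4 * t / n) for t in range(n)]
voice = [0.3 * math.cos(2 * math.pi * 10 * t / n) for t in range(n)]
mic = [0.5 * e + v for e, v in zip(ref, voice)]
cleaned = suppress_echo(mic, ref, coupling=0.5)
print(max(abs(c - v) for c, v in zip(cleaned, voice)))  # close to 0
```

Because the gain depends only on magnitudes, bins where speech and echo overlap are attenuated rather than separated exactly; this is a known limitation of magnitude-domain suppression.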

The sound source direction measurement module 32 is a module for measuring a direction in which a sound source exists, that is, a direction in which a user uttering a voice exists.

In this case, the sound source direction measurement module 32 may measure the direction in which the sound source exists through various methods. For example, the sound source direction measurement module 32 may measure the direction in which the sound source exists through a method using a time difference of arrival (TDOA), or the like.
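For a two-microphone case, the TDOA approach can be sketched as follows: find the lag that maximizes the cross-correlation between the two microphone signals, convert it to a time difference, and convert that to an arrival angle using the microphone spacing and the speed of sound. Plain cross-correlation is used here for brevity (GCC-PHAT weighting is a common, more robust variant); the sampling rate, microphone spacing, and angle convention (measured from broadside) are assumptions.

```python
import math
import random
from typing import List


def estimate_lag(x: List[float], y: List[float], max_lag: int) -> int:
    """Lag (in samples) by which `y` trails `x`, found by maximizing cross-correlation."""
    n = len(x)
    best_lag, best_score = 0, -math.inf
    for lag in range(-max_lag, max_lag + 1):
        score = sum(x[t] * y[t + lag] for t in range(n) if 0 <= t + lag < n)
        if score > best_score:
            best_lag, best_score = lag, score
    return best_lag


def arrival_angle_deg(lag: int, sample_rate: float, mic_spacing_m: float,
                      speed_of_sound: float = 343.0) -> float:
    """Angle from broadside (degrees) implied by a time difference of arrival."""
    tdoa = lag / sample_rate
    s = max(-1.0, min(1.0, speed_of_sound * tdoa / mic_spacing_m))  # keep asin in its domain
    return math.degrees(math.asin(s))


# Usage: broadband sound reaching microphone 2 three samples after microphone 1.
rng = random.Random(0)
mic1 = [rng.gauss(0.0, 1.0) for _ in range(400)]
mic2 = [mic1[t - 3] if t >= 3 else 0.0 for t in range(400)]
lag = estimate_lag(mic1, mic2, max_lag=10)
print(lag, round(arrival_angle_deg(lag, 16000, 0.10), 1))  # 3 40.0
```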

The beamforming module 33 is a module for performing beamforming of the microphone 130 according to the location of the sound source. Specifically, the beamforming module 33 may acquire only the audio signal received from the direction in which the sound source exists and exclude signals received from other directions.
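A classic way to realize this is delay-and-sum beamforming: each microphone channel is shifted by the delay that sound from the target direction incurs at that microphone, and the aligned channels are averaged, so the target adds coherently while uncorrelated noise partly cancels. The sketch assumes integer-sample steering delays (for example, from a TDOA estimate); it is a generic technique, not necessarily what the beamforming module 33 implements.

```python
import math
import random
from typing import List, Sequence


def delay_and_sum(channels: Sequence[List[float]], delays: Sequence[int]) -> List[float]:
    """Advance each channel by its steering delay (in samples) and average them."""
    n = len(channels[0])
    out = [0.0] * n
    for channel, delay in zip(channels, delays):
        for t in range(n):
            src = t + delay
            if 0 <= src < n:  # samples shifted past the frame edge contribute nothing
                out[t] += channel[src]
    return [v / len(channels) for v in out]


# Usage: the target reaches microphone 2 two samples after microphone 1,
# and each microphone adds its own independent noise.
rng = random.Random(1)
n = 2000
target = [math.sin(2 * math.pi * t / 40) for t in range(n)]
noise1 = [rng.gauss(0.0, 0.5) for _ in range(n)]
noise2 = [rng.gauss(0.0, 0.5) for _ in range(n)]
mic1 = [target[t] + noise1[t] for t in range(n)]
mic2 = [(target[t - 2] if t >= 2 else 0.0) + noise2[t] for t in range(n)]
beam = delay_and_sum([mic1, mic2], [0, 2])


def noise_power(signal: List[float]) -> float:
    """Mean squared deviation from the clean target over the fully aligned samples."""
    return sum((signal[t] - target[t]) ** 2 for t in range(n - 2)) / (n - 2)


# The beam's noise power is about half that of a single microphone.
print(round(noise_power(mic1), 3), round(noise_power(beam), 3))
```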

The sound source separation module 34 is a module for separating a sound source. Specifically, the sound source separation module 34 may separate the sound source from the received audio signal by reversing the process by which the noise and the audio signal were mixed.

The noise suppression and speech enhancement module 35 is a module for removing noise. In this case, the noise suppression and speech enhancement module 35 may remove static noise from the sound source.
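Removal of static (stationary) noise is often done by spectral subtraction: estimate the noise magnitude spectrum from frames without speech, subtract it from each frame's magnitude spectrum, and resynthesize with the original phase. The sketch below shows that generic technique; flooring at zero, the noise-only estimation frames, and the naive DFT are simplifying assumptions, and the disclosure does not say which method the module 35 uses.

```python
import cmath
import math
from typing import List


def dft(x: List[float]) -> List[complex]:
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n)) for k in range(n)]


def idft(spectrum: List[complex]) -> List[float]:
    n = len(spectrum)
    return [(sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n).real
            for t in range(n)]


def estimate_noise_magnitude(noise_frames: List[List[float]]) -> List[float]:
    """Average magnitude spectrum over frames assumed to contain only noise."""
    spectra = [dft(frame) for frame in noise_frames]
    return [sum(abs(s[k]) for s in spectra) / len(spectra) for k in range(len(spectra[0]))]


def spectral_subtract(frame: List[float], noise_mag: List[float]) -> List[float]:
    """Subtract the noise magnitude per bin (floored at zero), keeping the noisy phase."""
    out = []
    for value, noise in zip(dft(frame), noise_mag):
        magnitude = max(0.0, abs(value) - noise)
        out.append(cmath.rect(magnitude, cmath.phase(value)))
    return idft(out)


# Usage: a stationary hum (bin 3) under speech-like content (bin 9).
n = 32
speech = [0.5 * math.sin(2 * math.pi * 9 * t / n) for t in range(n)]
hum = [0.2 * math.cos(2 * math.pi * 3 * t / n + 0.7) for t in range(n)]
noise_mag = estimate_noise_magnitude([hum, hum])  # frames captured while nobody speaks
cleaned = spectral_subtract([s + h for s, h in zip(speech, hum)], noise_mag)
print(max(abs(c - s) for c, s in zip(cleaned, speech)))  # close to 0
```

With real, random noise the subtraction is only approximate and can leave residual "musical" noise, so practical systems usually add over-subtraction factors and a nonzero spectral floor.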

On the other hand, the above-described pre-processing step is only an example, and it goes without saying that some steps may be omitted or other steps may be additionally performed according to embodiments.
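Because stages may be omitted or added depending on the embodiment, one way to organize the modules 31 to 35 is as an ordered list of stages that each take and return a shared state, with a skip set for omitted stages. The stage bodies below are placeholders that only record their names; the helper names `make_stage` and `run_preprocessing` and the state layout are hypothetical.

```python
from typing import Callable, Dict, Iterable, List, Tuple

State = Dict[str, object]
Stage = Callable[[State], State]


def make_stage(name: str) -> Stage:
    """Placeholder stage that records that it ran; a real stage would process audio."""
    def stage(state: State) -> State:
        trace = list(state.get("trace", []))
        trace.append(name)
        return {**state, "trace": trace}
    return stage


PIPELINE: List[Tuple[str, Stage]] = [
    (name, make_stage(name)) for name in ("AEC", "SSL", "BF", "SS", "NS/SE")
]


def run_preprocessing(state: State, stages: List[Tuple[str, Stage]],
                      skip: Iterable[str] = ()) -> State:
    """Run the stages in order, leaving out any whose name is in `skip`."""
    skipped = set(skip)
    for name, stage in stages:
        if name not in skipped:
            state = stage(state)
    return state


print(run_preprocessing({"trace": []}, PIPELINE, skip={"SS"})["trace"])
# ['AEC', 'SSL', 'BF', 'NS/SE']
```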

Meanwhile, the processor 150 may perform speech recognition on a preprocessed speech signal.

Specifically, the processor 150 may convert the user's voice included in the preprocessed voice signal into text, and determine an intent and an entity of the user's voice based on the voice recognition result. In addition, the processor 150 may obtain a keyword based on a result of natural language understanding, and obtain a response to a user's voice through the keyword.

For example, the processor 150 may transmit the keyword to the response providing server (10 in FIG. 1). Accordingly, the response providing server 10 may provide a response to the user's voice based on the acquired keyword. In this case, the response providing server 10 may provide a response in a text format, but this is only an example, and may provide a response in a natural language format.

The response providing server 10 may transmit a response to the user's voice to the electronic device 100.

The processor 150 may then output the response. For example, the processor 150 may process the response into natural language using a conversation system and output natural-language speech through the speaker 120. However, this is only an example, and the processor 150 may instead output the response through a display.

Meanwhile, the above-described example describes voice recognition of the user's voice as being performed by the electronic device 100, but this is only an example. That is, the processor 150 may instead transmit the preprocessed audio signal to a separate server (not shown, e.g., the response providing server 10).

In this case, the response providing server 10 may perform voice recognition based on the pre-processed audio signal, obtain a response to the user's voice, and transmit it to the electronic device 100. Accordingly, the processor 150 may output a response.

In addition, in the above-described example, it has been described that acoustic echo cancellation is performed, but this is only an example.

FIG. 4 is a flowchart illustrating a method of adjusting a gain of a microphone sound amplifying unit according to an exemplary embodiment of the present disclosure.

First, the processor 150 may output an audio signal through the speaker 120 (S410).

Further, the processor 150 may receive a user's voice signal and an audio signal output from the speaker 120 through the microphone 130.

In this case, the processor 150 may determine whether a user voice including a trigger word is input through the microphone 130.

Accordingly, when a user voice including a trigger word is input (S420-Y), the processor 150 may adjust the gain of the microphone sound amplifying unit 140 (S430).

Specifically, when the gain of the sound amplifying unit 110 is greater than a preset value, the processor 150 may reduce the gain of the microphone sound amplifying unit 140 from its initial value. The reduction is in inverse proportion to the degree by which the gain of the sound amplifying unit 110 exceeds the preset value. When the gain of the sound amplifying unit 110 is smaller than the preset value, the processor 150 may set the gain of the microphone sound amplifying unit 140 to the initial value.
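
The gain rule of step S430 can be sketched as follows. The exact mapping is not specified above, so this is one possible reading under stated assumptions. It assumes linear (not dB) gains. Once the speaker gain exceeds the preset value, it scales the microphone gain by `threshold / speaker_gain`, which keeps the result continuous at the threshold. The function and parameter names are hypothetical.

```python
def mic_gain_for_speaker_gain(speaker_gain, threshold, initial_mic_gain):
    """Hypothetical rule for step S430: keep the microphone gain at its initial
    value while the speaker gain is at or below `threshold`; above it, scale
    the microphone gain down in inverse proportion to the speaker gain."""
    if speaker_gain <= threshold:
        return initial_mic_gain
    return initial_mic_gain * threshold / speaker_gain
```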

Further, the processor 150 may perform acoustic echo cancellation on the audio signal output from the microphone sound amplifying unit 140. This removes the component output from the speaker 120 from that audio signal and yields a voice signal.

Thereafter, the processor 150 may analyze the voice signal and determine an average volume and a peak volume of the voice signal (S440).

In this case, the processor 150 may determine whether clipping has occurred in the voice signal based on the peak volume (S450).
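
Steps S440 and S450 can be sketched as below. The sketch assumes samples normalized to [-1.0, 1.0] and takes the average volume as the RMS level. It declares clipping when the peak reaches an assumed full-scale limit of 0.99. The helper names are hypothetical.

```python
import math


def analyze_volume(samples):
    """Return (average_volume, peak_volume) for a block of samples normalized
    to [-1.0, 1.0]; the average volume is taken as the RMS level."""
    rms = math.sqrt(sum(x * x for x in samples) / len(samples))
    peak = max(abs(x) for x in samples)
    return rms, peak


def is_clipped(peak, clip_level=0.99):
    """Treat the block as clipped when its peak reaches the assumed full-scale limit."""
    return peak >= clip_level
```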

Accordingly, if it is determined that clipping has occurred in the voice signal (S450-Y), the processor 150 may reduce the gain of the microphone sound amplifying unit 140 (S460). In this case, the processor 150 may reduce the gain of the microphone sound amplifying unit 140 so that clipping of the audio signal output from the microphone sound amplifying unit 140 does not occur.

Meanwhile, if it is determined that clipping has not occurred in the voice signal (S450-N), the processor 150 may compare the average volume of the voice signal with a preset level Vth (S470).

Accordingly, the processor 150 may increase the gain of the microphone sound amplifying unit 140 when the average volume of the voice signal is less than or equal to a preset level (S480). In this case, the processor 150 may increase the gain of the microphone sound amplifying unit 140 so that the average volume of the voice signal output from the microphone sound amplifying unit 140 is higher than a preset level.

Meanwhile, when the average volume of the voice signal is greater than a preset level, the processor 150 may maintain the current gain value of the microphone sound amplifying unit 140 (S490).
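
The decision logic of steps S450 to S490 can be condensed into one hypothetical update function. The scaling rules are assumptions. After clipping, the gain is divided by a 10% `margin`. For a quiet signal, it is multiplied by `Vth / average` times that margin. These rules were chosen so that each branch moves the level in the direction the flowchart describes, assuming the signal scales linearly with the gain.

```python
def adapt_mic_gain(gain, avg_volume, peak_volume, vth, clip_level=0.99, margin=1.1):
    """One pass of the S450-S490 decision logic (hypothetical scaling rules).

    - Clipping (S450-Y, S460): scale the gain down so the peak would fall
      below the clip level.
    - Quiet signal, average <= Vth (S470, S480): scale the gain up so the
      average would rise above Vth.
    - Otherwise (S490): keep the current gain.
    """
    if peak_volume >= clip_level:
        return gain * (clip_level / peak_volume) / margin
    if avg_volume <= vth:
        if avg_volume == 0:
            return gain  # silent block: nothing measurable to scale against
        return gain * (vth / avg_volume) * margin
    return gain
```

A clipped block's measured peak is pinned near full scale. As a result, the clipping branch reduces the gain by roughly the margin on each pass, so repeated calls walk the gain down until clipping stops.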

FIG. 5 is a block diagram illustrating a detailed configuration of an electronic device according to an embodiment of the present disclosure.

Referring to FIG. 5, the electronic device 100 includes an acoustic amplifying unit 110, a speaker 120, a microphone 130, a microphone acoustic amplifying unit 140, a processor 150, a memory 160, a display 170, a communication interface 180, and an input interface 190. The processor 150 may be electrically connected to these components to control the overall operations and functions of the electronic device 100. The acoustic amplification unit 110, the speaker 120, the microphone 130, the microphone acoustic amplification unit 140, and the processor 150 shown in FIG. 5 have been described with reference to FIG. 2, so overlapping descriptions are omitted.

The processor 150 may include one or more of a central processing unit, an application processor, and a communication processor. The processor 150 may control the electronic device 100. For example, the processor 150 may control at least one component of the electronic device 100 and perform operations or data processing.

The memory 160 may store instructions or data related to at least one other component of the electronic device 100. The memory 160 may be implemented as a nonvolatile memory, a volatile memory, a flash memory, a hard disk drive (HDD), a solid state drive (SSD), or the like. The memory 160 is accessed by the processor 150, which may read, write, edit, delete, or update data in it.

The display 170 may display various screens. For example, the display 170 may display various screens related to the operation of the electronic device 100.

In this case, the display 170 may be combined with the touch panel 191 to form a touch screen having a layered structure. Besides its display function, the touch screen may detect the position, area, and pressure of a touch input. It may also detect both real touches and proximity touches.

The communication interface 180 is a component for communicating with external electronic devices and servers, and may include a circuit. Its communication connection with an external electronic device may pass through a third device (e.g., a repeater, a hub, an access point, a server, or a gateway).

For example, the communication interface 180 may include components for wireless communication, such as LTE and wireless fidelity (WiFi), and for wired communication, such as universal serial bus (USB) and high definition multimedia interface (HDMI). The network over which the wireless or wired communication is performed may include at least one telecommunication network, for example, a computer network (e.g., a LAN or WAN), the Internet, or a telephone network.

In addition, the communication interface 180 may provide an artificial intelligence agent service by performing communication with an external server.

The input interface 190 is configured to receive various user commands and may include a circuit. In this case, the input interface 190 may transmit the input user command to the processor 150. The input interface 190 may include, for example, a touch panel 191 or a key 192. The touch panel 191 may use at least one of, for example, a capacitive type, a pressure sensitive type, an infrared type, or an ultrasonic type. In addition, the touch panel 191 may further include a control circuit. The touch panel 191 may further include a tactile layer to provide a tactile reaction to a user. The key 192 may include, for example, a physical button, an optical key, or a keypad.

Meanwhile, according to an embodiment of the present disclosure, the processor 150 may provide a response to a user's voice.

Specifically, the processor 150 may adjust the gain of the microphone sound amplifying unit 140 according to the gain of the sound amplifying unit 110. For example, the processor 150 may adjust the gain of the microphone sound amplifying unit 140 so as to be in inverse proportion to the gain of the sound amplifying unit 110.
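
One way to read "inversely proportional" is to hold the product of the two gains constant. The following derivation assumes a hypothetical constant K. It also assumes a fixed acoustic path H between the speaker 120 and the microphone 130 for a source signal x. Under these assumptions, the echo component at the output of the microphone sound amplifying unit 140 does not grow with the volume setting:

```latex
\begin{aligned}
G_{\mathrm{mic}} &= \frac{K}{G_{\mathrm{amp}}},
\qquad
20\log_{10} G_{\mathrm{mic}} = 20\log_{10} K - 20\log_{10} G_{\mathrm{amp}},\\
e_{\mathrm{out}} &= G_{\mathrm{mic}}\, H\, G_{\mathrm{amp}}\, x = K\, H\, x .
\end{aligned}
```

For example, doubling the speaker-side gain G_amp (about +6 dB) halves the microphone-side gain G_mic (about -6 dB).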

Accordingly, suppose both the audio signal output from the speaker 120 and the voice uttered by the user are input to the microphone 130. Clipping of the component that is output from the speaker 120 and picked up by the microphone 130 can then be prevented. As a result, acoustic echo cancellation can be performed effectively on that component, and recognition accuracy for the user's voice can be improved.

The processor 150 may perform acoustic echo cancellation by removing the audio signal output from the speaker 120 from the audio signal input to the microphone 130. What remains of the microphone input is the voice signal.

In this case, the processor 150 may determine whether clipping has occurred with respect to the voice signal, and if the clipping has occurred, the gain of the microphone sound amplifying unit 140 may be reduced.

However, when clipping has not occurred, the processor 150 may determine whether the average level of the voice signal is less than or equal to a preset level.

Accordingly, the processor 150 may increase the gain of the microphone sound amplifying unit 140 when the average level of the voice signal is less than or equal to a preset level. In this case, the processor 150 may increase the gain of the microphone sound amplifying unit 140 so that the average level is higher than a preset level.

Meanwhile, the processor 150 may maintain the gain of the microphone sound amplifying unit 140 when the average level of the voice signal is greater than the preset level.

Thereafter, the processor 150 may perform pre-processing for voice recognition on the voice signal output from the microphone sound amplifying unit 140.

For example, the processor 150 may perform preprocessing such as sound source direction measurement, beamforming, sound source separation, noise suppression, and speech improvement.

In addition, the processor 150 may perform speech recognition on the preprocessed speech signal.

In this case, the processor 150 may provide a response to the user's voice using an artificial intelligence agent program.

Specifically, the processor 150 may convert the user's speech into text by performing speech recognition on the voice signal. Based on the recognition result, it may identify the domain and intent of the user's speech, as well as the parameters (or slots) needed to carry out that intent. Further, the processor 150 may perform a search or similar operation according to the user's intent to obtain a response to the user's voice. It may then process the obtained response into natural language and output the result as a response to the user's voice.

For example, suppose the user's voice asks about the weather. The electronic device 100 may retrieve the current weather to obtain a response and process the response into natural language. It may then convert the resulting text into speech using text-to-speech (TTS) and output the speech through the speaker 120 of the electronic device 100.

Accordingly, the conversation system can provide a response to the user's voice, so that the user can perform a conversation with the electronic device 100.

Meanwhile, voice recognition for the user's voice may be performed in a separate server other than the electronic device 100.

In this case, the processor 150 may transmit the preprocessed voice signal to the server through the communication interface 180. The server may perform voice recognition on a voice signal received from the electronic device 100 and provide a response to the voice signal to the electronic device 100. In this case, the server may provide a response including text, but this is only an example and may provide a response in a natural language form.

Accordingly, the processor 150 may output the response through the speaker 120. In this case, the processor 150 may process the response into natural language using a conversation system and output it. However, this is only an example, and the processor 150 may instead display the response through the display 170.

FIG. 6 is a flowchart illustrating a method of controlling an electronic device according to an exemplary embodiment of the present disclosure.

First, the audio signal is amplified based on the first gain (S610). In this case, the electronic device may amplify the audio signal using the sound amplification unit. In this case, the degree to which the audio signal is amplified may be determined according to the first gain.

Thereafter, the amplified audio signal is output (S620). In this case, the electronic device may output an amplified audio signal through the speaker.

When the output audio signal and the user's voice signal are input, the input signal is amplified based on the second gain (S630).

In this case, the electronic device may receive the output audio signal and the user's voice signal through the microphone and amplify the input signal through the microphone sound amplifying unit. The degree to which the input signal is amplified may be determined according to the second gain.

Then, acoustic echo cancellation is performed on the amplified signal to obtain a speech signal, and preprocessing for speech recognition is performed on the speech signal (S650).

Meanwhile, in step S630, a second gain may be determined based on the first gain, and an input signal may be amplified based on the determined second gain.

In this case, in step S630, the second gain may be adjusted to be inversely proportional to the first gain.

Also, in step S610, when a user command for adjusting the volume of the electronic device is received, a first gain may be adjusted based on the user command, and in step S630, a second gain may be adjusted based on the adjusted first gain.
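
The volume-command path of steps S610 and S630 can be sketched as a small controller object. The class, its fields, and the clamping range of the second gain are hypothetical. The sketch assumes the inverse-proportional rule, under which the product of the two gains is held at a constant.

```python
from dataclasses import dataclass


@dataclass
class GainController:
    """Hypothetical controller: the first gain follows the user's volume
    commands, and the second gain is re-derived from it so that their product
    stays at `coupling_constant`, clamped to an assumed amplifier range."""
    first_gain: float = 1.0
    coupling_constant: float = 1.0
    min_second_gain: float = 0.1
    max_second_gain: float = 10.0

    @property
    def second_gain(self) -> float:
        g = self.coupling_constant / self.first_gain
        return max(self.min_second_gain, min(self.max_second_gain, g))

    def on_volume_command(self, new_first_gain: float) -> float:
        """Apply a volume command (S610) and return the updated second gain (S630)."""
        if new_first_gain <= 0:
            raise ValueError("gain must be positive")
        self.first_gain = new_first_gain
        return self.second_gain
```

The clamp reflects the finite gain range of a real amplifier. Within that range, increasing the first gain always decreases the second gain, as in step S630.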

Meanwhile, in step S630, when the first gain is increased, the second gain may be decreased.

Meanwhile, it is possible to determine whether clipping has occurred in the acquired voice signal, and adjust the second gain based on whether clipping has occurred.

Specifically, when clipping occurs in the acquired voice signal, the second gain may be reduced, and when no clipping occurs in the acquired voice signal, the second gain may be maintained.

Meanwhile, the average level of the acquired voice signal may be determined. When the average level is lower than a preset level, the second gain may be increased so that the average level of the acquired voice signal becomes higher than the preset level.

On the other hand, a specific method of adjusting the gain of the microphone sound amplifying unit has been described above.

Meanwhile, the term "unit" or "module" used in the present disclosure includes a unit composed of hardware, software, or firmware, and may be used interchangeably with terms such as logic, logic block, part, or circuit. A "unit" or "module" may be an integrally formed part, or a minimum unit or portion thereof, that performs one or more functions. For example, a module may be implemented as an application-specific integrated circuit (ASIC).

Various embodiments of the present disclosure may be implemented as software including instructions stored in a storage medium readable by a machine (e.g., a computer). The machine is a device that can call the stored instructions from the storage medium and operate according to the called instructions. It may include an electronic device (e.g., the electronic device 100) according to the disclosed embodiments. When an instruction is executed by a processor, the processor may perform the corresponding function directly or by using other components under its control. The instructions may include code generated by a compiler or code executable by an interpreter. The machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, "non-transitory" means only that the storage medium is tangible and does not include a signal. The term does not distinguish between data stored semi-permanently and data stored temporarily.

According to an example, a method according to the various embodiments disclosed in the present disclosure may be provided as part of a computer program product. Computer program products can be traded between sellers and buyers as commodities. A computer program product may be distributed in the form of a device-readable storage medium (e.g., a compact disc read only memory (CD-ROM)) or online through an application store. In the case of online distribution, at least part of the computer program product may be temporarily stored or temporarily generated in a storage medium. Examples include a server of the manufacturer, a server of the application store, or the memory of a relay server.

Each of the components (e.g., modules or programs) according to various embodiments may consist of a single entity or a plurality of entities. In various embodiments, some of the above-described sub-components may be omitted, or other sub-components may be further included. Alternatively or additionally, some components (e.g., modules or programs) may be integrated into a single entity. The integrated entity may perform the functions of each corresponding component identically or similarly to how they were performed before integration. Operations performed by modules, programs, or other components according to various embodiments may be executed sequentially, in parallel, repetitively, or heuristically. At least some operations may be executed in a different order or omitted, or other operations may be added.

Claims

1. An electronic device comprising:

a sound amplifying unit;

a speaker;

a microphone;

a microphone sound amplifying unit; and

a processor configured to amplify an audio signal through the sound amplifying unit, output the amplified audio signal through the speaker, and, when the audio signal output through the speaker and a user's voice signal are input to the microphone: amplify the signal input to the microphone through the microphone sound amplifying unit, obtain the voice signal by performing acoustic echo cancellation (AEC) on the amplified signal, and perform preprocessing for speech recognition on the obtained voice signal,

wherein the processor

determines a gain of the microphone sound amplifying unit based on a gain of the sound amplifying unit, and amplifies the input signal through the microphone sound amplifying unit based on the determined gain.

2. The electronic device of claim 1, wherein the processor adjusts the gain of the microphone sound amplifying unit to be inversely proportional to the gain of the sound amplifying unit.

3. The electronic device of claim 2, wherein, when a user command for adjusting the volume of the electronic device is received, the processor adjusts the gain of the sound amplifying unit based on the user command. The processor further adjusts the gain of the microphone sound amplifying unit based on the adjusted gain of the sound amplifying unit.

4. The electronic device of claim 3, wherein, when the gain of the sound amplifying unit is increased, the processor reduces the gain of the microphone sound amplifying unit.

5. The electronic device of claim 1, wherein the processor determines whether clipping has occurred in the obtained voice signal and adjusts the gain of the microphone sound amplifying unit based on whether the clipping has occurred.

6. The electronic device of claim 5, wherein the processor reduces the gain of the microphone sound amplifying unit when clipping occurs in the obtained voice signal. The processor maintains the gain of the microphone sound amplifying unit when clipping does not occur in the obtained voice signal.

7. The electronic device of claim 1, wherein the processor determines an average level of the obtained voice signal. When the average level is less than or equal to a preset level, the processor increases the gain of the microphone sound amplifying unit so that the average level of the obtained voice signal is higher than the preset level.

8. A method for controlling an electronic device, the method comprising:

amplifying an audio signal based on a first gain;

outputting the amplified audio signal;

when the output audio signal and a user's voice signal are input, amplifying the input signal based on a second gain; and

obtaining a voice signal by performing acoustic echo cancellation (AEC) on the amplified signal, and performing preprocessing for speech recognition on the voice signal,

wherein the amplifying of the input signal comprises

determining the second gain based on the first gain and amplifying the input signal based on the determined second gain.

9. The method of claim 8, wherein the amplifying of the input signal comprises adjusting the second gain to be inversely proportional to the first gain.

10. The method of claim 9, wherein the amplifying of the audio signal comprises, when a user command for adjusting the volume of the electronic device is received, adjusting the first gain based on the user command, and

the amplifying of the input signal comprises adjusting the second gain based on the adjusted first gain.

11. The method of claim 10, wherein the amplifying of the input signal comprises decreasing the second gain when the first gain is increased.

12. The method of claim 8, further comprising determining whether clipping has occurred in the obtained voice signal, and adjusting the second gain based on whether the clipping has occurred.

13. The method of claim 12, wherein the adjusting comprises reducing the second gain when clipping occurs in the obtained voice signal, and maintaining the second gain when clipping does not occur in the obtained voice signal.

14. The method of claim 8, further comprising determining an average level of the obtained voice signal. When the average level is less than or equal to a preset level, the method further comprises increasing the second gain so that the average level of the obtained voice signal is higher than the preset level.