WO2022139177A1

WO2022139177A1 - Electronic device and control method thereof

Info

Publication number: WO2022139177A1
Application number: PCT/KR2021/016653
Authority: WO
Inventors: 박민규; 이형선; 김호연
Original assignee: Samsung Electronics Co., Ltd. (삼성전자주식회사)
Priority date: 2020-12-24
Filing date: 2021-11-15
Publication date: 2022-06-30

Abstract

An electronic device and a control method thereof are disclosed. The electronic device comprises a microphone and a processor. The processor obtains a plurality of similarities between the phase of an external sound signal, input to the microphone and converted into the frequency domain, and the phase of a reference sound signal converted into the frequency domain, using one of the two sound signals as the basis. It then identifies a delay time of the external sound signal based on the reliability of the obtained similarities, and compensates for the time delay between the external sound signal and the reference sound signal based on the identified delay time.

Description

Electronic device and its control method

The present disclosure relates to an electronic device and a control method thereof, and more particularly, to an electronic device for performing synchronization between an output sound signal and a sound signal input to a microphone, and a control method thereof.

Recent electronic devices include a speaker for outputting the sound signal of content, an input/output interface for connecting an external speaker, and a microphone for receiving a user's voice commands. As a result, an echo phenomenon occurs in which the sound signal output through a speaker is picked up by the microphone. The electronic device removes this echo component by applying an acoustic echo cancellation (AEC) algorithm. However, the AEC algorithm can only be applied when the sound signal input to the microphone and the sound signal output to the speaker are synchronized.

When the electronic device is a TV, the microphone is fixed at a specific position on the TV, whereas sound may be output through either an internal speaker or an external speaker depending on the user's settings. In addition, the TV and the external speaker may be connected via HDMI, Wi-Fi, Bluetooth, AUX, or the like.

When the sound signal of the content is output through the internal speaker, the distance between the internal speaker and the microphone is fixed, so the speaker output reaches the microphone after a constant delay. However, when the sound signal is output through an external speaker, the distance between the external speaker and the microphone can change, so the delay before the speaker output reaches the microphone varies. In addition, depending on how the TV and the external speaker are connected, the digital audio signal may be offset by some number of samples due to the network environment and similar factors. A sample drift phenomenon may also occur because the processor clock frequencies of the TV and the external speaker differ. For these reasons, when the TV is connected to an external speaker, the time difference between the speaker output signal and the microphone input signal may change continually.

When the sound signal of the content is output through the internal speaker, the manufacturer can resolve the time delay between the internal speaker output signal and the microphone input signal by applying an algorithm that compensates for a fixed time. However, when the sound signal is output through an external speaker, synchronizing the speaker output signal with the microphone input signal is very difficult. Existing TV voice recognition services likewise perform synchronization only when a sound signal is output through the internal speaker.

Accordingly, there is a need for a technique capable of performing synchronization between an output sound signal and an input sound signal even if the sound signal is output to either an internal speaker or an external speaker.

The present disclosure is intended to solve the above-described problems. An object of the present disclosure is to provide an electronic device and a control method thereof that synchronize a sound signal output to a speaker with a sound signal input to a microphone, regardless of the position of the speaker, so that the echo component can be removed and voice recognition performance improved.

An electronic device according to an embodiment of the present disclosure includes a microphone and a processor. The processor obtains a plurality of similarities between the phases of two sound signals, namely an external sound signal input into the microphone and converted into the frequency domain and a reference sound signal converted into the frequency domain, using one of the two signals as the basis; identifies a delay time of the external sound signal based on the reliability of the obtained similarities; and performs time delay compensation between the external sound signal and the reference sound signal based on the identified delay time.

A method of controlling an electronic device according to an embodiment of the present disclosure includes: obtaining a plurality of similarities between the phases of two sound signals, namely an external sound signal converted into the frequency domain and a reference sound signal converted into the frequency domain, using one of the two signals as the basis; identifying a delay time of the external sound signal based on the reliability of the obtained similarities; and performing time delay compensation between the external sound signal and the reference sound signal based on the identified delay time.

FIG. 1 is a diagram illustrating an electronic device connected to an external speaker according to an embodiment of the present disclosure.

FIG. 2 is a block diagram illustrating a configuration of an electronic device according to an embodiment of the present disclosure.

FIG. 3 is a block diagram illustrating a detailed configuration of an electronic device according to an embodiment of the present disclosure.

FIG. 4 is a view for explaining a process of removing an echo component according to an embodiment of the present disclosure.

FIG. 5 illustrates an embodiment of determining the reliability of a time delay when a sound signal input into a microphone lacks sufficient phase information.

FIG. 6 illustrates an embodiment of determining the reliability of a time delay for a sound signal input after reverberation (reflection).

FIG. 7 illustrates an embodiment of determining the reliability of a time delay for a sound signal input together with external noise.

FIG. 8 is a view for explaining an operation of an electronic device according to an embodiment of the present disclosure.

FIG. 9 is a flowchart illustrating a method of controlling an electronic device according to an embodiment of the present disclosure.

FIG. 10 is a flowchart illustrating a specific electronic device control process according to an embodiment of the present disclosure.

Hereinafter, various embodiments will be described in more detail with reference to the accompanying drawings. The embodiments described herein may be variously modified. Certain embodiments may be depicted in the drawings and described in detail in the detailed description. However, the specific embodiments disclosed in the accompanying drawings are only provided to facilitate understanding of the various embodiments. Accordingly, the technical spirit is not limited by the specific embodiments disclosed in the accompanying drawings, and it should be understood to include all equivalents or substitutes included in the spirit and scope of the disclosure.

Terms including an ordinal number such as 1st, 2nd, etc. may be used to describe various components, but these components are not limited by the above-mentioned terms. The above terminology is used only for the purpose of distinguishing one component from another component.

In this specification, terms such as "comprises" or "have" are intended to indicate that the features, numbers, steps, operations, components, parts, or combinations thereof described in the specification exist, and should be understood not to preclude the existence or addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof. When an element is referred to as being “connected” or “coupled” to another element, it should be understood that it may be directly connected or coupled to the other element, or that other elements may exist in between. On the other hand, when an element is referred to as being "directly connected" or "directly coupled" to another element, it should be understood that no other element exists in between.

Meanwhile, as used herein, a “module” or “unit” for a component performs at least one function or operation, and may do so by hardware, software, or a combination of hardware and software. In addition, a plurality of “modules” or “units”, except for a “module” or “unit” that needs to be implemented in specific hardware, may be integrated into at least one module and executed by at least one processor. The singular expression includes the plural expression unless the context clearly dictates otherwise.

In the description of the present disclosure, the order of each step should be understood as non-limiting unless the preceding step must be logically and temporally performed before the subsequent step. In other words, except for the above exceptional cases, even if the process described as the subsequent step is performed before the process described as the preceding step, the essence of the disclosure is not affected, and the scope of rights should also be defined regardless of the order of the steps. And, in the present specification, "A or B" is defined as meaning not only selectively pointing to any one of A and B, but also including both A and B. In addition, in the present specification, the term "comprising" has the meaning of encompassing the inclusion of other components in addition to the elements listed as being included.

In this specification, only essential components necessary for the description of the present disclosure are described, and components not related to the essence of the present disclosure are not mentioned. And it should not be construed in an exclusive meaning including only the mentioned components, but should be interpreted in a non-exclusive meaning that may also include other components.

In addition, in describing the present disclosure, if it is determined that a detailed description of a related known function or configuration may unnecessarily obscure the subject matter of the present disclosure, the detailed description thereof will be abbreviated or omitted. Meanwhile, each embodiment may be implemented or operated independently, but each embodiment may be implemented or operated in combination.

Referring to FIG. 1, an electronic device 100 and an external speaker 10 are illustrated. For example, the electronic device 100 may include a digital TV, a desktop computer, a laptop computer, a smartphone, a tablet PC, a navigation system, a slate PC, a wearable device, a set-top box, a kiosk, and the like. That is, the electronic device 100 may include an internal display, as a TV or a laptop computer does, or may be a device that transmits an image signal to an external display device, such as a desktop computer or a set-top box. An electronic device 100 that includes a display may also be connected to an external display device by various wired or wireless methods and transmit an image signal to it.

The electronic device 100 may include a microphone and an internal (or built-in) speaker. Even if the electronic device 100 includes an internal speaker, it may be connected to the external speaker 10 to output a sound signal through the external speaker 10 . Although an embodiment in which the electronic device 100 is wirelessly connected to the external speaker 10 is illustrated in FIG. 1 , the electronic device 100 may be connected to the external speaker 10 in various wired and wireless methods. For example, the electronic device 100 may be connected to the external speaker 10 in a manner such as HDMI, Wi-Fi, Bluetooth, or Aux.

When the electronic device 100 is connected to the external speaker 10 , a sound signal may be output through the external speaker 10 . A sound signal output through the external speaker 10 may be input into a microphone of the electronic device 100 . The sound signal input into the microphone may be an echo component. The echo component input to the electronic device 100 may interfere with recognizing the voice input through the microphone by the electronic device 100 . As an embodiment, the electronic device 100 may perform a voice recognition function. The electronic device 100 may recognize the user's voice input through the microphone and perform an operation corresponding to the recognized voice. When the electronic device 100 receives a user's voice, as described above, an echo component (or a sound signal output through a speaker) may be input together. The electronic device 100 may not accurately recognize the user's voice due to the input echo component. Accordingly, the electronic device 100 should include a function of removing an echo component.

The algorithm for canceling the echo component may be referred to as Acoustic Echo Cancellation (AEC). In order to apply AEC in the electronic device 100 , synchronization between a sound signal input to a microphone and a sound signal output to a speaker is essential.

The positions of the microphone and the internal speaker of the electronic device 100 are fixed. Accordingly, when the electronic device 100 outputs a sound signal through the internal speaker, a constant time difference occurs between the sound signal output through the speaker and the sound signal input into the microphone. Because this time difference is constant, the input sound signal and the output sound signal can be synchronized.

However, when the electronic device 100 outputs a sound signal through the external speaker 10, the position of the external speaker 10 is variable, and the time difference between the input sound signal and the output sound signal may vary widely depending on that position and on the communication method used to connect to the external speaker 10.

The present disclosure describes a method and apparatus for synchronizing an output sound signal with an input sound signal. The microphone of the electronic device 100 receives the sound signal output through the speaker, which may be, for example, the external speaker 10. The electronic device 100 converts the external sound signal input into the microphone and the reference sound signal into the frequency domain. For example, the reference sound signal may be the output sound signal. Taking one of the converted external sound signal and reference sound signal as the basis, the electronic device 100 shifts the other sound signal and acquires the similarity between the phases of the two sound signals. For example, the electronic device 100 may obtain the similarity by performing a convolution operation on the two sound signals. The electronic device 100 may shift one sound signal by a plurality of different times and acquire a similarity for each time, yielding a plurality of similarities. The electronic device 100 may then obtain a preset value by applying a preset weight to the largest of the acquired similarities; for example, the preset weight may be a value between 0 and 1. The electronic device 100 treats the time differences at which similarities greater than or equal to the preset value appear as a candidate group for the time difference between the external sound signal and the reference sound signal, and determines the reliability of that candidate group.
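The similarity computation described above can be sketched as follows. The text says only that the phases of the two frequency-domain signals are compared through a convolution at each shift; this sketch assumes the common phase-transform weighted cross-correlation (GCC-PHAT), which keeps only phase by normalizing each frequency bin of the cross-spectrum to unit magnitude. A naive DFT keeps the example dependency-free (a real system would use an FFT), circular shifts stand in for block-wise shifting, and all function names are hypothetical.

```python
import cmath
import math


def dft(x):
    """Naive discrete Fourier transform, O(N^2); an FFT would be used in practice."""
    n = len(x)
    return [sum(x[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def idft(spectrum):
    """Inverse of dft()."""
    n = len(spectrum)
    return [sum(spectrum[k] * cmath.exp(2j * math.pi * k * t / n) for k in range(n)) / n
            for t in range(n)]


def phase_similarities(mic, ref, eps=1e-12):
    """Return s where s[lag] is the phase similarity when ref is delayed by lag samples.

    Assumes GCC-PHAT: the cross-spectrum is normalized per bin so only phase remains,
    then transformed back so that each output index corresponds to one circular shift.
    """
    cross = []
    for m, r in zip(dft(mic), dft(ref)):
        c = m * r.conjugate()
        cross.append(c / (abs(c) + eps))  # keep phase, discard magnitude
    return [v.real for v in idft(cross)]


if __name__ == "__main__":
    import random

    rng = random.Random(0)
    ref = [rng.uniform(-1.0, 1.0) for _ in range(64)]
    mic = ref[-5:] + ref[:-5]  # mic[t] = ref[t - 5]: the echo arrives 5 samples late
    sims = phase_similarities(mic, ref)
    print(max(range(len(sims)), key=sims.__getitem__))  # prints 5
```

Because only phase is compared, a pure delay produces a sharp peak near 1 at the true lag regardless of the signal's spectral shape.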

In one embodiment, the electronic device 100 may determine that the estimated candidate group is not reliable when, among the candidates, the interval between the time difference at which the maximum similarity appears and the time difference at which the minimum similarity appears exceeds a preset value. In this case, the electronic device 100 may ignore the time difference data of the estimated candidate group. Alternatively, when similarities equal to or greater than the preset value appear periodically among the estimated candidates, the electronic device 100 may identify the time difference at which such a similarity first appears as a reliable time difference. Alternatively, when similarities equal to or greater than the preset value appear repeatedly, the electronic device 100 may estimate the time difference at which the greatest similarity appears as the delay time and obtain the variance of the delay times estimated over a plurality of preset time units. When the obtained variance is equal to or less than a preset value, the electronic device 100 may identify the time difference as reliable. If the obtained variance exceeds the preset value, the electronic device 100 may determine that the estimated delay time is not reliable and ignore the estimated delay time data. The electronic device 100 may repeat the above-described process at regular time intervals, for example, in blocks of the sound signal that are 1 second long.
Accordingly, if the electronic device 100 determines that the reliability of the time difference is not recognized in one block, it may identify the time difference in which the reliability is recognized by repeating the above-described process in the next block.

When the candidate group is determined to be reliable, the electronic device 100 identifies the time difference of one of the candidates as the delay time of the external sound signal. The electronic device 100 then performs time delay compensation between the external sound signal and the reference sound signal based on the identified delay time. That is, the electronic device 100 may synchronize the external sound signal input into the microphone with the reference sound signal and then execute an algorithm for removing the echo component.
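Time delay compensation can be as simple as delaying the reference signal by the identified number of samples so that it lines up with the microphone signal before both are handed to the echo canceller. This is a minimal sketch assuming the delay is a non-negative whole number of samples (the echo reaches the microphone after playback); the function name is hypothetical and the echo cancellation step itself is not shown.

```python
def compensate_delay(reference, delay_samples):
    """Delay `reference` by `delay_samples` samples, keeping its original length.

    Zeros are prepended so that reference[t - delay] lines up with mic[t].
    """
    if delay_samples < 0:
        raise ValueError("delay_samples must be non-negative")
    padded = [0.0] * delay_samples + list(reference)
    return padded[:len(reference)]


# A 2-sample delay shifts the reference right by two positions.
print(compensate_delay([1.0, 2.0, 3.0, 4.0], 2))  # [0.0, 0.0, 1.0, 2.0]
```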

Hereinafter, the configuration of the electronic device 100 will be described.

Referring to FIG. 2 , the electronic device 100 includes a microphone 110 and a processor 120 .

The microphone 110 receives the sound signal output through the speaker. For example, the speaker may be an internal speaker of the electronic device 100 or an external speaker separate from the electronic device 100 . Also, the microphone 110 may receive a user's voice. The processor 120 may recognize a control command based on the input voice and perform a control operation corresponding to the recognized control command.

The processor 120 controls each component of the electronic device 100. In addition, the processor 120 may perform synchronization by acquiring the time difference between the sound signal input into the microphone and the sound signal output through the speaker, and may then execute an echo cancellation algorithm to remove the echo component. Specifically, the processor 120 converts the sound signal input into the microphone and the output sound signal into the frequency domain. Because the output sound signal is the source signal that the electronic device 100 itself outputs, it may be referred to as a reference sound signal. That is, the electronic device 100 converts the input sound signal and the reference sound signal into the frequency domain.

Taking one of the converted input sound signal and reference sound signal as the basis, the processor 120 shifts the other sound signal and acquires the similarity between the phases of the two sound signals. For example, the processor 120 may shift the input sound signal relative to the reference sound signal. The processor 120 may perform this process in units of blocks of the sound signal. In addition, the processor 120 may sample the input sound signal and the reference sound signal and shift one of them sample by sample. In one embodiment, a block of the sound signal is 1 second long and the processor 120 samples at intervals of 62.5 us (that is, at 16 kHz), in which case the processor 120 acquires 16000 samples each of the input sound signal and the reference sound signal. The processor 120 may shift the input sound signal one sample at a time and acquire the similarity between the input sound signal and the reference sound signal after each shift, for example by performing a convolution operation on the two signals. The processor 120 thereby acquires a plurality of similarities whose time differences are spaced 62.5 us apart: a shift of one sample corresponds to a time difference of 62.5 us, and a shift of two samples to 125 us.
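The sampling arithmetic in this embodiment can be checked directly: a 62.5 us sampling interval corresponds to 16 kHz, so a 1-second block holds 16000 samples and each one-sample shift adds 62.5 us of time difference. The constant and function names below are illustrative only.

```python
SAMPLE_PERIOD_US = 62.5  # sampling interval from the embodiment
BLOCK_SECONDS = 1.0      # block length from the embodiment

sample_rate_hz = 1_000_000 / SAMPLE_PERIOD_US
samples_per_block = round(BLOCK_SECONDS * sample_rate_hz)


def lag_to_us(lag_samples):
    """Time difference represented by a shift of `lag_samples` samples."""
    return lag_samples * SAMPLE_PERIOD_US


print(sample_rate_hz, samples_per_block)  # 16000.0 16000
print(lag_to_us(1), lag_to_us(2))         # 62.5 125.0
```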

The processor 120 may obtain a preset value by applying a preset weight to the largest of the acquired similarities. For example, the preset weight may be a value between 0 and 1. Because the weight is between 0 and 1, the weighted value is smaller than the largest similarity, so similarities other than the largest one may also exceed this preset value. The processor 120 treats the time differences at which similarities greater than or equal to the preset value appear as a candidate group for the time difference between the external sound signal and the reference sound signal, and determines the reliability of the candidate group.
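Candidate selection with the weighted maximum as the preset value might look like the sketch below. The default weight of 0.7 is an arbitrary example within the 0 to 1 range the text allows, the function name is hypothetical, and the largest similarity is assumed to be positive.

```python
def candidate_lags(similarities, weight=0.7):
    """Return (threshold, lags) where lags are the shifts whose similarity >= threshold.

    threshold ("preset value") = weight * largest similarity, with 0 < weight < 1.
    Assumes the largest similarity is positive, so it always clears the threshold.
    """
    if not 0.0 < weight < 1.0:
        raise ValueError("weight must lie between 0 and 1")
    threshold = weight * max(similarities)
    lags = [lag for lag, s in enumerate(similarities) if s >= threshold]
    return threshold, lags


threshold, lags = candidate_lags([0.1, 0.9, 0.2, 0.7, 0.65])
print(lags)  # [1, 3, 4]
```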

The input sound signal may take various forms depending on the external environment, so the processor 120 may determine reliability according to the characteristics of the input sound signal. In one embodiment, the processor 120 may determine that the estimated candidate group is not reliable when, among the candidates, the interval between the time difference at which the maximum similarity appears and the time difference at which the minimum similarity appears exceeds a preset value. In this case, the processor 120 may ignore the time difference data of the estimated candidate group. Alternatively, when similarities equal to or greater than the preset value appear periodically among the estimated candidates, the processor 120 may identify the time difference at which such a similarity first appears as a reliable time difference. Alternatively, when similarities equal to or greater than the preset value appear repeatedly, the processor 120 may estimate the time difference at which the greatest similarity appears as the delay time and obtain the variance of the delay times estimated over a plurality of preset time units. When the obtained variance is less than or equal to a preset value, the processor 120 may identify the time difference as reliable. If the obtained variance exceeds the preset value, the processor 120 may determine that the estimated delay time is not reliable and ignore the estimated delay time data. The processor 120 may repeat the above-described process in units of blocks of the sound signal.
Accordingly, if the electronic device 100 determines that the reliability of the time difference is not recognized in one block, it may identify the time difference in which the reliability is recognized by repeating the above-described process in the next block.

If the candidate group is determined to be reliable, the processor 120 identifies the time difference of one of the candidates as the delay time of the external sound signal. The processor 120 may then cancel the echo component based on the identified delay time.

Referring to FIG. 3, the electronic device 100a includes a microphone 110, a processor 120, an input interface 130, a communication interface 140, a camera 150, a sensor 160, a display 170, a speaker 180, and a memory 190. Since the microphone 110 is the same as described in FIG. 2, a detailed description thereof will be omitted.

The input interface 130 may receive a command input from the user. Alternatively, the input interface 130 may include an input/output port through which it receives or outputs data. For example, the input interface 130 may be connected to an external speaker and output a sound signal to the external speaker. When the input interface 130 includes an input/output port, the port may be, for example, HDMI (High-Definition Multimedia Interface), DP (DisplayPort), RGB, DVI (Digital Visual Interface), USB (Universal Serial Bus), Thunderbolt, LAN, or AUX. The input interface 130 performs a function of receiving a command or data from the outside, and may be referred to as an input unit, an input module, or the like. When the input interface 130 performs an input/output function, it may be referred to as an input/output unit, an input/output module, or the like.

The communication interface 140 may communicate with an external device, transmitting and receiving data using a wired or wireless communication method. For example, the communication interface 140 may include a module capable of communicating by 3G, Long Term Evolution (LTE), 5G, Wi-Fi, Bluetooth, Digital Multimedia Broadcasting (DMB), Advanced Television Systems Committee (ATSC), Digital Video Broadcasting (DVB), Local Area Network (LAN), or the like. The communication interface 140 may also be referred to as a communication unit, a communication module, a transceiver, or the like. The communication interface 140 may receive content or data from an external device.

The camera 150 may photograph the surroundings of the electronic device 100a, or the user's facial expression or motion. The processor 120 may recognize a control command based on the captured facial expression or motion of the user and perform a control operation corresponding to the recognized control command. For example, the camera 150 may include a CCD sensor or a CMOS sensor. Also, the camera 150 may include an RGB camera and a depth camera.

The sensor 160 may detect an object around the electronic device 100a. The processor 120 may recognize a control command based on the sensed signal and perform a control operation corresponding to the recognized control command. In addition, the sensor 160 may sense surrounding environment information of the electronic device 100a. The processor 120 may perform a corresponding control operation based on the surrounding environment information sensed by the sensor 160. For example, the sensor 160 may include an acceleration sensor, a gravity sensor, a gyro sensor, a geomagnetic sensor, a direction sensor, a motion recognition sensor, a proximity sensor, a voltmeter, an ammeter, a barometer, a hygrometer, a thermometer, an illuminance sensor, a heat sensor, a touch sensor, an infrared sensor, an ultrasonic sensor, and the like.

The display 170 may output data processed by the processor 120 as an image. For example, the display 170 may be implemented as a liquid crystal display (LCD), an organic light emitting diode (OLED) display, a quantum dot light emitting diode (QLED) display, a micro LED display, a flexible display, a touch screen, or the like. The display 170 may also be classified as an active matrix (AM) type (e.g., AM-OLED) or a passive matrix (PM) type (e.g., PM-OLED) according to its driving method.

When the display 170 is implemented as a touch screen, the electronic device 100a may receive a control command through the touch screen.

The speaker 180 (internal speaker) outputs an audio signal on which audio processing has been performed. Although the present disclosure describes an embodiment in which the electronic device 100a and an external speaker are connected, the present disclosure may also be applied to outputting a sound signal through the internal speaker. Meanwhile, the speaker 180 may output a user's input command, status-related information or operation-related information of the electronic device 100a as a voice or a notification sound.

The memory 190 may store data, algorithms, etc. that perform a function of the electronic device 100a , and may store programs and commands driven by the electronic device 100a . The algorithm stored in the memory 190 may be loaded into the processor 120 under the control of the processor 120 to identify a delay time between the input sound signal and the reference sound signal, and to remove the echo component. For example, the memory 190 may be implemented as a type of ROM, RAM, HDD, SSD, memory card, or the like.

The electronic device 100a may include all of the above-described components or may include only some components. In addition, the electronic device 100a may further include other components that perform various functions in addition to the above-described components. So far, the configuration of the electronic device 100a has been described. Hereinafter, a method of determining the reliability of the estimated delay time will be described.

Referring to FIG. 4, the processor may receive a sound signal input through the microphone and a reference sound signal (S410, S420). The processor may estimate the delay time based on the input sound signal and the reference sound signal (S430). Specifically, the processor may convert the input sound signal and the reference sound signal into the frequency domain and obtain similarities by shifting one sound signal relative to the other. The processor may obtain a preset value by applying a weight to the largest similarity; for example, the weight may be a value between 0 and 1. The processor may treat the time differences of similarities greater than the obtained preset value as a candidate group for the time difference. The processor may convert the candidate group into the time domain and identify the peak values. The processor may estimate the time difference of one of the candidates as the delay time between the input sound signal and the output sound signal, and determine the reliability of the estimated delay time. If the processor determines that the estimated delay time is not reliable, it may ignore the data. If the processor determines that the estimated delay time is reliable, it may store the value in the buffer of the delay time compensation module (S440).

The processor may perform echo cancellation based on the delay time determined to be reliable (S450).
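The flow of FIG. 4 can be sketched as a per-block loop: each block yields a delay estimate and a reliability verdict (S430), only reliable estimates are pushed into the compensation module's buffer (S440), and echo cancellation (S450) uses the most recent accepted delay, so an unreliable block simply leaves the previous delay in force. The class name, buffer size, and estimator interface are assumptions for illustration.

```python
from collections import deque


class DelayBuffer:
    """Hypothetical buffer of the delay time compensation module (S440)."""

    def __init__(self, maxlen=8):
        self._delays = deque(maxlen=maxlen)

    def push(self, delay):
        self._delays.append(delay)

    def current(self):
        """Most recent reliable delay, or None if none has been accepted yet."""
        return self._delays[-1] if self._delays else None


def process_block(mic_block, ref_block, estimate, buffer):
    """Run S430-S440 for one block and return the delay to use for S450.

    `estimate(mic_block, ref_block)` is assumed to return (delay_samples, reliable).
    """
    delay, reliable = estimate(mic_block, ref_block)  # S430
    if reliable:
        buffer.push(delay)  # S440: store only reliable delays
    # Unreliable estimates are ignored; the last accepted delay stays in force.
    return buffer.current()


# Usage with a stand-in estimator that replays canned results.
results = iter([(5, True), (40, False), (6, True)])
buf = DelayBuffer()
print([process_block(None, None, lambda m, r: next(results), buf) for _ in range(3)])
# [5, 5, 6]
```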

The sound signal input to the microphone may not contain enough phase information to calculate the delay time. For example, when the output volume of the speaker is low, or the external noise is louder than the speaker output, the input sound signal may not contain sufficient phase information. In that case the highest similarity may not stand out from the other similarities, so when the preset value is obtained by applying the above-described weight to the largest similarity, many similarities may exceed it. That is, no clear similarity peak exists between the input sound signal and the reference sound signal, and similarities greater than the preset value may appear frequently, as shown in FIG. 5. In this case, among the similarities greater than or equal to the preset value, the interval between the time difference at which the maximum similarity appears and the time difference at which the minimum similarity appears may be very wide. Accordingly, when this interval exceeds a preset value, the electronic device determines that the input sound signal does not contain sufficient phase information and may judge the estimated delay time to be unreliable. The electronic device may ignore data determined to be unreliable.
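The spread test for insufficient phase information can be written as below. The candidate lags are assumed to come from the weighted-threshold step described earlier, and `max_spread` stands for the unnamed preset interval; the names and example values are assumptions.

```python
def spread_is_reliable(similarities, candidate_lags, max_spread):
    """Return False when the strongest and weakest candidates lie too far apart in lag.

    A wide spread suggests many unrelated lags cleared the threshold, i.e. the
    microphone signal lacked enough phase information (cf. FIG. 5).
    """
    if not candidate_lags:
        return False
    strongest = max(candidate_lags, key=lambda lag: similarities[lag])
    weakest = min(candidate_lags, key=lambda lag: similarities[lag])
    return abs(strongest - weakest) <= max_spread


sims = [0.0] * 64
sims[10], sims[12] = 1.0, 0.8  # one clear peak
print(spread_is_reliable(sims, [10, 12], max_spread=5))  # True

flat = [0.0] * 64
flat[3], flat[50] = 0.9, 1.0  # unrelated lags both clear the threshold
print(spread_is_reliable(flat, [3, 50], max_spread=5))  # False
```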

As described above, the electronic device may estimate the delay time and determine its reliability in units of sound-signal blocks of a predetermined length. Accordingly, even if the electronic device determines that the estimate for one block is unreliable and ignores its data, it may determine the reliability and calculate the delay time again in the next block.
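The block-by-block flow can be sketched as a loop that only stores reliable estimates. `DelayBuffer` is a hypothetical stand-in for the buffer of the delay time compensation module, and the per-block estimator is assumed to have already produced a `(delay, reliable)` pair for each block.

```python
# Illustrative sketch only: DelayBuffer and process_blocks are hypothetical names.
from collections import deque


class DelayBuffer:
    """Hypothetical stand-in for the buffer of the delay time compensation module."""

    def __init__(self, capacity=8):
        self.values = deque(maxlen=capacity)  # oldest entries drop out automatically

    def push(self, delay):
        self.values.append(delay)

    def latest(self):
        return self.values[-1] if self.values else None


def process_blocks(block_estimates, buffer):
    """Consume (delay, reliable) pairs, one per sound-signal block.

    Unreliable blocks are skipped; the next block simply gets a fresh estimate.
    Returns the most recent reliable delay, or None if there is none yet.
    """
    for delay, reliable in block_estimates:
        if reliable:
            buffer.push(delay)
    return buffer.latest()


buf = DelayBuffer(capacity=4)
estimates = [(10, False), (12, True), (40, False), (13, True)]
print(process_blocks(estimates, buf))  # 13
```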

When the electronic device is located in an indoor space, a sound signal output from the speaker may be reflected by a wall or the like and then input into the microphone. That is, when the electronic device is located in a space with strong reverberation, the sound signal output to the speaker may be input through the microphone periodically. If a reflected input sound signal produces a high similarity, the electronic device may estimate an erroneous delay time. Accordingly, in the present disclosure, when echoed copies of the input sound signal are received, the time difference of the earliest-arriving input sound signal may be identified as the delay time.

The electronic device may obtain a preset value in the same manner as described above and identify the time differences at which similarities greater than the preset value appear as a candidate group for the delay time. The echoed sound signal may produce high similarities at regular intervals. Accordingly, when similarities greater than the preset value appear periodically, as shown in FIG. 6, the electronic device may identify the time difference τ at which the earliest such similarity appears as the delay time.
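The earliest-peak rule can be sketched as below. The sketch assumes that "periodic" is tested by checking that the gaps between successive above-threshold local peaks are equal within a tolerance; that test, the peak definition, and the names are assumptions for illustration, not details given in the document.

```python
# Illustrative sketch only: the periodicity test, peak definition and names are assumptions.
def above_threshold_peaks(similarities, weight=0.5):
    """Local maxima whose similarity is at least weight * max(similarities)."""
    threshold = weight * max(similarities)
    n = len(similarities)
    return [
        lag for lag in range(n)
        if similarities[lag] >= threshold
        and (lag == 0 or similarities[lag] >= similarities[lag - 1])
        and (lag == n - 1 or similarities[lag] > similarities[lag + 1])
    ]


def earliest_if_periodic(similarities, weight=0.5, tolerance=1):
    """Return the earliest peak lag when peaks are evenly spaced, else None."""
    peaks = above_threshold_peaks(similarities, weight)
    if len(peaks) < 2:
        return peaks[0] if peaks else None
    gaps = [b - a for a, b in zip(peaks, peaks[1:])]
    if max(gaps) - min(gaps) <= tolerance:
        return peaks[0]  # earliest arrival is taken as the direct path
    return None  # irregular peaks: leave the decision to another check


# Direct sound at lag 7, reflections every 10 lags; one reflection is even stronger.
echo = [0.0] * 40
echo[7], echo[17], echo[27], echo[37] = 0.8, 1.0, 0.9, 0.7
print(earliest_if_periodic(echo))  # 7
```

Note that the strongest peak (lag 17) is not chosen; the rule deliberately prefers the earliest member of the evenly spaced series.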

When periodic external noise is input through the microphone together with the sound signal, the calculated similarity may rise at the period of the noise. However, a similarity peak caused by noise does not correspond to the actual time delay. Accordingly, the electronic device may calculate the variance of the delay times measured in successive sound blocks and, when the calculated variance is equal to or less than a certain value, determine the delay time between the input sound signal and the reference sound signal. That is, the electronic device may recognize the estimated delay time as reliable only when the per-block delay times between the input sound signal and the reference sound signal converge.
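A minimal sketch of the convergence test, assuming one similarity list per block and using the population variance of the per-block peak lags. The thresholds and names are hypothetical, and returning the lower median of the converged estimates is an assumption, since the document does not say which value is reported.

```python
# Illustrative sketch only: thresholds, names and the median choice are assumptions.
from statistics import median_low, pvariance


def argmax_lag(similarities):
    """Lag of the greatest similarity in one block; ties go to the earliest lag."""
    return max(range(len(similarities)), key=lambda lag: similarities[lag])


def converged_delay(per_block_similarities, max_variance=1.0, min_blocks=3):
    """Return a delay only if the per-block peak lags have converged, else None."""
    delays = [argmax_lag(s) for s in per_block_similarities]
    if len(delays) < min_blocks:
        return None  # not enough blocks to judge convergence yet
    if pvariance(delays) > max_variance:
        return None  # estimates diverge (e.g. driven by periodic noise): ignore them
    return median_low(delays)


def one_peak(lag, n=50):
    """Toy similarity list with a single peak at `lag`."""
    sims = [0.0] * n
    sims[lag] = 1.0
    return sims


stable = [one_peak(lag) for lag in (12, 12, 13, 12)]
noisy = [one_peak(lag) for lag in (5, 30, 12, 44)]
print(converged_delay(stable))  # 12
print(converged_delay(noisy))   # None
```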

FIGS. 6 and 7 differ in whether the input sound signal is periodic with respect to the reference sound signal. In the case of FIG. 6, for example, the echoed sound signal may be input into the microphone periodically.

Because the reflected sound signal is input according to the time at which the sound is output and the path it travels, high similarities may appear periodically. As an embodiment, if the output sound signal is output at 0, 1, and 2 seconds and the echoed sound signal reaches the microphone 0.7 seconds later, large similarities may appear at 0.7, 1.7, and 2.7 seconds of the reference sound signal. That is, in the case of FIG. 6, large similarities may appear at a constant period relative to the reference sound signal. Accordingly, the electronic device may identify the time difference at which a similarity greater than the preset value appears as the delay time.

On the other hand, the external sound signal may appear independently of the reference sound signal. As an embodiment, if the output sound signal is output at 0, 1, and 2 seconds and periodic external noise is input into the microphone every 0.7 seconds, large similarities may appear at 0.7, 1.4, and 2.1 seconds of the reference sound signal. That is, in the case of FIG. 7, large similarities may appear aperiodically with respect to the reference sound signal. Accordingly, when similarities equal to or greater than the preset value appear repeatedly, the electronic device may estimate the time difference at which the greatest similarity appears as the delay time and obtain the variance of the estimated delay time over a plurality of preset time units. When the obtained variance is less than or equal to a preset value, the electronic device may identify a time difference for which reliability is recognized. Conversely, when the obtained variance exceeds the preset value, the electronic device may determine that the estimated delay time is not reliable and ignore its data. Through the above-described process, the electronic device may acquire an accurate delay time regardless of the connection method or location of an external speaker.
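The timing arithmetic in the two examples can be checked with a short sketch: measuring each similarity peak from the most recent output onset gives a constant offset in the echo case (FIG. 6) and a drifting offset in the noise case (FIG. 7). This onset-offset test only illustrates the arithmetic; it is not a step the document describes, and the names and the 0.05-second tolerance are hypothetical.

```python
# Illustrative sketch only: this onset-offset test and its names are not from the document.
def offsets_from_onsets(peak_times, onset_times):
    """Time from the most recent output onset (at or before each peak) to the peak."""
    offsets = []
    for peak in peak_times:
        last_onset = max(o for o in onset_times if o <= peak)
        offsets.append(round(peak - last_onset, 6))  # round away floating-point noise
    return offsets


def locked_to_output(peak_times, onset_times, tolerance=0.05):
    """True if every peak sits at (nearly) the same offset after an output onset."""
    offsets = offsets_from_onsets(peak_times, onset_times)
    return max(offsets) - min(offsets) <= tolerance


onsets = [0.0, 1.0, 2.0]        # output sound at 0, 1 and 2 seconds
echo_peaks = [0.7, 1.7, 2.7]    # FIG. 6 example: echo 0.7 s after each output
noise_peaks = [0.7, 1.4, 2.1]   # FIG. 7 example: noise every 0.7 s

print(offsets_from_onsets(echo_peaks, onsets))   # [0.7, 0.7, 0.7]
print(offsets_from_onsets(noise_peaks, onsets))  # [0.7, 0.4, 0.1]
print(locked_to_output(echo_peaks, onsets))      # True
print(locked_to_output(noise_peaks, onsets))     # False
```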

Referring to FIG. 8, the electronic device 100 connected to the external speaker 10 is illustrated. The electronic device 100 may output a sound signal through the external speaker 10. The sound signal output to the external speaker 10 may be input to the microphone of the electronic device 100. The electronic device 100 may identify a delay time between the reference sound signal and the input sound signal. The electronic device 100 may acquire a plurality of similarities between the phases of the two sound signals by shifting one of the reference sound signal and the input sound signal relative to the other. The electronic device 100 may estimate the time differences at which similarities greater than or equal to a preset value appear among the obtained similarities as a candidate group for the time difference between the input sound signal and the reference sound signal. The electronic device 100 may determine the reliability of the estimated candidate group and, when the reliability is recognized, identify the time difference of one member of the candidate group as the delay time.

If the input sound signal does not contain sufficient phase information, similarities greater than the preset value may appear frequently. When, among the similarities greater than or equal to the preset value, the interval between the time difference at which the maximum similarity appears and the time difference at which the minimum similarity appears exceeds a preset value, the electronic device 100 may determine that the estimate is not reliable and ignore the data related to that time difference.

As shown in FIG. 8, when a reflected sound signal is input through the microphone, the input sound signal may exhibit periodicity relative to the reference sound signal. The electronic device may identify the time difference at which a similarity greater than the preset value first appears as the delay time.

Alternatively, when periodic noise is included with the input sound signal, a similarity greater than or equal to a preset value may repeatedly appear. The electronic device 100 may estimate a time difference in which the greatest similarity occurs as a delay time, and may obtain a variance of the estimated delay times for each block of a plurality of sound signals. When it is determined that the estimated delay time converges, the electronic device 100 may determine a time difference for which reliability is recognized and obtain the delay time. Alternatively, when determining that the estimated delay time diverges, the electronic device 100 may determine that reliability of the estimated delay time is not recognized and ignore data related to the estimated time difference. The electronic device 100 may perform time delay compensation between the reference sound signal and the input sound signal based on the identified delay time.

The electronic device 100 may receive a voice command from the user (S810). The electronic device 100 may remove the input sound signal (the echo component) through the above-described process (S820). Because the electronic device 100 receives the user's voice signal together with the input sound signal and removes the input sound signal, it may accurately recognize the user's voice command. The electronic device 100 may then perform an operation corresponding to the recognized voice command.
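The document does not name the echo cancellation algorithm applied once the signals are synchronized. As one common example, a normalized least-mean-squares (NLMS) adaptive filter can subtract an estimate of the echo from the microphone signal, given a reference that has already been delay-compensated. The sketch assumes that aligned reference and a noise-free toy echo path; the tap count, step size `mu`, and function name are hypothetical choices.

```python
# Illustrative sketch only: NLMS is an assumed AEC choice; parameters and names are hypothetical.
import random


def nlms_echo_cancel(mic, ref_aligned, taps=8, mu=0.5, eps=1e-8):
    """Subtract an adaptively estimated echo of ref_aligned from mic.

    Returns the residual (error) signal, which would carry the user's voice once
    the filter has converged. Assumes ref_aligned is already delay-compensated.
    """
    weights = [0.0] * taps
    history = [0.0] * taps  # most recent reference sample first
    residual = []
    for m, r in zip(mic, ref_aligned):
        history = [r] + history[:-1]
        echo_estimate = sum(w * x for w, x in zip(weights, history))
        error = m - echo_estimate
        power = eps + sum(x * x for x in history)
        # Normalized LMS update: the step is scaled by the input power.
        weights = [w + mu * error * x / power for w, x in zip(weights, history)]
        residual.append(error)
    return residual


rng = random.Random(0)
ref = [rng.gauss(0.0, 1.0) for _ in range(3000)]
# Toy echo path: direct copy plus one weaker reflection one sample later.
mic = [0.6 * ref[t] + 0.2 * (ref[t - 1] if t > 0 else 0.0) for t in range(3000)]
res = nlms_echo_cancel(mic, ref)
tail_in = sum(x * x for x in mic[-1000:])
tail_out = sum(x * x for x in res[-1000:])
print(tail_out / tail_in)  # tiny once the filter has converged
```

If the reference were not delay-compensated and the true delay exceeded the filter length, the filter could not model the echo path at all, which is why synchronization is described as a prerequisite.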

So far, various examples of removing the echo component have been described. Hereinafter, a method of controlling the electronic device will be described.

Referring to FIG. 9, the electronic device acquires a plurality of similarities between the phases of two sound signals, based on one of an external sound signal converted to a frequency band and a reference sound signal converted to a frequency band (S910). For example, the electronic device may sample the converted external sound signal and the converted reference sound signal, and shift one of them relative to the other. The electronic device may obtain a similarity between the phases of the two sound signals by performing a convolution operation on them. By shifting one sound signal by a plurality of different times, the electronic device may acquire a plurality of similarities, one for each shift.
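The shift-and-compare loop can be illustrated directly on sampled values. This sketch is a simplification: it computes a normalized correlation (a multiply-and-sum over the overlapping samples) for each shift in the time domain, rather than the phase similarity of frequency-converted signals that the document describes, and its names are hypothetical.

```python
# Illustrative sketch only: time-domain simplification with hypothetical names.
import math
import random


def similarities_by_shift(ext, ref, max_shift):
    """Normalized correlation of ext against ref delayed by 0..max_shift samples.

    Entry `shift` compares ext[shift:] with the start of ref, i.e. it tests the
    hypothesis that ext lags ref by `shift` samples. Values lie in [-1, 1].
    """
    sims = []
    for shift in range(max_shift + 1):
        a = ext[shift:]
        b = ref[:len(ext) - shift]
        n = min(len(a), len(b))
        a, b = a[:n], b[:n]
        num = sum(p * q for p, q in zip(a, b))
        den = math.sqrt(sum(p * p for p in a) * sum(q * q for q in b))
        sims.append(num / den if den > 0 else 0.0)
    return sims


rng = random.Random(2)
ref = [rng.gauss(0.0, 1.0) for _ in range(200)]
ext = [0.0] * 17 + [0.3 * x for x in ref[:183]]  # attenuated copy, 17 samples late
sims = similarities_by_shift(ext, ref, max_shift=40)
print(max(range(len(sims)), key=sims.__getitem__))  # 17
```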

The electronic device identifies the delay time of the external sound signal based on the reliability of the obtained plurality of similarities (S920). For example, the electronic device may estimate the time differences at which similarities greater than or equal to a preset value appear among the obtained similarities as a candidate group for the time difference between the external sound signal and the reference sound signal. The electronic device may obtain the preset value by applying a preset weight to the largest of the obtained similarities. As an embodiment, the preset weight may be a value between 0 and 1.

The electronic device may determine the reliability of the estimated candidate group. For example, the electronic device may determine, within the estimated candidate group, the interval between the time difference at which the maximum similarity appears and the time difference at which the minimum similarity appears. When the determined interval exceeds a preset value, the electronic device may determine that the reliability of the estimated candidate group is not recognized and ignore the time-difference data of the estimated candidate group. Alternatively, when similarities equal to or greater than the preset value appear periodically within the estimated candidate group, the electronic device may identify the time difference at which such a similarity first appears as the time difference for which reliability is recognized.

Alternatively, the electronic device may acquire a plurality of similarities based on a preset time unit. For example, the preset time unit may be a block of a sound signal of a constant time. In addition, when the degree of similarity equal to or greater than a preset value repeatedly appears, the electronic device may estimate a time difference at which the greatest degree of similarity appears as a delay time. In addition, the electronic device may obtain a variance of the estimated delay time for each of a plurality of preset time units. When the obtained variance is less than or equal to a preset value, the electronic device may identify a time difference for which reliability is recognized. On the other hand, when the obtained variance exceeds a preset value, the electronic device may determine that the reliability of the estimated delay time is not recognized and ignore the data of the estimated delay time.

Through the above-described process, the electronic device may identify a delay time for which reliability is recognized. The electronic device performs time delay compensation between the external sound signal and the reference sound signal based on the identified delay time (S930).

Referring to FIG. 10, the electronic device may receive an external sound signal through a microphone (S1010). The external sound signal input through the microphone may be a sound signal output through the speaker. The electronic device may convert the external sound signal and the reference sound signal into a frequency band for each preset time unit (S1020). The reference sound signal may be a sound signal output from the electronic device. That is, the external sound signal and the reference sound signal may be the same sound signal, differing in magnitude or time. As an embodiment, the electronic device may convert the external sound signal and the reference sound signal into a frequency band in units of sound-signal blocks, and each block may be 1 second long.
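Block-wise conversion can be sketched as follows. The sketch assumes both signals are sample lists at the same rate, drops an incomplete trailing block, and uses a naive DFT in place of whatever transform the device actually uses; `split_blocks` and `block_spectra` are hypothetical names.

```python
# Illustrative sketch only: framing policy, transform and names are assumptions.
import cmath
import math


def split_blocks(signal, sample_rate, block_seconds=1.0):
    """Cut a sample list into consecutive full blocks; a partial last block is dropped."""
    size = int(sample_rate * block_seconds)
    return [signal[i:i + size] for i in range(0, len(signal) - size + 1, size)]


def dft(block):
    """Naive DFT of one block (O(N^2); only suitable for short toy blocks)."""
    n = len(block)
    return [sum(block[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
            for k in range(n)]


def block_spectra(ext, ref, sample_rate, block_seconds=1.0):
    """Pair up the spectra of time-aligned external and reference blocks."""
    ext_blocks = split_blocks(ext, sample_rate, block_seconds)
    ref_blocks = split_blocks(ref, sample_rate, block_seconds)
    return [(dft(e), dft(r)) for e, r in zip(ext_blocks, ref_blocks)]


# Toy example: 8 samples per second, 2.5 seconds of signal -> two 1-second blocks.
ext = [1.0] * 20
ref = [0.5] * 20
pairs = block_spectra(ext, ref, sample_rate=8)
print(len(pairs))  # 2
```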

The electronic device may acquire a plurality of similarities between the phases of the two sound signals converted into a frequency band (S1030). The electronic device may obtain a similarity by shifting one sound signal and convolving the two sound signals. As described above, because the two sound signals are the same signal with a time difference, they may have the greatest similarity when the shift cancels that time difference, that is, when the remaining time difference is 0.

The electronic device may estimate the time differences at which similarities greater than or equal to a preset value appear among the obtained similarities as a candidate group for the time difference between the external sound signal and the reference sound signal (S1040), and determine the reliability of the estimated candidate group (S1050). As described above, the electronic device may determine reliability based on the interval between the time difference at which the maximum similarity appears and the time difference at which the minimum similarity appears. Alternatively, the electronic device may determine reliability based on whether similarities greater than or equal to the preset value within the estimated candidate group appear periodically or repeatedly.

If it is determined that the reliability is not recognized, the electronic device may ignore (or remove) the data of the estimated candidate group (S1060). The electronic device may perform the delay time identification process by repeating the above-described process for the sound signal of the next preset time unit.

When the electronic device determines that reliability is recognized, it identifies the time difference of one member of the candidate group as the delay time of the external sound signal (S1070) and may perform time delay compensation between the external sound signal and the reference sound signal based on the identified delay time (S1080). Time delay compensation may refer to synchronization between the external sound signal and the reference sound signal. That is, the electronic device may execute an algorithm that synchronizes the external sound signal with the reference sound signal and removes the echo component.
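In the simplest case, time delay compensation amounts to delaying the reference by the identified number of samples so that it lines up with the microphone signal. The sketch assumes a non-negative, whole-sample delay; fractional delays are not handled, and the name `align_reference` is hypothetical.

```python
# Illustrative sketch only: whole-sample, non-negative delay assumed; name is hypothetical.
def align_reference(ref, delay_samples, length):
    """Delay ref by delay_samples (zero-padded) and trim or pad it to `length` samples."""
    if delay_samples < 0:
        raise ValueError("this sketch only handles a non-negative delay")
    shifted = [0.0] * delay_samples + list(ref)
    shifted = shifted[:length]
    return shifted + [0.0] * (length - len(shifted))


print(align_reference([1.0, 2.0, 3.0, 4.0], delay_samples=2, length=6))
# [0.0, 0.0, 1.0, 2.0, 3.0, 4.0]
```

A delay identified in seconds would first be converted to samples (for example, rounded `delay_seconds * sample_rate`) before alignment.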

The method for controlling an electronic device according to the above-described various embodiments may be provided as a computer program product. The computer program product may include the S/W program itself or a non-transitory computer readable medium in which the S/W program is stored.

The non-transitory readable medium refers to a medium that stores data semi-permanently and can be read by a device, as opposed to a medium that stores data for a brief moment, such as a register, cache, or memory. Specifically, the various applications or programs described above may be provided stored in a non-transitory readable medium such as a CD, DVD, hard disk, Blu-ray disc, USB memory, memory card, or ROM.

In addition, although preferred embodiments of the present disclosure have been illustrated and described above, the present disclosure is not limited to the specific embodiments described above. Various modifications may be made by those of ordinary skill in the technical field to which the disclosure belongs without departing from the gist of the present disclosure as claimed in the claims, and such modifications should not be understood separately from the technical spirit or scope of the present disclosure.

Claims

1. An electronic device comprising:

a microphone; and

a processor,

wherein the processor is configured to:

obtain a plurality of similarities between the phases of two sound signals, the two sound signals being an external sound signal input into the microphone and converted into a frequency band and a reference sound signal converted into a frequency band, based on one of the two sound signals;

identify a delay time of the external sound signal based on the reliability of the obtained plurality of similarities; and

perform time delay compensation between the external sound signal and the reference sound signal based on the identified delay time.
2. The electronic device of claim 1,

wherein the processor is configured to

shift the other sound signal relative to one of the converted external sound signal and the converted reference sound signal to obtain the plurality of similarities between the phases of the two sound signals.
3. The electronic device of claim 1,

wherein the processor is configured to

estimate the time differences at which similarities greater than or equal to a preset value appear among the obtained plurality of similarities as a candidate group for the time difference between the external sound signal and the reference sound signal, determine the reliability of the estimated candidate group, and, if the reliability is recognized, identify the time difference of one member of the candidate group as the delay time of the external sound signal.
4. The electronic device of claim 3,

wherein the processor is configured to

obtain the preset value by applying a preset weight to the largest similarity among the obtained plurality of similarities.
5. The electronic device of claim 4,

wherein the preset weight is between 0 and 1.
6. The electronic device of claim 3,

wherein the processor is configured to,

when the interval between the time difference at which the maximum similarity appears and the time difference at which the minimum similarity appears among the estimated candidate group exceeds a preset value, determine that the reliability of the estimated candidate group is not recognized and ignore the time-difference data of the estimated candidate group.
7. The electronic device of claim 3,

wherein the processor is configured to,

when similarities equal to or greater than the preset value appear periodically among the estimated candidate group, identify the time difference at which a similarity equal to or greater than the preset value first appears as the time difference for which reliability is recognized.
8. The electronic device of claim 3,

wherein the processor is configured to

acquire the plurality of similarities based on a preset time unit.
9. The electronic device of claim 8,

wherein the processor is configured to,

when similarities equal to or greater than the preset value appear repeatedly, estimate the time difference at which the greatest similarity appears as the delay time, obtain a variance of the estimated delay time over a plurality of preset time units, and, when the obtained variance is less than or equal to a preset value, identify a time difference for which reliability is recognized.
10. The electronic device of claim 9,

wherein the processor is configured to,

when the obtained variance exceeds the preset value, determine that the reliability of the estimated delay time is not recognized and ignore the data of the estimated delay time.
11. The electronic device of claim 1,

wherein the processor is configured to

sample the converted external sound signal and the converted reference sound signal, and obtain the plurality of similarities based on the sampled external sound signal and the sampled reference sound signal.
12. The electronic device of claim 1,

wherein the processor is configured to

output a sound signal to a speaker,

wherein the speaker includes an external speaker located outside the electronic device.
13. A method for controlling an electronic device, the method comprising:

obtaining a plurality of similarities between the phases of two sound signals, the two sound signals being an external sound signal converted into a frequency band and a reference sound signal converted into a frequency band, based on one of the two sound signals;

identifying a delay time of the external sound signal based on the reliability of the obtained plurality of similarities; and

performing time delay compensation between the external sound signal and the reference sound signal based on the identified delay time.
14. The method of claim 13,

wherein the obtaining of the plurality of similarities comprises

shifting the other sound signal relative to one of the converted external sound signal and the converted reference sound signal to obtain the plurality of similarities between the phases of the two sound signals.
15. The method of claim 13,

wherein the identifying of the delay time of the external sound signal comprises

estimating the time differences at which similarities greater than or equal to a preset value appear among the obtained plurality of similarities as a candidate group for the time difference between the external sound signal and the reference sound signal, determining the reliability of the estimated candidate group, and, if the reliability is recognized, identifying the time difference of one member of the candidate group as the delay time of the external sound signal.