WO2021091063A1 - Electronic device and control method thereof - Google Patents

Electronic device and control method thereof Download PDF

Info

Publication number
WO2021091063A1
Authority
WO
WIPO (PCT)
Prior art keywords
external device
sound
electronic device
location
received
Prior art date
Application number
PCT/KR2020/011937
Other languages
French (fr)
Korean (ko)
Inventor
김가을
최찬희
Original Assignee
Samsung Electronics Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co., Ltd.
Publication of WO2021091063A1 publication Critical patent/WO2021091063A1/en

Links

Images

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N29/00Investigating or analysing materials by the use of ultrasonic, sonic or infrasonic waves; Visualisation of the interior of objects by transmitting ultrasonic or sonic waves through the object
    • G01N29/22Details, e.g. general constructional or apparatus details
    • G01N29/26Arrangements for orientation or scanning by relative movement of the head and the sensor
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S3/00Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received
    • G01S3/80Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received using ultrasonic, sonic or infrasonic waves
    • G01S3/802Systems for determining direction or deviation from predetermined direction
    • G01S3/808Systems for determining direction or deviation from predetermined direction using transducers spaced apart and measuring phase or time difference between signals therefrom, i.e. path-difference systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/04Segmentation; Word boundary detection
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones

Definitions

  • the present invention relates to an electronic device and a control method thereof, and more particularly, to an electronic device that performs a voice recognition function by removing noise from a surrounding environment, and a control method thereof.
  • In this case, an electronic device capable of voice recognition uses a beamforming technique to extract the user's speech. Beamforming creates a spatial filter by extracting the audio signal arriving from a specific direction and removing audio components arriving from other directions. By extracting the signal arriving from the direction of the user's utterance out of the entire input audio signal and filtering out signals from other directions, only the user's speech passes to the speech recognition system.
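  • As a rough illustration of the spatial-filter idea, the sketch below implements a minimal two-microphone delay-and-sum beamformer in Python. All parameters (sample rate, microphone spacing) and function names are illustrative assumptions, not values from the present invention.

```python
# Minimal delay-and-sum beamformer sketch: signals arriving from the steering
# direction add coherently; signals from other directions are attenuated.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed
SAMPLE_RATE = 16_000    # Hz, assumed
MIC_SPACING = 0.1       # m between the two microphones, assumed

def delay_and_sum(ch0: np.ndarray, ch1: np.ndarray, steer_deg: float) -> np.ndarray:
    # Extra path length to the second microphone for a far source at steer_deg.
    delay_sec = MIC_SPACING * np.sin(np.deg2rad(steer_deg)) / SPEED_OF_SOUND
    delay_samples = int(round(delay_sec * SAMPLE_RATE))
    aligned = np.roll(ch1, -delay_samples)  # compensate the inter-mic delay
    return 0.5 * (ch0 + aligned)            # average the aligned channels
```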
  • In this case, the user must either lower the output level of every other device they do not wish to use, or raise their voice so that the signal-to-noise ratio (SNR) rises above the threshold the device can recognize. This increases the user's fatigue in the short term and reduces use of the voice recognition function in the long term.
  • An object of the present invention is to improve the accuracy of recognizing a user's utterance by removing ambient noise from the sound acquired by an electronic device in a given environment.
  • An electronic device according to an embodiment of the present invention includes: a microphone; a storage unit; a communication unit that communicates with an external device; and a processor that requests, through the communication unit, that the external device output a first sound, identifies the location of the external device based on the direction in which the first sound is received by the microphone, stores information on the identified location of the external device in the storage unit, removes, based on the stored information, a noise component corresponding to sound received from the location of the external device from a signal of a second sound received by the microphone, and recognizes a user utterance based on the signal from which the noise component has been removed.
  • the processor may identify whether the first sound is received based on a characteristic predefined for the first sound.
  • The electronic device further includes a storage unit, and the processor identifies the location of the external device based on the direction in which the first sound is received by the microphone and stores information on the identified location in the storage unit.
  • the characteristic may include information related to guidance on a location identification operation of the external device.
  • the characteristic may include an inaudible frequency band.
  • the processor may request the external device to output the first sound having the characteristic.
  • the processor may receive information on the characteristic from a server through the communication unit and store the received information in the storage unit.
  • the electronic device further includes a user input unit, and the processor may identify the location of the external device based on a user command input to the user input unit.
  • The storage unit stores information on a time point at which location identification of the external device is to be performed, and the processor may identify the location of the external device at that time point based on the stored information.
  • the processor may receive information on the external device from a server through the communication unit and identify the location of the external device based on the received information.
  • The electronic device further includes a speaker, and the processor receives, through the communication unit, a request from the external device to output a third sound for identifying the location of the electronic device, and controls the speaker to output the third sound.
  • A method of controlling an electronic device according to an embodiment of the present invention includes: communicating with an external device through a communication unit to request that the external device output a first sound; identifying the location of the external device based on the direction in which the first sound is received by a microphone; storing information on the identified location of the external device in a storage unit; removing, based on the stored information, a noise component corresponding to sound received from the location of the external device from a signal of a second sound received by the microphone; and recognizing a user utterance based on the signal from which the noise component has been removed.
  • The method may include identifying the location of the external device based on the direction in which the first sound is received by the microphone, and storing information on the identified location of the external device in a storage unit.
  • The method may include identifying the location of the external device based on a user command input to a user input unit.
  • The method may include storing information on a time point at which location identification of the external device is to be performed, and identifying the location of the external device at that time point based on the stored information.
  • The method may include receiving information on the external device from a server through the communication unit, and identifying the location of the external device based on the received information.
  • According to the present invention, the accuracy of voice recognition for a user's utterance can be improved.
  • FIG. 1 is a diagram showing an entire system according to an embodiment of the present invention.
  • FIG. 2 is a block diagram showing the configuration of an electronic device according to an embodiment of the present invention.
  • FIG. 3 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention.
  • FIG. 4 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention.
  • FIG. 5 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention.
  • FIG. 6 is a diagram showing an utterance list according to an embodiment of the present invention.
  • FIG. 7 is a diagram illustrating communication between a server and a device according to an embodiment of the present invention.
  • FIG. 8 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention.
  • FIG. 9 is a diagram illustrating a state of identifying a location of an external device according to an embodiment of the present invention.
  • FIG. 10 is a diagram showing information on an external device according to an embodiment of the present invention.
  • FIG. 11 is a diagram illustrating a situation in which an electronic device processes a received sound according to an embodiment of the present invention.
  • FIG. 12 is a diagram showing a flowchart of an operation performed by the electronic device of the present embodiment.
  • FIG. 13 is a diagram illustrating a noise removal block for processing sound by the electronic device of the present embodiment.
  • FIG. 14 is a diagram illustrating an entire system according to an embodiment of the present invention.
  • FIG. 15 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention.
  • FIG. 16 is a diagram illustrating the system after voice processing according to an embodiment of the present invention.
  • the electronic devices 100, 110, and 120 may be implemented as a display device capable of displaying an image, or may be implemented as a device without a display.
  • the electronic devices 100, 110, 120 may include a TV, an AI assistance device (AI speaker, etc.), a computer, a smart phone, a tablet, a portable media player, a wearable device, a video wall, an electronic frame, and the like.
  • Alternatively, the electronic devices 100, 110, and 120 may be implemented as various other types of devices, such as image processing devices without a display (e.g., set-top boxes), household appliances (e.g., refrigerators, Bluetooth speakers, and washing machines), and information processing devices such as a computer main body.
  • In the following description, the electronic device 100 is implemented as a TV, and the external devices 110 and 120 are implemented as a speaker and a refrigerator, respectively.
  • However, the electronic device and the external devices of the present invention are not limited thereto, and the present invention holds even if the roles of the electronic device and any one external device are exchanged.
  • the electronic device 100 and a plurality of external devices 110 and 120 are placed in a use space.
  • In this case, the spoken voice of the user 130 and the sounds from the external devices 110 and 120 may be mixed, and when the electronic device 100 processes the acquired sound, it becomes difficult to distinguish which part of the signal is caused by the user's utterance.
  • Accordingly, when the user 130 speaks to use the voice recognition function of the electronic device 100, the electronic device 100 identifies the locations of the external devices 110 and 120 in order to isolate the signal produced by the user's utterance input to the electronic device 100, and removes the sound signal coming from each identified location.
  • As a result, the electronic device 100 recognizes only the spoken voice of the user 130, so accurate voice recognition is possible.
  • The electronic device 100 includes a communication unit 210, a signal input/output unit 220, a broadcast receiving unit 230, a display unit 240, a user input unit 250, a storage unit 260, a microphone 270, a speaker 280, and a processor 290.
  • The electronic device 100 shown in FIG. 2 is an example in which the communication unit 210, the signal input/output unit 220, the broadcast receiving unit 230, and the like are implemented separately, but this is only an example; depending on the case, the broadcast receiving unit 230 may, for example, be implemented as part of the communication unit 210 or the signal input/output unit 220.
  • the electronic device 100 may be implemented including all the configurations shown in FIG. 2, but as another example, an omitted implementation of any one or more of them may be possible.
  • an implementation without the communication unit 210 is also possible. A more specific configuration will be described in detail below.
  • the configuration of the electronic device 100 will be described.
  • In the following, the case in which the electronic device 100 is a TV is described; however, since the electronic device 100 may be implemented as various types of devices, the present embodiment does not limit the configuration of the electronic device 100.
  • the electronic device 100 is not implemented as a display device, and in this case, the electronic device 100 may not include components for image display such as the display unit 240.
  • the electronic device 100 may output an image signal or the like to a display device such as an external TV through the signal input/output unit 220.
  • the communication unit 210 is a two-way communication circuit including at least one or more of components such as a communication module and a communication chip corresponding to various types of wired and wireless communication protocols.
  • The communication unit 210 may be implemented as a LAN card connected by wire to a router or gateway via Ethernet, a wireless communication module that performs wireless communication with an AP according to the Wi-Fi standard, or a wireless communication module that performs one-to-one direct wireless communication, such as Bluetooth.
  • the communication unit 210 communicates with a server on a network to transmit and receive data packets with the server.
  • The communication unit 210 may also be connected to other external devices 110 and 120 besides the server, and may receive various data, including video/audio data, from those devices or transmit various data, including video/audio data, to them.
  • The communication unit 210 receives a digitized analog voice signal (or sound signal) and transmits it to the processor 290.
  • That is, the analog voice signal is digitized and transmitted to the communication unit 210 using a data communication method such as Bluetooth or Wi-Fi.
  • the signal input/output unit 220 is connected to an external device such as a set-top box, an optical media player, or an external display device or a speaker in a 1:1 or 1:N (N is a natural number) method, thereby providing video from the external device. /Receive an audio signal or output a video/audio signal to the external device.
  • The signal input/output unit 220 includes, for example, a connector or port conforming to a preset transmission standard, such as an HDMI port, DisplayPort, DVI port, Thunderbolt, or USB port. Here, HDMI, DP, Thunderbolt, and the like are connectors or ports capable of transmitting video and audio signals simultaneously; in another embodiment, the signal input/output unit 220 may include connectors or ports that transmit video and audio signals separately.
  • The broadcast receiver 230 may be implemented in various ways according to the standard of the received image signal and the implementation form of the electronic device 100. For example, when the image signal is a broadcast signal, the broadcast receiving unit 230 includes a tuner that tunes the broadcast signal for each channel.
  • The input signal may be input from an external device such as a PC, an AV device, a TV, a smart phone, or a smart pad, and may also be derived from data received through a network such as the Internet.
  • the broadcast receiving unit 230 may include a network communication unit that communicates with an external device.
  • the broadcast receiver 230 may use wired or wireless communication as a communication method.
  • the broadcast receiving unit 230 is embedded in the electronic device 100 according to the present embodiment, but may be implemented in the form of a dongle or a module and attached to and detached from the connector of the electronic device 100.
  • the broadcast receiving unit 230 receives a wired digital signal including a clock signal of a preset frequency (clock frequency) when including a wired communication unit, and receives a wireless digital signal of a preset frequency (carrier frequency) when including a wireless communication unit.
  • a preset frequency signal may be processed by passing through the filter unit.
  • the type of the input signal received by the broadcast receiver 230 is not limited, and for example, at least one of a wired digital signal, a wireless digital signal, and an analog signal may be received.
  • the broadcast receiver 230 may receive an input signal to which a preset frequency signal is added.
  • the display unit 240 includes a display panel capable of displaying an image on a screen.
  • the display panel is provided with a light-receiving structure such as a liquid crystal method or a self-luminous structure such as an OLED method.
  • the display unit 240 may additionally include an additional component according to the structure of the display panel.
  • For example, when the display panel is of a liquid crystal type, the display unit 240 includes a liquid crystal display panel, a backlight unit that supplies light, and a panel driving substrate that drives the liquid crystal of the liquid crystal display panel.
  • the user input unit 250 includes various types of input interface related circuits provided to perform user input.
  • The user input unit 250 can be configured in various forms according to the type of the electronic device 100; examples include a mechanical or electronic button unit of the electronic device 100, a remote controller separate from the electronic device 100, a touch pad, and a touch screen installed on the display unit 240.
  • the storage unit 260 stores digitized data.
  • The storage unit 260 includes nonvolatile storage, which can retain data regardless of whether power is supplied, and volatile memory, into which data to be processed by the processor 290 is loaded and which cannot retain data when power is not supplied. Storage includes flash memory, a hard disk drive (HDD), a solid-state drive (SSD), read-only memory (ROM), and the like, while memory includes a buffer, random access memory (RAM), and the like.
  • the microphone 270 collects sounds of an external environment including user speech.
  • the microphone 270 transmits the collected sound signal to the processor 290.
  • the electronic device 100 may include a microphone 270 that collects a user's voice, or may receive a voice signal from an external device such as a remote controller or a smart phone having a microphone.
  • a remote controller application may be installed in an external device to control the electronic device 100 or perform functions such as voice recognition.
  • Through this, the user's voice can be received, and the external device can transmit and receive data, including control data, to and from the electronic device 100 using Wi-Fi, Bluetooth, or infrared.
  • In this case, a plurality of communication interfaces corresponding to the communication unit 210 may exist in the electronic device.
  • the speaker 280 outputs audio data processed by the processor 290 as sound.
  • the speaker 280 includes a unit speaker provided to correspond to audio data of one audio channel, and may include a plurality of unit speakers to respectively correspond to audio data of a plurality of audio channels.
  • When the electronic device 100 serves as an external device of another device, the speaker 280 serves to output a sound that informs the other device of the electronic device's location.
  • The speaker 280 may be provided separately from the electronic device 100, and in this case, the electronic device 100 may transmit audio data to the speaker 280 through the signal input/output unit 220.
  • The processor 290 includes one or more hardware processors implemented as a CPU, chipset, buffer, circuit, or the like mounted on a printed circuit board, and may be implemented as a system on chip (SoC) depending on the design.
  • the processor 290 includes modules corresponding to various processes such as a demultiplexer, a decoder, a scaler, an audio digital signal processor (DSP), and an amplifier.
  • Among these, modules related to image processing, such as the demultiplexer, decoder, and scaler, may be implemented as an image processing SoC, while the audio DSP may be implemented as a chipset separate from the SoC.
  • the processor 290 converts the voice signal acquired by the microphone 270 or the like into voice data, and processes the converted voice data. Thereafter, the processor 290 performs voice recognition based on the processed voice data, identifies a command indicated by the voice data, and performs an operation according to the identified command.
  • the voice data may be text data obtained through a speech-to-text (STT) process for converting a voice signal into text data.
  • A server different from the STT server, or a server that also serves as the STT server, may process the data, and such a server may perform a specific function based on the information/data transmitted by the electronic device.
  • Both the voice data processing and the command identification and execution may be performed in the electronic device 100. However, since this places a relatively large system load and storage requirement on the electronic device 100, at least some of the processes may be performed by at least one server communicatively connected to the electronic device 100 through a network.
  • the processor 290 receives the spoken voice of the user 130 by the microphone 270 or the like.
  • In addition to the user's uttered voice, the electronic device 100 of the present invention may also receive the sound, that is, the noise, from the other external devices 110 and 120 installed around it.
  • the processor 290 removes these noises in the process of processing the received sound and controls to perform an operation corresponding to the user's spoken voice. The process of removing noise will be described in detail later.
  • the processor 290 may call and execute at least one command of software stored in a storage medium that can be read by a machine such as the electronic device 100. This enables a device such as the electronic device 100 to be operated to perform at least one function according to the called at least one command.
  • the one or more instructions may include code generated by a compiler or code executable by an interpreter.
  • The device-readable storage medium may be provided in the form of a non-transitory storage medium, where 'non-transitory' merely means that the storage medium is a tangible device and does not include a signal (e.g., an electromagnetic wave).
  • As described above, the processor 290 receives, through the microphone 270, the sound of other external devices, that is, noise, along with the spoken voice of the user 130, and removes this noise from the entire received sound to recognize the user's spoken voice more accurately.
  • To perform this operation, a rule-based algorithm or an artificial intelligence algorithm using at least one of machine learning, neural networks, or deep learning may be applied to at least part of the data analysis, processing, and generation of result information.
  • the processor 290 may perform functions of a learning unit and a recognition unit together.
  • The learning unit performs a function of generating a trained neural network, and the recognition unit performs a function of recognizing (or inferring, predicting, estimating, or determining) data using the trained neural network.
  • The learning unit can create or update a neural network. To do so, the learning unit may acquire training data, either from the storage unit 260 or externally.
  • The training data may be data used for training the neural network, and the neural network may be trained using the data obtained by performing the above-described operation as training data.
  • The learning unit may perform preprocessing on the acquired training data before training the neural network, or may select the data to be used for training from among a plurality of training data items. For example, the learning unit may process the training data into a preset format, filter it, or add/remove noise to put it into a form suitable for learning. The learning unit may then generate a neural network configured to perform the above-described operation using the preprocessed training data.
  • The trained neural network may be composed of a plurality of sub-networks (or layers). The nodes of these networks have weights, and the networks may be connected to each other so that an output value of one network is used as an input value of another.
  • Examples of such neural networks include models such as a convolutional neural network (CNN), deep neural network (DNN), recurrent neural network (RNN), restricted Boltzmann machine (RBM), deep belief network (DBN), bidirectional recurrent deep neural network (BRDNN), and deep Q-networks.
  • the recognition unit may acquire target data to perform the above-described operation.
  • the target data may be obtained from the storage unit 260 or externally.
  • the target data may be data to be recognized by the neural network.
  • The recognition unit may perform preprocessing on the acquired target data, or select the data to be used for recognition from among a plurality of target data items, before applying the target data to the trained neural network.
  • For example, the recognition unit may process the target data into a preset format, filter it, or add/remove noise to put it into a form suitable for recognition.
  • The recognition unit may obtain an output value from the neural network by applying the preprocessed target data to it.
  • the recognition unit may acquire a probability value or a reliability value together with the output value.
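  • A toy sketch of this learning-unit/recognition-unit split is shown below. The nearest-centroid model and all names are illustrative assumptions standing in for the neural network described above; the recognition unit returns an output together with a reliability value.

```python
# Toy learning/recognition units: train fits one centroid per class; recognize
# applies the "model" to target data and reports a label with a confidence.
import numpy as np

def train(samples: np.ndarray, labels: np.ndarray) -> np.ndarray:
    # Learning unit: build the model from (preprocessed) training data.
    return np.stack([samples[labels == c].mean(axis=0) for c in np.unique(labels)])

def recognize(model: np.ndarray, target: np.ndarray) -> tuple[int, float]:
    # Recognition unit: apply target data to the model, get output + reliability.
    dists = np.linalg.norm(model - target, axis=1)
    label = int(np.argmin(dists))
    return label, float(np.exp(-dists[label]))  # reliability in (0, 1]
```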
  • The control method of the electronic device 100 described above may be provided as part of a computer program product.
  • Computer program products can be traded between sellers and buyers as commodities.
  • The computer program product may be distributed in the form of a device-readable storage medium (e.g., a CD-ROM), or may be distributed directly or online (e.g., downloaded or uploaded) through an application store (e.g., Play Store™) or between two user devices (e.g., smartphones).
  • In the case of online distribution, at least part of the computer program product may be temporarily stored or temporarily generated in a device-readable storage medium, such as a manufacturer's server, an application store's server, or the memory of a relay server.
  • FIG. 3 shows a flowchart in which the electronic device 100 identifies the locations of the external devices 110 and 120 in order to remove the sound they generate and thereby recognize the user's utterance more accurately. First, the electronic device 100 may request, through the communication unit 210, that the external device 110 output a sound (a first sound) announcing its location (S310).
  • The sound announcing the device's location may be, for example, a spoken sentence such as "I am the AI speaker Galaxy Home." if the external device 110 is an AI speaker.
  • Such a sound may be a sentence composed to inform the user that an operation for identifying the location between devices is in progress, or it may be made of an inaudible frequency for the user's convenience; it is not limited to either one.
  • the request of operation S310 may be initiated based on a user's command input to the user input unit 250.
  • the user's command for identifying the location between devices may be performed, for example, by inputting a button of a remote controller or touching a display screen.
  • When the external device 110 outputs the first sound, the electronic device 100 identifies the location of the external device 110 based on the direction in which the first sound is received (S320). The process by which the electronic device 100 identifies the location of the external device 110 will be described later.
  • The processor 290 stores information on the location of the identified external device 110 in the storage unit 260 (S330). The same process according to an embodiment of the present invention can be applied equally to the other external device 120, as sketched below.
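  • The S310-S330 flow can be pictured as a small routine, sketched below under assumed helper names (request_first_sound and measure_direction are hypothetical placeholders, not part of the patent):

```python
# Sketch of the S310-S330 flow: request the first sound, locate its source,
# store the identified location. Helper bodies are illustrative stubs.
from typing import Dict, Tuple

device_locations: Dict[str, Tuple[float, float]] = {}  # name -> (r, theta)

def request_first_sound(device_name: str) -> None:
    """S310: ask the external device, via the communication unit, to emit the first sound."""
    print(f"requesting {device_name} to output the first sound")

def measure_direction() -> Tuple[float, float]:
    """S320: measure the (distance r, azimuth theta) of the received first sound."""
    return (1.0, 30.0)  # placeholder values

def identify_external_device(device_name: str) -> None:
    request_first_sound(device_name)                     # S310
    device_locations[device_name] = measure_direction()  # S320 + S330
```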
  • The electronic device 100 may perform the operation of identifying the location of the external device 110 (hereinafter also referred to as 'external device location identification') at various time points.
  • the storage unit 260 of the electronic device 100 may store information on a time point for performing location identification of an external device.
  • the processor 290 may determine a time point for performing external device location identification by referring to information stored in the storage unit 260 (S410).
  • For example, the information stored in the storage unit 260 of the present embodiment may indicate the following time points as time points for performing external device location identification.
  • the electronic device 100 performs initial setup of voice recognition during the initial installation process.
  • The user connects the power of the electronic device 100, connects it to the home network, and repeatedly reads previously set sentences aloud to complete the initial settings for using the voice recognition function of the electronic device 100.
  • At this time, the location of the external device can also be identified.
  • The time point indicated by the information stored in the storage unit 260 may also correspond to a case in which either the electronic device 100 or the external device 110 has not been connected to the Internet for a long time, or its power has been turned off.
  • When the electronic device 100 or the external device 110 has been disconnected or powered off for a long time, it can be presumed that its location may have changed, so the location of the external device can be identified anew.
  • Likewise, when a new device is connected to the network, this may be detected and the location of the new external device may be identified.
  • When such a time point arrives, the electronic device 100 enters operation S310 of FIG. 3 and runs the same process as described above.
  • Performing location identification at these time points keeps the stored location information current and therefore highly useful.
  • the storage unit 260 of the electronic device 100 stores a predefined characteristic of the received first sound.
  • the characteristic of the first sound may be a waveform of the sound such as amplitude, frequency, and period of the first sound.
  • Alternatively, the characteristic of the first sound may be identification information of the external device 110, such as its name and manufacturer, or information about an utterance list included in the first sound output from the external device 110.
  • When a sound is received, the processor 290 identifies whether it corresponds to the predefined characteristic of the first sound stored in the storage unit 260 (S530). If the received sound is identified as the first sound corresponding to the predefined characteristic (S540), the processor 290 may then perform the operation of identifying the location of the external device 110 that generated the first sound. In another embodiment, the processor 290 may request, through the communication unit 210, that the external device 110 output a first sound having the characteristics of the first sound stored in the storage unit 260.
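  • One plausible way to check a received sound against a stored waveform characteristic is template matching by normalized cross-correlation, sketched below; the template, threshold, and function name are illustrative assumptions, not the patent's specified method.

```python
# Sketch: decide whether the stored first-sound template occurs in the input.
import numpy as np

def matches_first_sound(received: np.ndarray, template: np.ndarray,
                        threshold: float = 0.7) -> bool:
    corr = np.correlate(received, template, mode="valid")
    # Normalize so a perfect match scores ~1.0 regardless of amplitude.
    window_energy = np.convolve(received ** 2, np.ones(len(template)), mode="valid")
    norm = np.linalg.norm(template) * np.sqrt(np.maximum(window_energy, 1e-12))
    return bool(np.max(corr / norm) >= threshold)
```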
  • An utterance list may be stored as one of the predefined characteristics described in FIG. 5.
  • The utterance list consists of sentences with which the external device 110 informs the electronic device 100 of its location; the electronic device 100 can listen to the sound of the external device 110 and identify its location.
  • The utterance list may also serve as phrases that inform the user that an operation of identifying the location between devices is being performed. As illustrated in FIG. 4, even though information on the time points for performing external device location identification is stored, the user may not be aware of this, which is why an utterance list composed of such sentences may be stored.
  • The electronic device 100 may listen to the exemplified sound of the external device 110 and identify the location of the external device in consideration of its amplitude, frequency, and period.
  • FIG. 7 is a diagram illustrating communication between a server and a device according to an embodiment of the present invention, and FIG. 8 is a flowchart illustrating the corresponding operation of the electronic device.
  • The processor 290 receives information on the characteristics of the external device 110 from another device, such as the server 710, through the communication unit 210 (S810), and may store the received information in the storage unit 260.
  • information on the characteristics may be received through a server or the like.
  • the electronic device 100 may more easily identify the location of the external device 110 based on the received and stored information (S820).
  • FIG. 9 is a diagram illustrating a state of identifying a location of an external device according to an embodiment of the present invention.
  • the location of the sound source can be identified through a difference in time when the sound reaches a specific region.
  • In the present embodiment, when sound is generated from the external device 110, there is a difference in the time it takes the generated sound to reach two points A and B of the electronic device. The distance from the external device to each of A and B can then be obtained using the speed of sound and the time it takes the sound to reach each point.
  • The electronic device 100 includes a plurality of microphones 270 spaced apart from each other, for example at A and B, and can identify the location of the external device 110 as a distance r and an angle θ from the external device 110.
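  • The far-field version of this geometry gives a closed form for the angle: the extra path to the farther microphone is d·sin(θ), so θ = arcsin(c·Δt/d). The sketch below works this out with example numbers (the spacing and time difference are illustrative, not values from the disclosure).

```python
# Direction of a distant source from the inter-microphone arrival-time
# difference, using theta = arcsin(c * delta_t / mic_spacing).
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def direction_from_tdoa(delta_t: float, mic_spacing: float) -> float:
    ratio = np.clip(SPEED_OF_SOUND * delta_t / mic_spacing, -1.0, 1.0)
    return float(np.degrees(np.arcsin(ratio)))

# Example: 0.1 m spacing, 150 microsecond arrival difference -> ~31 degrees.
print(direction_from_tdoa(150e-6, 0.1))
```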
  • the method of identifying the location of the external device 110 illustrated in FIG. 9 is only an example, and methods of identifying the location of the external device 110 may be various according to the present disclosure.
  • The processor 290 may identify the location of the external device 110 and store information on the identified location in the storage unit 260 in the form of a table 1010.
  • For example, the processor 290 may store the information on the location of the external device 110 by mapping together the name, distance, direction, and connection status of the external device 110. In the case of external device 1, the distance to the electronic device is r1, and it is located at an azimuth angle θ1 with respect to the reference direction of the electronic device 100; in the case of external device 2, the distance from the electronic device 100 is r2, and it is located at an azimuth angle θ2.
  • In the present embodiment, the location of the external device 110 is indicated by an azimuth with respect to the reference direction of the electronic device 100, but this is only an example, and the information indicating the location of the external device 110 according to the present disclosure may take various forms.
  • information about whether the external device 110 is connected to a network or power source may also be stored.
  • Other information such as the name of the external device 110 may be received from the external device 110 through the communication unit 210.
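  • A minimal in-memory version of the table 1010 might look like the sketch below; the field names are illustrative assumptions for the name, distance, direction, and connection status the text describes.

```python
# Sketch of the table 1010: one entry per external device.
from dataclasses import dataclass

@dataclass
class ExternalDeviceEntry:
    name: str           # e.g., "external device 1"
    distance_m: float   # r: distance from the electronic device
    azimuth_deg: float  # theta: azimuth from the reference direction
    connected: bool     # network/power connection status

table_1010 = [
    ExternalDeviceEntry("external device 1", 1.2, 35.0, True),   # r1, theta1
    ExternalDeviceEntry("external device 2", 2.5, 120.0, True),  # r2, theta2
]

# Only devices marked as connected are treated as potential noise sources.
noise_sources = [entry for entry in table_1010 if entry.connected]
```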
  • FIG. 11 is a diagram illustrating a situation in which an electronic device processes a received sound according to an embodiment of the present invention, FIG. 12 is a flowchart of the operation performed by the electronic device in the present embodiment, and FIG. 13 shows the noise removal block with which the device processes the sound.
  • the electronic device 100 receives a sound (hereinafter, also referred to as “second sound”) through a microphone 270 (S1210).
  • As shown in FIG. 11, the electronic device 100 obtains from the microphone 270 a second sound S in which the utterance S1 of the user 130 and the sounds S2 and S3 of the external devices 110 and 120 are combined.
  • Next, the processor 290 of the electronic device 100 removes, from the signal of the received second sound S, the noise components corresponding to the sounds received from the locations of the external devices 110 and 120 (S1220). At this time, based on the information on the locations of the external devices 110 and 120 identified by the electronic device 100 (see 1010 of FIG. 10), the processor 290 can determine the locations of the external devices 110 and 120 from which the noise components S2 and S3 included in the signal of the second sound S originate. Accordingly, the processor 290 may separate and remove the noise components S2 and S3 of the external devices 110 and 120 from the obtained signal of the second sound S.
  • To this end, the processor 290 of the electronic device 100 may include a noise removal block 1310.
  • the noise removal block 1310 may be implemented by a combination of hardware and/or software.
  • The noise removal block 1310 of the processor 290 separates the noise components S2 and S3 of the external devices 110 and 120 from the signal of the second sound S using beamforming technology, so that the user's utterance S1 can be extracted.
  • More specifically, the noise removal block 1310 divides the signal of the second sound S into certain frequency ranges using a local (short-time) Fourier transform, and then removes the overlapping frequency ranges among the signals coming from the different directions.
  • To do so, the processor 290 refers to the table 1010 illustrated in FIG. 10 to check whether external devices 110 and 120 that may generate noise exist. For example, from the table 1010, the processor 290 confirms that external devices 1 and 2 (110, 120), which are connected to the network and to power, exist. Subsequently, as shown in FIG. 13, the processor 290 uses the location information (θ1, θ2) of external devices 1 and 2 (110, 120) to remove, from the signal of the second sound S, the frequency ranges corresponding to the noise components S2 and S3 of the external devices 110 and 120, so that the user's speech S1 may be extracted from the signal of the second sound S. Finally, referring again to FIG. 12, the processor 290 recognizes the user's utterance based on the signal S1 from which the noise components S2 and S3 have been removed (S1230). A simplified sketch of this time-frequency masking step follows.
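  • The sketch below shows the general shape of such time-frequency masking: transform the second sound S to the time-frequency domain, attenuate the bins attributed to the known noise directions, and reconstruct. How the mask is derived from the stored directions (θ1, θ2) is left abstract here; the mask is assumed given, and this is not the patent's exact algorithm.

```python
# Suppress time-frequency bins attributed to the noise sources S2/S3, then
# reconstruct the remaining signal (approximately S1).
import numpy as np
from scipy.signal import stft, istft

def remove_directional_noise(s: np.ndarray, fs: int,
                             noise_mask: np.ndarray) -> np.ndarray:
    # noise_mask must be a boolean array with the same shape as the STFT of s.
    f, t, S = stft(s, fs=fs, nperseg=512)
    S_clean = np.where(noise_mask, 0.0, S)  # zero out bins flagged as noise
    _, s1 = istft(S_clean, fs=fs, nperseg=512)
    return s1
```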
  • In this way, since the electronic device 100 identifies the presence and location of each external device and classifies the sound arriving from the direction of that external device as noise, it can discriminate and remove the noise signal from the obtained second sound S and then obtain the user's spoken voice S1. That is, compared with simply using the difference in loudness to decide which part of the acquired second sound S is the user's spoken voice and which is sound generated by an external device, the accuracy in recognizing the user's spoken voice can be improved. Moreover, because the location of each external device is identified in advance, the voice processing is fast.
  • FIG. 14 is a diagram illustrating an entire system according to an embodiment of the present invention, and FIG. 15 is a flowchart illustrating the operation of an electronic device in that system.
  • In the embodiments described so far, the electronic device 100 received the sound generated by the external devices 110 and 120 and identified their locations. In this drawing, however, not only the electronic device 100 but also the external devices 110 and 120 each identify the locations of the remaining devices from their own positions, so that every device knows the locations of all the others. If this is performed at the external device location identification time points described in FIG. 4, the location of each device can be identified.
  • the processor 290 of the electronic device 100 may receive a list of external devices connected to the network through the communication unit 210 and store the list in the storage unit 260.
  • The processor may identify and store the location of each external device in the list (S1520). After completing this process, the processor 290 checks, by referring to the previously stored list of external devices connected to the network, whether the locations of all external devices in the list have been identified (S1530). If the locations of all external devices in the list have been identified (Yes in S1530), the processor 290 ends the operation. If there is an external device whose location has not been identified (No in S1530), the processor 290 repeats the process of identifying the location of that external device. When this process is completed for the electronic device, it is performed in the same way for each external device. Accordingly, all devices that exist in a limited space and are connected to the network can identify the locations of the devices other than themselves, and the present invention can be applied. The loop below sketches this check-and-repeat flow.
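  • The S1520-S1530 flow can be sketched as the loop below; identify_location is a hypothetical stand-in for the sound-based identification described above, and the device names are examples.

```python
# Repeat identification until every device in the stored network list has a
# location (S1530); each pass locates the missing devices (S1520).
from typing import Dict, List, Tuple

def identify_location(name: str) -> Tuple[float, float]:
    """Placeholder: would request the first sound and measure its direction."""
    return (1.0, 0.0)  # (distance r, azimuth theta)

def identify_all(network_list: List[str]) -> Dict[str, Tuple[float, float]]:
    locations: Dict[str, Tuple[float, float]] = {}
    while True:
        remaining = [d for d in network_list if d not in locations]
        if not remaining:       # Yes in S1530: all located, end the operation
            return locations
        for name in remaining:  # No in S1530: identify the missing devices
            locations[name] = identify_location(name)

print(identify_all(["external device 1", "external device 2"]))
```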
  • As an example, the communication unit of the electronic device 100 may be a Bluetooth module that connects the devices. Using it, the electronic device 100 transmits a control command to the communication unit (1610) of the external device to adjust the volume of the external devices 110 and 120, and the controller 1620 of the external device adjusts the volume of the external device accordingly. When the voice processing is finished, the volume of the other device is automatically restored to its original state. Although a Bluetooth module is described here, it can easily be replaced with Wi-Fi.
  • The present embodiment is applicable not only to speech obtained through the voice preprocessing process shown in FIG. 13, but also to the case where it is confirmed that the electronic device is not affected by noise from an external device; it is not limited to either one.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Telephone Function (AREA)

Abstract

An electronic device according to one embodiment of the present invention may comprise: a microphone; a storage unit; a communication unit that communicates with an external device; and a processor that requests to output a first sound to the external device through the communication unit, identifies the location of the external device on the basis of the direction in which the first sound is received by the microphone, stores, in the storage unit, information on the identified location of the external device, removes, on the basis of the stored information, a noise component, corresponding to a sound received from the location of the external device, from a signal of a second sound received by the microphone, and recognizes user utterances on the basis of the signal from which the noise component is removed.

Description

Electronic device and control method thereof
The present invention relates to an electronic device and a control method thereof, and more particularly, to an electronic device that performs a voice recognition function by removing ambient noise, and a control method thereof.
Due to the recent development of voice recognition technology, most electronic devices are equipped with voice recognition, which facilitates interaction between devices. Accordingly, a plurality of electronic devices used in the same space are subject to interference from the audio signals each of them generates during voice recognition. In this case, when the user speaks in a noisy environment, an electronic device capable of voice recognition uses a beamforming technique to extract the user's speech. Beamforming creates a spatial filter by extracting the audio signal arriving from a specific direction and removing audio components arriving from other directions. By extracting the signal arriving from the direction of the user's utterance out of the entire input audio signal and filtering out signals from other directions, only the user's speech passes to the speech recognition system.
However, while the beamforming technology currently used for speech recognition works well under ideal conditions, when multiple electronic devices are used in a limited space, each device has no prior information about the signals generated by the other devices, so the rate of voice recognition errors during user utterances increases.
In this case, the user must either lower the output level of every other device they do not wish to use, or raise their voice so that the signal-to-noise ratio (SNR) rises above the threshold the device can recognize. This increases the user's fatigue in the short term and reduces use of the voice recognition function in the long term.
An object of the present invention is to improve the accuracy of recognizing a user's utterance by removing ambient noise from the sound acquired by an electronic device in a given environment.
An electronic device according to an embodiment of the present invention includes: a microphone; a storage unit; a communication unit that communicates with an external device; and a processor that requests, through the communication unit, that the external device output a first sound, identifies the location of the external device based on the direction in which the first sound is received by the microphone, stores information on the identified location of the external device in the storage unit, removes, based on the stored information, a noise component corresponding to sound received from the location of the external device from a signal of a second sound received by the microphone, and recognizes a user utterance based on the signal from which the noise component has been removed.
The processor may identify whether the first sound is received based on a characteristic predefined for the first sound.
The electronic device according to an embodiment of the present invention further includes a storage unit, and the processor identifies the location of the external device based on the direction in which the first sound is received by the microphone and stores information on the identified location in the storage unit.
The characteristic may include information related to guidance on the location identification operation of the external device.
The characteristic may include an inaudible frequency band.
The processor may request the external device to output a first sound having the characteristic.
The processor may receive information on the characteristic from a server through the communication unit and store the received information in the storage unit.
The electronic device according to an embodiment of the present invention further includes a user input unit, and the processor may identify the location of the external device based on a user command input to the user input unit.
The storage unit stores information on a time point at which location identification of the external device is to be performed, and the processor may identify the location of the external device at that time point based on the stored information.
The processor may receive information on the external device from a server through the communication unit and identify the location of the external device based on the received information.
The electronic device according to an embodiment of the present invention further includes a speaker, and the processor receives, through the communication unit, a request from the external device to output a third sound for identifying the location of the electronic device, and controls the speaker to output the third sound.
A method of controlling an electronic device according to an embodiment of the present invention includes: communicating with an external device through a communication unit to request that the external device output a first sound; identifying the location of the external device based on the direction in which the first sound is received by a microphone; storing information on the identified location of the external device in a storage unit; removing, based on the stored information, a noise component corresponding to sound received from the location of the external device from a signal of a second sound received by the microphone; and recognizing a user utterance based on the signal from which the noise component has been removed.
The method may include storing information on a predefined characteristic of the first sound, and identifying whether the first sound is received based on the predefined characteristic.
The method may include identifying the location of the external device based on the direction in which the first sound is received by the microphone, and storing information on the identified location of the external device in a storage unit.
The method may include requesting the external device to output a first sound having the characteristic.
The method may include receiving information on the characteristic from a server through the communication unit, and storing the received information in the storage unit.
The method may include identifying the location of the external device based on a user command input to a user input unit.
The method may include storing information on a time point at which location identification of the external device is to be performed, and identifying the location of the external device at that time point based on the stored information.
The method may include receiving information on the external device from a server through the communication unit, and identifying the location of the external device based on the received information.
The method may include receiving, through the communication unit, a request from the external device to output a third sound for identifying the location of the electronic device, and controlling a speaker to output the third sound.
According to the present invention, the accuracy of voice recognition of a user utterance can be improved even when a plurality of electronic devices are in use at the same time.
In addition, cumbersome steps such as lowering and then restoring the volume of other electronic devices for voice recognition can be avoided, which is efficient.
FIG. 1 is a diagram illustrating an entire system according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating the configuration of an electronic device according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating an utterance list according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating communication between a server and devices according to an embodiment of the present invention.
FIG. 8 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention.
FIG. 9 is a diagram illustrating identification of the location of an external device according to an embodiment of the present invention.
FIG. 10 is a diagram illustrating information on external devices according to an embodiment of the present invention.
FIG. 11 is a diagram illustrating a situation in which an electronic device processes received sound according to an embodiment of the present invention.
FIG. 12 is a flowchart illustrating an operation performed by the electronic device of this embodiment.
FIG. 13 is a diagram illustrating a noise removal block with which the electronic device of this embodiment processes sound.
FIG. 14 is a diagram illustrating an entire system according to an embodiment of the present invention.
FIG. 15 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention.
FIG. 16 is a diagram illustrating the system after voice processing according to an embodiment of the present invention.
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the drawings, the same reference numerals or symbols refer to components that perform substantially the same function, and the size of each component in the drawings may be exaggerated for clarity and convenience of description. However, the technical idea of the present invention and its core configuration and operation are not limited to the configurations or operations described in the following embodiments. In describing the present invention, if it is determined that a detailed description of a known technology or configuration related to the present invention may unnecessarily obscure the subject matter of the present invention, the detailed description thereof will be omitted.
In the embodiments of the present invention, terms including ordinal numbers such as 'first' and 'second' are used only for the purpose of distinguishing one component from another, and a singular expression includes a plural expression unless the context clearly indicates otherwise. In addition, in the embodiments of the present invention, terms such as 'comprise', 'include', and 'have' should be understood as not precluding the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof. In addition, in the embodiments of the present invention, a 'module' or 'unit' performs at least one function or operation, may be implemented as hardware, software, or a combination of hardware and software, and may be integrated into at least one module. In addition, in the embodiments of the present invention, 'at least one' of a plurality of elements refers not only to all of the plurality of elements, but also to each one of the plurality of elements excluding the rest, and to any combination thereof.
FIG. 1 is a diagram illustrating an entire system according to an embodiment of the present invention. As illustrated in FIG. 1, the electronic devices 100, 110, and 120 may be implemented as display devices capable of displaying an image, or as devices without a display. For example, the electronic devices 100, 110, and 120 may include a TV, an AI assistant device (e.g., an AI speaker), a computer, a smartphone, a tablet, a portable media player, a wearable device, a video wall, an electronic picture frame, and the like. The electronic devices 100, 110, and 120 may also be implemented as various other types of devices, such as an image processing device like a set-top box without a display, home appliances such as a refrigerator, a Bluetooth speaker, or a washing machine, and an information processing device such as a computer body. Hereinafter, for convenience of description, it is assumed that the electronic device 100 is implemented as a TV and the external devices 110 and 120 are implemented as a speaker and a refrigerator, respectively. However, the electronic device and the external devices of the present invention are not limited thereto, and the present invention still holds even if the roles of any one external device and the electronic device are exchanged.
According to an embodiment of the present invention, as illustrated in FIG. 1, the electronic device 100 and a plurality of external devices 110 and 120 are placed in a use space. When the user 130 tries to use the voice recognition function of the electronic device 100, the uttered voice of the user 130 may be mixed with sound coming from the external devices 110 and 120. In that case, when the electronic device 100 processes the acquired sound, it is difficult to distinguish which sound signal results from the user's utterance. Therefore, in the present invention, when the user 130 speaks to use the voice recognition function of the electronic device 100, the electronic device 100 identifies the locations of the external devices 110 and 120 in order to obtain the signal resulting from the user's utterance. The electronic device 100 then removes, from the input sound signal, the sound signal coming from the identified locations. In this case, since the sound signals generated by the devices 110 and 120 other than the electronic device 100 for which voice recognition is required can be removed, the electronic device 100 recognizes only the uttered voice of the user 130, enabling more accurate voice recognition.
FIG. 2 is a block diagram illustrating the configuration of an electronic device according to an embodiment of the present invention. As illustrated in FIG. 2, the electronic device 100 may include a communication unit 210, a signal input/output unit 220, a broadcast receiving unit 230, a display unit 240, a user input unit 250, a storage unit 260, a microphone 270, a speaker 280, and a processor 290. The electronic device 100 illustrated in FIG. 2 shows an example in which the communication unit 210, the signal input/output unit 220, the broadcast receiving unit 230, and so on are implemented separately, but this is only an example; in some cases, the broadcast receiving unit 230 may be implemented as part of the communication unit 210 or the signal input/output unit 220. The electronic device 100 may be implemented with all the components shown in FIG. 2, or, as another example, with one or more of these components omitted. For example, as a device without a network function, an implementation without the communication unit 210 is also possible. More specific configurations are described in detail below.
Hereinafter, the configuration of the electronic device 100 will be described. Although this embodiment describes the case where the electronic device 100 is a TV, the electronic device 100 may be implemented as various types of devices, so this embodiment does not limit the configuration of the electronic device 100. The electronic device 100 may also not be implemented as a display device, in which case the electronic device 100 may not include components for displaying an image, such as the display unit 240. For example, when the electronic device 100 is implemented as a set-top box, the electronic device 100 may output an image signal and the like to an external display device such as a TV through the signal input/output unit 220.
The communication unit 210 is a two-way communication circuit that includes at least one of components such as communication modules and communication chips corresponding to various types of wired and wireless communication protocols. For example, the communication unit 210 may be implemented as a LAN card wired to a router or gateway via Ethernet, a wireless communication module that performs wireless communication with an AP according to the Wi-Fi standard, or a wireless communication module that performs one-to-one direct wireless communication such as Bluetooth. The communication unit 210 may communicate with a server on a network to exchange data packets with the server. As another embodiment, the communication unit 210 may be connected to external devices 110 and 120 other than the server, and may receive various data including video/audio data from them, or transmit various data including video/audio data to them. When voice or sound is received through the microphone 270 provided in the electronic device 100, the analog voice signal (or sound signal) is digitized and transmitted to the processor 290; when a voice signal is received from an external device, the external device digitizes the analog voice signal and transmits it to the communication unit 210 using a data transmission protocol such as Bluetooth or Wi-Fi.
The signal input/output unit 220 is wired, in a 1:1 or 1:N (N is a natural number) manner, to an external device such as a set-top box or optical media player, or to an external display device, a speaker, or the like, thereby receiving video/audio signals from the corresponding external device or outputting video/audio signals to it. The signal input/output unit 220 includes connectors or ports conforming to predetermined transmission standards, such as an HDMI port, DisplayPort, DVI port, Thunderbolt, and USB port. Here, for example, HDMI, DP, and Thunderbolt are connectors or ports capable of transmitting video and audio signals simultaneously; as another embodiment, the signal input/output unit 220 may include connectors or ports that transmit video and audio signals separately.
The broadcast receiving unit 230 may be implemented in various ways corresponding to the standard of the received video signal and the implementation form of the electronic device 100. When the video signal is a broadcast signal, the broadcast receiving unit 230 includes a tuner that tunes the broadcast signal channel by channel. The input signal may be input from an external device, for example, a PC, an AV device, a TV, a smartphone, or a smart pad. The input signal may also be derived from data received through a network such as the Internet. In this case, the broadcast receiving unit 230 may include a network communication unit that communicates with the external device.
The broadcast receiving unit 230 may use wired or wireless communication as its communication method. According to this embodiment, the broadcast receiving unit 230 is built into the electronic device 100, but it may also be implemented in the form of a dongle or module and attached to or detached from a connector of the electronic device 100. When the broadcast receiving unit 230 includes a wired communication unit, it receives a wired digital signal including a clock signal of a preset frequency (clock frequency); when it includes a wireless communication unit, it receives a wireless digital signal of a preset frequency (carrier frequency). Among the input signals received through the broadcast receiving unit 230, a preset frequency signal (clock signal or carrier frequency signal) may be processed by passing through a filter unit. The type of input signal received by the broadcast receiving unit 230 is not limited; for example, at least one of a wired digital signal, a wireless digital signal, and an analog signal may be received. When the broadcast receiving unit 230 receives an analog signal, it may receive an input signal to which a preset frequency signal has been added.
The display unit 240 includes a display panel capable of displaying an image on a screen. The display panel is provided with a light-receiving structure such as a liquid crystal type or a self-luminous structure such as an OLED type. The display unit 240 may further include additional components according to the structure of the display panel; for example, if the display panel is of a liquid crystal type, the display unit 240 includes a liquid crystal display panel, a backlight unit that supplies light, and a panel driving substrate that drives the liquid crystal of the liquid crystal display panel.
The user input unit 250 includes various types of input interface circuits provided to receive user input. The user input unit 250 may be configured in various forms according to the type of the electronic device 100, for example, a mechanical or electronic button unit of the electronic device 100, a remote controller separate from the electronic device 100, a touch pad, or a touch screen installed on the display unit 240.
The storage unit 260 stores digitized data. The storage unit 260 includes non-volatile storage, which preserves data regardless of whether power is supplied, and volatile memory, into which data to be processed by the processor 290 is loaded and which cannot preserve data when power is not supplied. Storage includes flash memory, hard-disc drives (HDD), solid-state drives (SSD), read-only memory (ROM), and the like, while memory includes buffers, random access memory (RAM), and the like.
The microphone 270 collects sounds from the external environment, including user utterances. The microphone 270 transmits the collected sound signal to the processor 290. The electronic device 100 may include the microphone 270 that collects the user's voice, or may receive a voice signal from an external apparatus, such as a remote controller or smartphone, equipped with a microphone. A remote controller application may be installed on the external apparatus to control the electronic device 100 or to perform functions such as voice recognition. An external apparatus with such an application installed can receive the user's voice, and can exchange data with and control the electronic device 100 using Wi-Fi/BT, infrared, or the like; accordingly, a plurality of communication units 210 capable of implementing these communication methods may exist in the electronic device.
The speaker 280 outputs the audio data processed by the processor 290 as sound. The speaker 280 includes a unit speaker corresponding to the audio data of one audio channel, and may include a plurality of unit speakers respectively corresponding to the audio data of a plurality of audio channels. In the present invention, when the electronic device 100 serves as an external device for another device, the speaker 280 serves to output sound to inform the other device of its location. As another embodiment, the speaker 280 may be provided separately from the electronic device 100, in which case the electronic device 100 may transmit audio data to the speaker 280 through the signal input/output unit 220.
The processor 290 includes one or more hardware processors implemented as CPUs, chipsets, buffers, circuits, and the like mounted on a printed circuit board, and may be implemented as a system on chip (SoC) depending on the design. When the electronic device 100 is implemented as a display device, the processor 290 includes modules corresponding to various processes, such as a demultiplexer, a decoder, a scaler, an audio digital signal processor (DSP), and an amplifier. Here, some or all of these modules may be implemented as an SoC. For example, modules related to image processing, such as the demultiplexer, decoder, and scaler, may be implemented as an image processing SoC, while the audio DSP may be implemented as a chipset separate from the SoC.
The processor 290 converts the voice signal acquired by the microphone 270 or the like into voice data and processes the converted voice data. The processor 290 then performs voice recognition based on the processed voice data, identifies the command indicated by the voice data, and performs an operation according to the identified command. The voice data may be text data obtained through a speech-to-text (STT) process that converts the voice signal into text data. When the STT process is involved, a server other than the STT server, or a server that also serves as the STT server, may process the data, and the electronic device may perform a specific function based on the information/data that the server transmits to it. Both the voice data processing and the command identification and execution may be performed in the electronic device 100. In that case, however, the system load and storage capacity required of the electronic device 100 become relatively large, so at least part of the process may be performed by at least one server communicatively connected to the electronic device 100 through a network.
According to an embodiment of the present invention, the processor 290 receives the uttered voice of the user 130 through the microphone 270 or the like. However, when receiving the user's uttered voice, the electronic device 100 of the present invention may also receive, in addition to that voice, sound coming from the other external devices 110 and 120 installed around the electronic device 100, that is, noise. The processor 290 removes this noise while processing the received sound and controls the electronic device to perform the operation corresponding to the user's uttered voice. The noise removal process is described in detail later.
The processor 290 according to the present invention may call and execute at least one instruction of software stored in a storage medium readable by a machine such as the electronic device 100. This enables a device such as the electronic device 100 to be operated to perform at least one function according to the called instruction. The one or more instructions may include code generated by a compiler or code executable by an interpreter. The machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, 'non-transitory' only means that the storage medium is a tangible device and does not contain a signal (e.g., an electromagnetic wave); this term does not distinguish between a case where data is stored semi-permanently in the storage medium and a case where it is stored temporarily.
Meanwhile, the processor 290 receives, through the microphone 270 or the like, the sound of other external devices, that is, noise, together with the uttered voice of the user 130. The processor 290 may perform at least part of the data analysis, processing, and result-information generation needed to remove this noise from the received sound and to perform the operation corresponding to the user's uttered voice, using at least one of machine learning, a neural network, or a deep learning algorithm as a rule-based or artificial intelligence algorithm.
For example, the processor 290 may perform the functions of both a learning unit and a recognition unit. The learning unit may perform the function of generating a trained neural network, and the recognition unit may perform the function of recognizing (or inferring, predicting, estimating, or judging) data using the trained neural network. The learning unit may create or update the neural network. The learning unit may acquire training data to create the neural network. For example, the learning unit may acquire the training data from the storage unit 260 or from outside. The training data may be data used for training the neural network, and the neural network may be trained using data resulting from the above-described operations as training data.
Before training the neural network with the training data, the learning unit may perform preprocessing on the acquired training data, or may select the data to be used for training from among a plurality of pieces of training data. For example, the learning unit may process the training data into a preset format, filter it, or add/remove noise to process it into a form suitable for training. The learning unit may use the preprocessed training data to generate a neural network configured to perform the above-described operations.
The trained neural network may be composed of a plurality of neural networks (or layers). The nodes of the plurality of neural networks have weights, and the plurality of neural networks may be connected to one another so that an output value of one neural network is used as an input value of another neural network. Examples of neural networks include models such as a convolutional neural network (CNN), a deep neural network (DNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), and deep Q-networks.
Meanwhile, the recognition unit may acquire target data to perform the above-described operations. The target data may be acquired from the storage unit 260 or from outside. The target data may be data to be recognized by the neural network. Before applying the target data to the trained neural network, the recognition unit may perform preprocessing on the acquired target data, or may select the data to be used for recognition from among a plurality of pieces of target data. For example, the recognition unit may process the target data into a preset format, filter it, or add/remove noise to process it into a form suitable for recognition. The recognition unit may obtain an output value from the neural network by applying the preprocessed target data to the neural network. The recognition unit may obtain a probability value or a reliability value together with the output value.
For example, the control method of the electronic device 100 according to the present invention may be provided as part of a computer program product. The computer program product may be traded between a seller and a buyer as a commodity. The computer program product may be distributed in the form of a machine-readable storage medium (e.g., a CD-ROM), or distributed online (e.g., downloaded or uploaded) through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones). In the case of online distribution, at least part of the computer program product may be at least temporarily stored, or temporarily created, in a machine-readable storage medium such as the server of a manufacturer, the server of an application store, or the memory of a relay server.
FIG. 3 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention. This embodiment shows a flowchart in which the electronic device 100 identifies the locations of the external devices 110 and 120 in order to remove the sound they generate and thereby recognize the user's utterance more accurately. First, the electronic device 100 may request, through the communication unit 210, that the external device 110 output a sound (first sound) announcing its location (S310). Here, the sound announcing its location may be, for example, 'I am the AI speaker Galaxy Home.' or similar, if the external device 110 is an AI speaker. Such a sound may be a sentence composed to inform the user that the operation of identifying the locations of the devices is currently being performed, but for the user's convenience it may instead consist of an inaudible frequency; it is not limited to either form. The request of operation S310 may be initiated based on a user command input through the user input unit 250. The user command for identifying the locations of the devices may be given, for example, by pressing a button on a remote controller or touching the display screen. When the external device 110 outputs the first sound, the electronic device 100 identifies the location of the external device 110 based on the direction in which the first sound is received (S320). The process by which the electronic device 100 identifies the location of the external device 110 is described later. After identifying the location of the external device 110, the processor 290 stores information on the identified location of the external device 110 in the storage unit 260 (S330). The process according to this embodiment of the present invention may equally be applied to the other external device 120.
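By way of non-limiting illustration, the following Python sketch models operations S310 to S330 of FIG. 3. The class and method names (LocationIdentifier, estimate_direction, and so on) are hypothetical and are not part of the disclosed embodiment; they only stand in for the communication unit 210, the microphone 270, and the storage unit 260.

```python
# A minimal sketch of the flow of FIG. 3 (S310-S330). All names here are
# hypothetical illustrations, not part of the disclosed embodiment.
from dataclasses import dataclass

@dataclass
class DeviceLocation:
    name: str           # e.g. "AI speaker"
    distance_m: float   # estimated distance r
    azimuth_deg: float  # estimated azimuth theta
    connected: bool

class LocationIdentifier:
    def __init__(self, comm, mic_array, storage):
        self.comm = comm            # stands in for the communication unit 210
        self.mic_array = mic_array  # stands in for the microphone(s) 270
        self.storage = storage      # stands in for the storage unit 260

    def identify(self, device_id: str) -> DeviceLocation:
        # S310: request the external device to output the first sound
        self.comm.send(device_id, {"cmd": "output_first_sound"})
        # S320: estimate direction/distance from the received first sound
        distance_m, azimuth_deg = self.mic_array.estimate_direction()
        loc = DeviceLocation(device_id, distance_m, azimuth_deg, True)
        # S330: persist the identified location for later noise removal
        self.storage.save(device_id, loc)
        return loc
```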
FIG. 4 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention. According to an embodiment of the present invention, the electronic device 100 may perform the operation of identifying the location of the external device 110 (hereinafter also referred to as 'external device location identification') at various points in time. The storage unit 260 of the electronic device 100 may store information on the time point at which external device location identification is to be performed. The processor 290 may determine the time point for performing external device location identification by referring to the information stored in the storage unit 260 (S410). The information stored in the storage unit 260 of this embodiment may indicate the following time points for performing external device location identification. For example, the electronic device 100 performs initial voice recognition setup during the first installation process. In this process, the user connects the power of the electronic device 100, connects it to the home network, and repeatedly reads preset sentences aloud to complete the initial setup for using the voice recognition function of the electronic device 100. It is assumed here that most home appliances, such as TVs and refrigerators, are rarely moved once installed, due to their size and weight. Therefore, the electronic device 100 may execute external device location identification when it is first installed and connected to the network. As another embodiment, the time point indicated by the information stored in the storage unit 260 may correspond to a case where either the electronic device 100 or the external device 110 has not been connected to the Internet, or has been powered off, for a long time. That is, if the electronic device 100 or the external device 110 has been disconnected or powered off for a long time, it can be predicted that its location may have changed, so external device location identification may be executed anew. As another example, when a new device is connected to the network, this may be detected and the location of the new external device may be identified. Referring again to FIG. 4, when the time point indicated by the information stored in the storage unit 260 arrives (Yes in S420), the electronic device 100 proceeds to operation S310 of FIG. 3 and executes the same process described above. According to this embodiment, the electronic device 100 can identify the location of the external device 110 in various environments, which increases its usefulness.
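As a rough, assumed sketch of how the trigger conditions above might be evaluated, the following Python function checks the three example time points (first installation, a long offline period, a new device on the network). The threshold value and field names are illustrative assumptions, not part of the disclosure.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: how long a device may be offline before its
# stored location is treated as stale (an assumption, not from the patent).
OFFLINE_THRESHOLD = timedelta(days=7)

def should_identify_locations(state) -> bool:
    """Return True when external device location identification should run."""
    # Case 1: first installation / initial voice recognition setup
    if state.first_setup_completed and not state.locations_identified:
        return True
    # Case 2: a device was offline or powered off long enough that its
    # location may have changed
    now = datetime.now()
    for dev in state.known_devices:
        if now - dev.last_seen > OFFLINE_THRESHOLD:
            return True
    # Case 3: a new device joined the home network and has no stored location
    if any(dev.location is None for dev in state.known_devices):
        return True
    return False
```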
FIG. 5 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention. This embodiment is a more specific example of the operation (S320), described with reference to FIG. 3, of identifying the location of the external device 110 based on the direction in which the first sound is received. According to this embodiment, the storage unit 260 of the electronic device 100 stores predefined characteristics of the first sound to be received. Here, a characteristic of the first sound may be a property of the sound waveform, such as the amplitude, frequency, or period of the first sound. As another embodiment, a characteristic of the first sound may be identification information of the external device 110, such as its name or manufacturer, or information such as a list of utterances included in the first sound output by the external device 110. Accordingly, when the electronic device 100 receives a sound (S520), the processor 290 identifies whether the received sound matches the predefined characteristics of the first sound stored in the storage unit 260 (S530). If the processor 290 identifies the received sound as the first sound matching the predefined characteristics (S540), it may then perform the operation of identifying the location of the external device 110 that generated the first sound. As yet another embodiment, the processor 290 may, based on the characteristics of the first sound stored in the storage unit 260, request the external device 110 through the communication unit 210 to output a first sound having those characteristics.
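The matching step (S520-S540) could be realized in many ways; one minimal, assumed approach is to compare the dominant frequency and level of the captured signal against the stored characteristic, as sketched below with NumPy. The tolerance parameters are illustrative assumptions.

```python
import numpy as np

def matches_first_sound(samples: np.ndarray, sample_rate: int,
                        expected_freq_hz: float, expected_rms: float,
                        freq_tol_hz: float = 20.0, rms_tol: float = 0.2) -> bool:
    """Illustrative check (S530): does the received sound match the stored
    characteristic (dominant frequency and amplitude) of the first sound?"""
    # Dominant frequency via the FFT peak of the captured block
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    dominant = freqs[np.argmax(spectrum)]
    # RMS amplitude of the captured block
    rms = float(np.sqrt(np.mean(samples ** 2)))
    return (abs(dominant - expected_freq_hz) <= freq_tol_hz
            and abs(rms - expected_rms) <= rms_tol * expected_rms)
```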
FIG. 6 is a diagram illustrating an utterance list according to an embodiment of the present invention. In this embodiment, an utterance list may be stored as one of the predefined characteristics described in FIG. 5. The utterance list is a list of sentences with which the external device 110 announces its location to the electronic device 100; the electronic device 100 listens to the sound of the external device 110 and identifies the location of the external device 110. The utterance list may also serve as phrases that inform the user that the operation of identifying the locations of the devices is in progress. As described with reference to FIG. 4, even if information on the time point for performing external device location identification is stored, the user may not be aware of it, so an utterance list composed of full sentences may be stored. For example, if the external device 110 is a speaker, it may repeatedly output sounds such as 'I am the AI speaker Galaxy Home. Call me when you want to listen to music.' or 'I am now announcing my location to the Samsung Smart TV for initial voice recognition setup.' The electronic device 100 may listen to these example sounds from the external device 110 and identify the location of the external device in consideration of their amplitude, frequency, period, and so on.
FIG. 7 is a diagram illustrating communication between a server and devices according to an embodiment of the present invention, and FIG. 8 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention. According to this embodiment, the processor 290 may receive information on the characteristics of the external device 110 from another apparatus, such as the server 710, through the communication unit 210 (S810), and store the received information in the storage unit 260. In cases where the devices come from different manufacturers and it is therefore difficult to obtain information on one another's characteristics, the information on the characteristics can be received through a server or the like. The electronic device 100 can then identify the location of the external device 110 more easily based on the received and stored information (S820).
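A minimal sketch of S810, assuming a hypothetical REST endpoint on the server 710 that returns the first-sound characteristics per device model; the URL scheme and JSON fields are invented for illustration only.

```python
import json
import urllib.request

def fetch_device_characteristics(server_url: str, model_id: str) -> dict:
    """S810: fetch the first-sound characteristics of an external device
    from a server. The endpoint path and response fields are assumptions."""
    with urllib.request.urlopen(f"{server_url}/characteristics/{model_id}") as resp:
        return json.load(resp)  # e.g. {"freq_hz": 19000, "rms": 0.5, ...}
```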
FIG. 9 is a diagram illustrating identification of the location of an external device according to an embodiment of the present invention. There are several methods of estimating the direction of a sound source. Among them, according to FIG. 9, the location of the sound source can be identified through the difference in the times at which the sound reaches specific areas. According to this embodiment, when the external device 110 produces a sound, there is a difference between the times at which the sound reaches two points A and B of the electronic device. Using the speed of sound and the time taken to reach each point, the distances from the external device to A and B can be obtained. Assuming that the distance d between A and B is very small compared to the distance r between the electronic device 100 and the external device 110 (d ≪ r), the difference Δl between the distances from the external device to points A and B can be obtained, and from this, the angle θ between the electronic device 100 and the external device 110 can be obtained. Accordingly, the electronic device 100 may include a plurality of microphones 270 spaced apart from each other, such as at A and B, and may identify the location of the external device 110 through the distance r and the angle θ to the external device 110. However, the location identification method of FIG. 9 is only an example, and various methods of identifying the location of the external device 110 may be used according to the present disclosure.
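Under the far-field assumption d ≪ r stated above, the path difference is Δl = c·Δt and the bearing follows from sin θ = Δl/d. The sketch below, an illustration rather than the claimed implementation, estimates Δt by cross-correlating the two microphone signals.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, at roughly room temperature

def estimate_azimuth(sig_a: np.ndarray, sig_b: np.ndarray,
                     sample_rate: int, mic_spacing_m: float) -> float:
    """Estimate the azimuth (degrees) of a sound source from the time
    difference of arrival between two microphones A and B (far field,
    d << r, as in FIG. 9)."""
    # Cross-correlate to find the delay (in samples) between the channels
    corr = np.correlate(sig_a, sig_b, mode="full")
    delay_samples = np.argmax(corr) - (len(sig_b) - 1)
    dt = delay_samples / sample_rate          # time difference Δt
    dl = SPEED_OF_SOUND * dt                  # path difference Δl = c·Δt
    # sin(θ) = Δl / d; clip to guard against noise pushing |Δl| beyond d
    sin_theta = np.clip(dl / mic_spacing_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))
```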
FIG. 10 is a diagram illustrating information on external devices according to an embodiment of the present invention. According to an embodiment, the processor 290 may identify the location of the external device 110 and store the information on the identified location of the external device 110 in the storage unit 260 in the form of a table 1010. In this case, the processor 290 may store the information on the location of the external device 110 mapped to the name of the external device 110, its distance, direction, connection status, and so on. For example, external device 1 is at a distance r1 from the electronic device and is located at an azimuth θ1 with respect to the reference direction of the electronic device 100. External device 2 is at a distance r2 from the electronic device 100 and is located at an azimuth θ2. In this embodiment, the location of the external device 110 is expressed as an azimuth with respect to the reference direction of the electronic device 100, but this is only an example, and the information indicating the location of the external device 110 according to the present disclosure may take various forms. In addition, when the external device 110 is not connected, the sound heard from the direction in which it is located is not subject to removal, so information on whether the external device 110 is connected to the network or to power may also be stored. Other information, such as the name of the external device 110, may be received from the external device 110 through the communication unit 210.
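The table 1010 might be kept as a simple keyed structure, as in the sketch below; the field names and example values are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class ExternalDeviceEntry:
    """One row of the table 1010 of FIG. 10 (field names are illustrative)."""
    name: str           # e.g. "AI speaker", received via the communication unit
    distance_m: float   # r1, r2, ...
    azimuth_deg: float  # θ1, θ2, ... relative to the device's reference direction
    connected: bool     # network/power status; disconnected devices are skipped

# Example table: only connected devices contribute directions to suppress.
device_table = {
    "device_1": ExternalDeviceEntry("AI speaker", 2.0, 30.0, True),
    "device_2": ExternalDeviceEntry("refrigerator", 3.5, -45.0, True),
}

def directions_to_suppress(table: dict) -> list:
    return [e.azimuth_deg for e in table.values() if e.connected]
```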
FIG. 11 illustrates a situation in which an electronic device according to an embodiment of the present invention processes received sound, FIG. 12 is a flowchart of an operation performed by the electronic device of this embodiment, and FIG. 13 illustrates a noise removal block with which the electronic device of this embodiment processes sound. Referring to FIG. 11, assume that when the user 130 wants to use the voice recognition function of the electronic device 100, sounds S2 and S3 generated by the external devices 110 and 120 exist in addition to the voice S1 uttered by the user 130. First, referring to FIG. 12, the electronic device 100 receives sound (hereinafter also referred to as the 'second sound') through the microphone 270 (S1210). In this case, as illustrated in FIGS. 11 and 13, the electronic device 100 obtains from the microphone 270 a second sound S in which the uttered voice S1 of the user 130 and the sounds S2 and S3 of the external devices 110 and 120 are combined.
Next, referring to FIG. 12, the processor 290 of the electronic device 100 removes, from the signal of the received second sound S, the noise components corresponding to the sounds received from the locations of the external devices 110 and 120 (S1220). At this time, based on the information on the locations of the external devices 110 and 120 that the electronic device 100 identified in advance (see 1010 of FIG. 10), the processor 290 of the electronic device 100 can determine the locations of the external devices 110 and 120 from which the noise components S2 and S3 included in the signal of the acquired second sound S originate. Accordingly, the processor 290 can separate and remove the noise components S2 and S3 of the external devices 110 and 120 from the acquired signal of the second sound S.
To separate and remove the noise components S2 and S3 of the external devices 110 and 120 from the signal of the second sound S, as illustrated in FIG. 13, the processor 290 of the electronic device 100 may include a noise removal block 1310. The noise removal block 1310 may be implemented as a combination of hardware and/or software. The noise removal block 1310 of the processor 290 may use beamforming technology to separate the noise components S2 and S3 of the external devices 110 and 120 from the signal of the second sound S and extract the user's uttered voice S1. Specifically, the noise removal block 1310 divides the signal of the second sound S into certain frequency ranges using a short-time Fourier transform in the frequency domain, and then separates the signals by removing the overlapping frequency ranges among the signals coming from different directions. The processor 290 refers to the table 1010 illustrated in FIG. 10 to check whether there are external devices 110 and 120 that may generate noise. For example, the processor 290 confirms from the table 1010 that external devices 1 and 2 (110, 120), which are connected to the network and to power, exist. Then, as illustrated in FIG. 13, the processor 290 uses the location information (θ1, θ2) of external devices 1 and 2 (110, 120) to remove, from the signal of the second sound S, the frequency ranges corresponding to the noise components S2 and S3 of the external devices 110 and 120, thereby extracting the user's uttered voice S1. Finally, referring again to FIG. 12, the processor 290 recognizes the user's uttered voice based on the signal S1 from which the noise components S2 and S3 have been removed (S1230).
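One way to realize the noise removal block 1310 in software, sketched under the assumption of a two-microphone array and a simple phase-based direction estimate per time-frequency bin, is a directional mask over the short-time Fourier transform. This is an illustration of the general beamforming/masking idea described above, not the patented implementation itself.

```python
import numpy as np
from scipy.signal import stft, istft

SPEED_OF_SOUND = 343.0  # m/s

def suppress_directions(sig_a, sig_b, fs, mic_spacing_m,
                        noise_azimuths_deg, tol_deg=10.0, nperseg=512):
    """Directional masking over the STFT (illustrative): per time-frequency
    bin, estimate the arrival angle from the inter-microphone phase
    difference and zero out bins pointing at stored noise directions."""
    f, _, A = stft(sig_a, fs, nperseg=nperseg)
    _, _, B = stft(sig_b, fs, nperseg=nperseg)
    # Phase difference between the two channels per bin
    phase = np.angle(A * np.conj(B))
    # Path difference Δl = phase * c / (2π f); avoid the DC bin (f = 0)
    f_safe = np.where(f > 0, f, np.inf)[:, None]
    dl = phase * SPEED_OF_SOUND / (2 * np.pi * f_safe)
    sin_theta = np.clip(dl / mic_spacing_m, -1.0, 1.0)
    angle_deg = np.degrees(np.arcsin(sin_theta))
    # Build a mask that keeps every bin except those near θ1, θ2, ...
    mask = np.ones_like(A, dtype=float)
    for az in noise_azimuths_deg:
        mask[np.abs(angle_deg - az) < tol_deg] = 0.0
    _, cleaned = istft(A * mask, fs, nperseg=nperseg)
    return cleaned
```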
According to an embodiment of the present invention, the electronic device 100 identifies the presence and locations of external devices and classifies sounds arriving from those directions as noise, so that the user's spoken voice S1 can be obtained after the noise signals in the acquired second sound S are distinguished and removed. That is, by distinguishing which part of the acquired signal of the second sound S is the user's spoken voice and which part is sound generated by an external device, an embodiment can recognize the user's spoken voice more accurately than the existing technique, which simply treats the direction of the loudest sound as the valid direction and focuses on it to isolate the user's voice. Moreover, because the locations of the external devices are identified in advance, the voice processing is faster.
FIG. 14 illustrates an entire system according to an embodiment of the present invention, and FIG. 15 is a flowchart of the operation of an electronic device in that system. In the previous embodiment, the electronic device 100 received the sounds generated by the external devices 110 and 120 and identified their locations; in this embodiment, not only the electronic device 100 but also the external devices 110 and 120 each identify the locations of the other devices from their own positions, so that every device knows the locations of all the others. When the time point for executing the external-device location identification described with reference to FIG. 4 arrives, each device can thus determine the locations of the others. According to an embodiment of the present invention, the processor 290 of the electronic device 100 may receive, through the communication unit 210, a list of the external devices connected to the network and store it in the storage unit 260 (S1510). The processor may identify and store the location of each external device in the list (S1520). After this, the processor 290 refers to the stored list of networked external devices and determines whether the locations of all the external devices in the list have been identified (S1530). If the locations of all the external devices in the list have been identified (Yes in S1530), the processor 290 ends the operation. If an external device whose location has not been identified remains (No in S1530), the processor 290 repeats the location identification process for that device. Once this process is completed for the electronic device, it is performed in the same way for each external device. As a result, every device that exists in the confined space and is connected to the network can identify the locations of the other devices, and the present invention is applicable to all of them.
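For illustration, a minimal sketch of the S1510-S1530 loop follows. The device list and angle estimates are simulated; all names are assumptions rather than APIs from the disclosure.

    from typing import Dict, Optional

    # Simulated stand-in for the network query and sound-based estimation.
    SIMULATED_ANGLES = {"external_device_1": 45.0, "external_device_2": -30.0}

    def fetch_device_list() -> list:
        # S1510: query the network for the connected external devices.
        return list(SIMULATED_ANGLES)

    def locate_by_requested_sound(device_id: str) -> Optional[float]:
        # S1520: request the device to emit the first sound, then estimate
        # its direction of arrival; simulated here by a table lookup.
        return SIMULATED_ANGLES.get(device_id)

    def identify_all_locations() -> Dict[str, float]:
        pending = set(fetch_device_list())        # S1510
        locations: Dict[str, float] = {}
        while pending:                            # S1530: loop until resolved
            for device_id in list(pending):
                angle = locate_by_requested_sound(device_id)   # S1520
                if angle is not None:
                    locations[device_id] = angle
                    pending.discard(device_id)
        return locations

    print(identify_all_locations())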
FIG. 16 illustrates the system after speech processing according to an embodiment of the present invention. According to this embodiment, after the voice pre-processing of FIG. 13 in which the noise removal block 1310 obtains the user's spoken voice S1, the communication unit of the electronic device 100 transmits, through the Bluetooth module 1610 connecting the devices, a control command to the communication units of the external devices 110 and 120 to adjust their volume. The control unit 1620 of each external device adjusts its volume accordingly. When the recognition of the user's voice in the electronic device 100 is completed, the volume of the other devices is automatically restored to its original level. The Bluetooth module can readily be replaced with Wi-Fi. This embodiment is applicable not only to the spoken voice obtained through the voice pre-processing of FIG. 13, but also, for example, to the case where the electronic device confirms that there is no noise influence from an external device; it is not limited to any one case.
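A minimal sketch of this volume-ducking behavior follows. The control-command function and the device objects are illustrative stand-ins for the Bluetooth module 1610 and the external device's control unit 1620, not an API from the disclosure.

    from contextlib import contextmanager

    class ExternalDevice:
        def __init__(self, name: str, volume: int):
            self.name, self.volume = name, volume

    def send_volume_command(device: ExternalDevice, volume: int) -> None:
        # Placeholder for the control command sent over Bluetooth (or Wi-Fi)
        # to the device's communication unit; here it only updates local state.
        device.volume = volume
        print(f"{device.name} volume -> {volume}")

    @contextmanager
    def ducked(devices, duck_level: int = 0):
        original = {d: d.volume for d in devices}
        for d in devices:
            send_volume_command(d, duck_level)   # lower volume for recognition
        try:
            yield
        finally:
            for d, v in original.items():
                send_volume_command(d, v)        # restore once recognition ends

    devices = [ExternalDevice("external_device_1", 7),
               ExternalDevice("external_device_2", 5)]
    with ducked(devices):
        pass  # run speech recognition on the noise-removed signal S1 here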

Claims (15)

  1. An electronic device comprising:
    a microphone;
    a communication unit configured to communicate with an external device; and
    a processor configured to:
    request the external device, through the communication unit, to output a first sound,
    remove, based on information on a location of the external device identified based on a direction in which the first sound is received by the microphone, a noise component corresponding to a sound received from the location of the external device from a signal of a second sound received by the microphone, and
    recognize a user's utterance based on the signal from which the noise component has been removed.
  2. The electronic device of claim 1, wherein the processor identifies whether the first sound is received based on a predefined characteristic of the first sound.
  3. The electronic device of claim 1, further comprising a storage unit,
    wherein the processor identifies the location of the external device based on the direction in which the first sound is received by the microphone, and
    stores information on the identified location of the external device in the storage unit.
  4. The electronic device of claim 2, wherein the characteristic includes guidance-related information on a location identification operation of the external device.
  5. The electronic device of claim 2, wherein the characteristic includes an inaudible frequency band.
  6. The electronic device of claim 2, wherein the processor requests the external device to output the first sound having the characteristic.
  7. The electronic device of claim 2, wherein the processor receives the characteristic from a server through the communication unit.
  8. The electronic device of claim 1, further comprising a user input unit,
    wherein the processor identifies the location of the external device based on a user command input to the user input unit.
  9. The electronic device of claim 1, wherein the storage unit stores information on a time point at which location identification of the external device is to be executed, and
    the processor identifies the location of the external device at the time point based on the stored information.
  10. The electronic device of claim 1, wherein the processor:
    receives information on the external device from a server through the communication unit, and
    identifies the location of the external device based on the received information.
  11. The electronic device of claim 1, further comprising a speaker,
    wherein the processor:
    receives, from the external device through the communication unit, a request to output a third sound for identifying a location of the electronic device, and
    controls the speaker to output the third sound.
  12. A control method of an electronic device, the method comprising:
    communicating with an external device through a communication unit and requesting the external device to output a first sound;
    removing, based on information on a location of the external device identified based on a direction in which the first sound is received by a microphone, a noise component corresponding to a sound received from the location of the external device from a signal of a second sound received by the microphone; and
    recognizing a user's utterance based on the signal from which the noise component has been removed.
  13. The method of claim 12, further comprising:
    storing a predefined characteristic of the first sound; and
    identifying whether the first sound is received based on the predefined characteristic.
  14. The method of claim 12, further comprising:
    identifying the location of the external device based on the direction in which the first sound is received by the microphone; and
    storing information on the identified location of the external device in a storage unit.
  15. The method of claim 12, wherein the characteristic includes guidance-related information on a location identification operation of the external device.
PCT/KR2020/011937 2019-11-05 2020-09-04 Electronic device and control method thereof WO2021091063A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020190140145A KR20210054246A (en) 2019-11-05 2019-11-05 Electorinc apparatus and control method thereof
KR10-2019-0140145 2019-11-05

Publications (1)

Publication Number Publication Date
WO2021091063A1 true WO2021091063A1 (en) 2021-05-14

Family

ID=75848890

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/011937 WO2021091063A1 (en) 2019-11-05 2020-09-04 Electronic device and control method thereof

Country Status (2)

Country Link
KR (1) KR20210054246A (en)
WO (1) WO2021091063A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117040940B (en) * 2023-10-10 2023-12-19 成都运荔枝科技有限公司 Equipment data encryption method based on Internet of things

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101082839B1 (en) * 2008-12-22 2011-11-11 한국전자통신연구원 Method and apparatus for multi channel noise reduction
JP2011118124A (en) * 2009-12-02 2011-06-16 Murata Machinery Ltd Speech recognition system and recognition method
KR20180107637A (en) * 2017-03-22 2018-10-02 삼성전자주식회사 Electronic device and controlling method thereof
JP2019176332A (en) * 2018-03-28 2019-10-10 株式会社フュートレック Speech extracting device and speech extracting method
KR20190096305A (en) * 2019-07-29 2019-08-19 엘지전자 주식회사 Intelligent voice recognizing method, voice recognizing apparatus, intelligent computing device and server

Also Published As

Publication number Publication date
KR20210054246A (en) 2021-05-13

Legal Events

Date Code Title Description
121  Ep: the epo has been informed by wipo that ep was designated in this application
     Ref document number: 20884432; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
     Ref country code: DE
122  Ep: pct application non-entry in european phase
     Ref document number: 20884432; Country of ref document: EP; Kind code of ref document: A1