WO2021091063A1 - Electronic device and control method thereof - Google Patents

Electronic device and control method thereof Download PDF

Info

Publication number
WO2021091063A1
Authority
WO
WIPO (PCT)
Prior art keywords
external device
sound
electronic device
location
received
Prior art date
Application number
PCT/KR2020/011937
Other languages
French (fr)
Korean (ko)
Inventor
김가을
최찬희
Original Assignee
Samsung Electronics Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co., Ltd.
Publication of WO2021091063A1 publication Critical patent/WO2021091063A1/en

Links

Images

Classifications

    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/22Procedures used during a speech recognition process, e.g. man-machine dialogue
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01NINVESTIGATING OR ANALYSING MATERIALS BY DETERMINING THEIR CHEMICAL OR PHYSICAL PROPERTIES
    • G01N29/00Investigating or analysing materials by the use of ultrasonic, sonic or infrasonic waves; Visualisation of the interior of objects by transmitting ultrasonic or sonic waves through the object
    • G01N29/22Details, e.g. general constructional or apparatus details
    • G01N29/26Arrangements for orientation or scanning by relative movement of the head and the sensor
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S3/00Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received
    • G01S3/80Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received using ultrasonic, sonic or infrasonic waves
    • G01S3/802Systems for determining direction or deviation from predetermined direction
    • G01S3/808Systems for determining direction or deviation from predetermined direction using transducers spaced apart and measuring phase or time difference between signals therefrom, i.e. path-difference systems
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16Sound input; Sound output
    • G06F3/167Audio in a user interface, e.g. using voice commands for navigating, audio feedback
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00Speech recognition
    • G10L15/04Segmentation; Word boundary detection
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0208Noise filtering
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones

Definitions

  • the present invention relates to an electronic device and a control method thereof, and more particularly, to an electronic device that performs a voice recognition function by removing noise from a surrounding environment, and a control method thereof.
  • In this case, an electronic device capable of voice recognition uses a beamforming technique to extract the user's speech. Beamforming creates a spatial filter by extracting the audio signal arriving from a specific direction and removing audio components arriving from other directions. By extracting the signal arriving from the direction of the user's utterance out of the entire input audio signal and filtering out signals from other directions, only the user's speech passes to the speech recognition system.
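  • As a rough illustration of the spatial-filter idea, the sketch below implements a minimal two-microphone delay-and-sum beamformer in Python. All parameters (sample rate, microphone spacing) and function names are illustrative assumptions, not values from the present invention.

```python
# Minimal delay-and-sum beamformer sketch: signals arriving from the steering
# direction add coherently; signals from other directions are attenuated.
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed
SAMPLE_RATE = 16_000    # Hz, assumed
MIC_SPACING = 0.1       # m between the two microphones, assumed

def delay_and_sum(ch0: np.ndarray, ch1: np.ndarray, steer_deg: float) -> np.ndarray:
    # Extra path length to the second microphone for a far source at steer_deg.
    delay_sec = MIC_SPACING * np.sin(np.deg2rad(steer_deg)) / SPEED_OF_SOUND
    delay_samples = int(round(delay_sec * SAMPLE_RATE))
    aligned = np.roll(ch1, -delay_samples)  # compensate the inter-mic delay
    return 0.5 * (ch0 + aligned)            # average the aligned channels
```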
  • In this case, the user must either lower the output level of every other device they do not wish to use, or raise their voice so that the signal-to-noise ratio (SNR) rises above the threshold the device can recognize. This increases the user's fatigue in the short term and reduces use of the voice recognition function in the long term.
  • An object of the present invention is to improve the accuracy of recognizing a user's utterance by removing ambient noise from the sound acquired by an electronic device in a given environment.
  • An electronic device according to an embodiment of the present invention includes: a microphone; a storage unit; a communication unit that communicates with an external device; and a processor that requests, through the communication unit, that the external device output a first sound, identifies the location of the external device based on the direction in which the first sound is received by the microphone, stores information on the identified location of the external device in the storage unit, removes, based on the stored information, a noise component corresponding to sound received from the location of the external device from a signal of a second sound received by the microphone, and recognizes a user utterance based on the signal from which the noise component has been removed.
  • the processor may identify whether the first sound is received based on a characteristic predefined for the first sound.
  • The electronic device further includes a storage unit, and the processor identifies the location of the external device based on the direction in which the first sound is received by the microphone and stores information on the identified location in the storage unit.
  • the characteristic may include information related to guidance on a location identification operation of the external device.
  • the characteristic may include an inaudible frequency band.
  • the processor may request the external device to output the first sound having the characteristic.
  • the processor may receive information on the characteristic from a server through the communication unit and store the received information in the storage unit.
  • the electronic device further includes a user input unit, and the processor may identify the location of the external device based on a user command input to the user input unit.
  • The storage unit stores information on a time point at which location identification of the external device is to be performed, and the processor may identify the location of the external device at that time point based on the stored information.
  • the processor may receive information on the external device from a server through the communication unit and identify the location of the external device based on the received information.
  • The electronic device further includes a speaker, and the processor receives, through the communication unit, a request from the external device to output a third sound for identifying the location of the electronic device, and controls the speaker to output the third sound.
  • A method of controlling an electronic device according to an embodiment of the present invention includes: communicating with an external device through a communication unit to request that the external device output a first sound; identifying the location of the external device based on the direction in which the first sound is received by a microphone; storing information on the identified location of the external device in a storage unit; removing, based on the stored information, a noise component corresponding to sound received from the location of the external device from a signal of a second sound received by the microphone; and recognizing a user utterance based on the signal from which the noise component has been removed.
  • The method may include identifying the location of the external device based on the direction in which the first sound is received by the microphone, and storing information on the identified location of the external device in a storage unit.
  • The method may include identifying the location of the external device based on a user command input to a user input unit.
  • The method may include storing information on a time point at which location identification of the external device is to be performed, and identifying the location of the external device at that time point based on the stored information.
  • The method may include receiving information on the external device from a server through the communication unit, and identifying the location of the external device based on the received information.
  • According to the present invention, the accuracy of voice recognition for a user's utterance can be improved.
  • FIG. 1 is a diagram showing an entire system according to an embodiment of the present invention.
  • FIG. 2 is a block diagram showing the configuration of an electronic device according to an embodiment of the present invention.
  • FIG. 3 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention.
  • FIG. 4 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention.
  • FIG. 5 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention.
  • FIG. 6 is a diagram showing an utterance list according to an embodiment of the present invention.
  • FIG. 7 is a diagram illustrating communication between a server and a device according to an embodiment of the present invention.
  • FIG. 8 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention.
  • FIG. 9 is a diagram illustrating a state of identifying a location of an external device according to an embodiment of the present invention.
  • FIG. 10 is a diagram showing information on an external device according to an embodiment of the present invention.
  • FIG. 11 is a diagram illustrating a situation in which an electronic device processes a received sound according to an embodiment of the present invention.
  • FIG. 12 is a diagram showing a flowchart of an operation performed by the electronic device of the present embodiment.
  • FIG. 13 is a diagram illustrating a noise removal block for processing sound by the electronic device of the present embodiment.
  • FIG. 14 is a diagram illustrating an entire system according to an embodiment of the present invention.
  • FIG. 15 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention.
  • FIG. 16 is a diagram illustrating the system after voice processing according to an embodiment of the present invention.
  • the electronic devices 100, 110, and 120 may be implemented as a display device capable of displaying an image, or may be implemented as a device without a display.
  • the electronic devices 100, 110, 120 may include a TV, an AI assistance device (AI speaker, etc.), a computer, a smart phone, a tablet, a portable media player, a wearable device, a video wall, an electronic frame, and the like.
  • Alternatively, the electronic devices 100, 110, and 120 may be implemented as various other types of devices, such as image processing devices without a display (e.g., set-top boxes), household appliances (e.g., refrigerators, Bluetooth speakers, and washing machines), and information processing devices such as a computer main body.
  • In the following description, the electronic device 100 is implemented as a TV, and the external devices 110 and 120 are implemented as a speaker and a refrigerator, respectively.
  • However, the electronic device and the external devices of the present invention are not limited thereto, and the present invention holds even if the roles of the electronic device and any one external device are exchanged.
  • the electronic device 100 and a plurality of external devices 110 and 120 are placed in a use space.
  • In this case, the spoken voice of the user 130 and the sounds from the external devices 110 and 120 may be mixed, and when the electronic device 100 processes the acquired sound, it becomes difficult to distinguish which part of the signal is caused by the user's utterance.
  • Accordingly, when the user 130 speaks to use the voice recognition function of the electronic device 100, the electronic device 100 identifies the locations of the external devices 110 and 120 in order to isolate the signal produced by the user's utterance input to the electronic device 100, and removes the sound signal coming from each identified location.
  • As a result, the electronic device 100 recognizes only the spoken voice of the user 130, so accurate voice recognition is possible.
  • The electronic device 100 includes a communication unit 210, a signal input/output unit 220, a broadcast receiving unit 230, a display unit 240, a user input unit 250, a storage unit 260, a microphone 270, a speaker 280, and a processor 290.
  • The electronic device 100 shown in FIG. 2 is an example in which the communication unit 210, the signal input/output unit 220, the broadcast receiving unit 230, and the like are implemented separately, but this is only an example; depending on the case, the broadcast receiving unit 230 may, for example, be implemented as part of the communication unit 210 or the signal input/output unit 220.
  • the electronic device 100 may be implemented including all the configurations shown in FIG. 2, but as another example, an omitted implementation of any one or more of them may be possible.
  • an implementation without the communication unit 210 is also possible. A more specific configuration will be described in detail below.
  • the configuration of the electronic device 100 will be described.
  • In the following, the case in which the electronic device 100 is a TV is described; however, since the electronic device 100 may be implemented as various types of devices, the present embodiment does not limit the configuration of the electronic device 100.
  • the electronic device 100 is not implemented as a display device, and in this case, the electronic device 100 may not include components for image display such as the display unit 240.
  • the electronic device 100 may output an image signal or the like to a display device such as an external TV through the signal input/output unit 220.
  • the communication unit 210 is a two-way communication circuit including at least one or more of components such as a communication module and a communication chip corresponding to various types of wired and wireless communication protocols.
  • The communication unit 210 may be implemented as a LAN card connected by wire to a router or gateway via Ethernet, a wireless communication module that performs wireless communication with an AP according to the Wi-Fi standard, or a wireless communication module that performs one-to-one direct wireless communication, such as Bluetooth.
  • the communication unit 210 communicates with a server on a network to transmit and receive data packets with the server.
  • The communication unit 210 may also be connected to other external devices 110 and 120 besides the server, and may receive various data, including video/audio data, from those devices or transmit various data, including video/audio data, to them.
  • The communication unit 210 receives a digitized analog voice signal (or sound signal) and transmits it to the processor 290.
  • That is, the analog voice signal is digitized and transmitted to the communication unit 210 using a data communication method such as Bluetooth or Wi-Fi.
  • the signal input/output unit 220 is connected to an external device such as a set-top box, an optical media player, or an external display device or a speaker in a 1:1 or 1:N (N is a natural number) method, thereby providing video from the external device. /Receive an audio signal or output a video/audio signal to the external device.
  • The signal input/output unit 220 includes, for example, a connector or port conforming to a preset transmission standard, such as an HDMI port, DisplayPort, DVI port, Thunderbolt, or USB port. Here, HDMI, DP, Thunderbolt, and the like are connectors or ports capable of transmitting video and audio signals simultaneously; in another embodiment, the signal input/output unit 220 may include connectors or ports that transmit video and audio signals separately.
  • The broadcast receiver 230 may be implemented in various ways according to the standard of the received image signal and the implementation form of the electronic device 100. For example, when the image signal is a broadcast signal, the broadcast receiving unit 230 includes a tuner that tunes the broadcast signal for each channel.
  • The input signal may be input from an external device such as a PC, an AV device, a TV, a smart phone, or a smart pad, and may also be derived from data received through a network such as the Internet.
  • the broadcast receiving unit 230 may include a network communication unit that communicates with an external device.
  • the broadcast receiver 230 may use wired or wireless communication as a communication method.
  • the broadcast receiving unit 230 is embedded in the electronic device 100 according to the present embodiment, but may be implemented in the form of a dongle or a module and attached to and detached from the connector of the electronic device 100.
  • the broadcast receiving unit 230 receives a wired digital signal including a clock signal of a preset frequency (clock frequency) when including a wired communication unit, and receives a wireless digital signal of a preset frequency (carrier frequency) when including a wireless communication unit.
  • a preset frequency signal may be processed by passing through the filter unit.
  • the type of the input signal received by the broadcast receiver 230 is not limited, and for example, at least one of a wired digital signal, a wireless digital signal, and an analog signal may be received.
  • the broadcast receiver 230 may receive an input signal to which a preset frequency signal is added.
  • the display unit 240 includes a display panel capable of displaying an image on a screen.
  • the display panel is provided with a light-receiving structure such as a liquid crystal method or a self-luminous structure such as an OLED method.
  • the display unit 240 may additionally include an additional component according to the structure of the display panel.
  • For example, when the display panel is of a liquid crystal type, the display unit 240 includes a liquid crystal display panel, a backlight unit that supplies light, and a panel driving substrate that drives the liquid crystal of the liquid crystal display panel.
  • the user input unit 250 includes various types of input interface related circuits provided to perform user input.
  • The user input unit 250 can be configured in various forms according to the type of the electronic device 100; examples include a mechanical or electronic button unit of the electronic device 100, a remote controller separate from the electronic device 100, a touch pad, and a touch screen installed on the display unit 240.
  • the storage unit 260 stores digitized data.
  • The storage unit 260 includes nonvolatile storage, which can retain data regardless of whether power is supplied, and volatile memory, into which data to be processed by the processor 290 is loaded and which cannot retain data when power is not supplied. Storage includes flash memory, a hard disk drive (HDD), a solid-state drive (SSD), read-only memory (ROM), and the like, while memory includes a buffer, random access memory (RAM), and the like.
  • the microphone 270 collects sounds of an external environment including user speech.
  • the microphone 270 transmits the collected sound signal to the processor 290.
  • the electronic device 100 may include a microphone 270 that collects a user's voice, or may receive a voice signal from an external device such as a remote controller or a smart phone having a microphone.
  • a remote controller application may be installed in an external device to control the electronic device 100 or perform functions such as voice recognition.
  • Through this, the user's voice can be received, and the external device can transmit and receive data, including control data, to and from the electronic device 100 using Wi-Fi, Bluetooth, or infrared.
  • In this case, a plurality of communication interfaces corresponding to the communication unit 210 may exist in the electronic device.
  • the speaker 280 outputs audio data processed by the processor 290 as sound.
  • the speaker 280 includes a unit speaker provided to correspond to audio data of one audio channel, and may include a plurality of unit speakers to respectively correspond to audio data of a plurality of audio channels.
  • When the electronic device 100 serves as an external device of another device, the speaker 280 serves to output a sound that informs the other device of the electronic device's location.
  • The speaker 280 may be provided separately from the electronic device 100, and in this case, the electronic device 100 may transmit audio data to the speaker 280 through the signal input/output unit 220.
  • The processor 290 includes one or more hardware processors implemented as a CPU, chipset, buffer, circuit, or the like mounted on a printed circuit board, and may be implemented as a system on chip (SoC) depending on the design.
  • the processor 290 includes modules corresponding to various processes such as a demultiplexer, a decoder, a scaler, an audio digital signal processor (DSP), and an amplifier.
  • Among these, modules related to image processing, such as the demultiplexer, decoder, and scaler, may be implemented as an image processing SoC, while the audio DSP may be implemented as a chipset separate from the SoC.
  • the processor 290 converts the voice signal acquired by the microphone 270 or the like into voice data, and processes the converted voice data. Thereafter, the processor 290 performs voice recognition based on the processed voice data, identifies a command indicated by the voice data, and performs an operation according to the identified command.
  • the voice data may be text data obtained through a speech-to-text (STT) process for converting a voice signal into text data.
  • A server different from the STT server, or a server that also serves as the STT server, may process the data, and such a server may perform a specific function based on the information/data transmitted by the electronic device.
  • Both the voice data processing and the command identification and execution may be performed in the electronic device 100. However, since this places a relatively large system load and storage requirement on the electronic device 100, at least some of the processes may be performed by at least one server communicatively connected to the electronic device 100 through a network.
  • the processor 290 receives the spoken voice of the user 130 by the microphone 270 or the like.
  • In addition to the user's uttered voice, the electronic device 100 of the present invention may also receive the sound, that is, the noise, from the other external devices 110 and 120 installed around it.
  • the processor 290 removes these noises in the process of processing the received sound and controls to perform an operation corresponding to the user's spoken voice. The process of removing noise will be described in detail later.
  • the processor 290 may call and execute at least one command of software stored in a storage medium that can be read by a machine such as the electronic device 100. This enables a device such as the electronic device 100 to be operated to perform at least one function according to the called at least one command.
  • the one or more instructions may include code generated by a compiler or code executable by an interpreter.
  • The device-readable storage medium may be provided in the form of a non-transitory storage medium, where 'non-transitory' merely means that the storage medium is a tangible device and does not include a signal (e.g., an electromagnetic wave).
  • As described above, the processor 290 receives, through the microphone 270, the sound of other external devices, that is, noise, along with the spoken voice of the user 130, and removes this noise from the entire received sound to recognize the user's spoken voice more accurately.
  • To perform this operation, a rule-based algorithm or an artificial intelligence algorithm using at least one of machine learning, neural networks, or deep learning may be applied to at least part of the data analysis, processing, and generation of result information.
  • the processor 290 may perform functions of a learning unit and a recognition unit together.
  • The learning unit performs a function of generating a trained neural network, and the recognition unit performs a function of recognizing (or inferring, predicting, estimating, or determining) data using the trained neural network.
  • The learning unit can create or update a neural network. To do so, the learning unit may acquire training data, either from the storage unit 260 or externally.
  • The training data may be data used for training the neural network, and the neural network may be trained using the data obtained by performing the above-described operation as training data.
  • The learning unit may perform preprocessing on the acquired training data before training the neural network, or may select the data to be used for training from among a plurality of training data items. For example, the learning unit may process the training data into a preset format, filter it, or add/remove noise to put it into a form suitable for learning. The learning unit may then generate a neural network configured to perform the above-described operation using the preprocessed training data.
  • The trained neural network may be composed of a plurality of sub-networks (or layers). The nodes of these networks have weights, and the networks may be connected to each other so that an output value of one network is used as an input value of another.
  • Examples of such neural networks include models such as a convolutional neural network (CNN), deep neural network (DNN), recurrent neural network (RNN), restricted Boltzmann machine (RBM), deep belief network (DBN), bidirectional recurrent deep neural network (BRDNN), and deep Q-networks.
  • the recognition unit may acquire target data to perform the above-described operation.
  • the target data may be obtained from the storage unit 260 or externally.
  • the target data may be data to be recognized by the neural network.
  • The recognition unit may perform preprocessing on the acquired target data, or select the data to be used for recognition from among a plurality of target data items, before applying the target data to the trained neural network.
  • For example, the recognition unit may process the target data into a preset format, filter it, or add/remove noise to put it into a form suitable for recognition.
  • The recognition unit may obtain an output value from the neural network by applying the preprocessed target data to it.
  • the recognition unit may acquire a probability value or a reliability value together with the output value.
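  • A toy sketch of this learning-unit/recognition-unit split is shown below. The nearest-centroid model and all names are illustrative assumptions standing in for the neural network described above; the recognition unit returns an output together with a reliability value.

```python
# Toy learning/recognition units: train fits one centroid per class; recognize
# applies the "model" to target data and reports a label with a confidence.
import numpy as np

def train(samples: np.ndarray, labels: np.ndarray) -> np.ndarray:
    # Learning unit: build the model from (preprocessed) training data.
    return np.stack([samples[labels == c].mean(axis=0) for c in np.unique(labels)])

def recognize(model: np.ndarray, target: np.ndarray) -> tuple[int, float]:
    # Recognition unit: apply target data to the model, get output + reliability.
    dists = np.linalg.norm(model - target, axis=1)
    label = int(np.argmin(dists))
    return label, float(np.exp(-dists[label]))  # reliability in (0, 1]
```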
  • The control method of the electronic device 100 described above may be provided as part of a computer program product.
  • Computer program products can be traded between sellers and buyers as commodities.
  • The computer program product may be distributed in the form of a device-readable storage medium (e.g., a CD-ROM), or may be distributed directly or online (e.g., downloaded or uploaded) through an application store (e.g., Play Store™) or between two user devices (e.g., smartphones).
  • In the case of online distribution, at least part of the computer program product may be temporarily stored or temporarily generated in a device-readable storage medium, such as a manufacturer's server, an application store's server, or the memory of a relay server.
  • FIG. 3 shows a flowchart in which the electronic device 100 identifies the locations of the external devices 110 and 120 in order to remove the sound they generate and thereby recognize the user's utterance more accurately. First, the electronic device 100 may request, through the communication unit 210, that the external device 110 output a sound (a first sound) announcing its location (S310).
  • The sound announcing the device's location may be, for example, a spoken sentence such as "I am the AI speaker Galaxy Home." if the external device 110 is an AI speaker.
  • Such a sound may be a sentence composed to inform the user that an operation for identifying the location between devices is in progress, or it may be made of an inaudible frequency for the user's convenience; it is not limited to either one.
  • the request of operation S310 may be initiated based on a user's command input to the user input unit 250.
  • the user's command for identifying the location between devices may be performed, for example, by inputting a button of a remote controller or touching a display screen.
  • When the external device 110 outputs the first sound, the electronic device 100 identifies the location of the external device 110 based on the direction in which the first sound is received (S320). The process by which the electronic device 100 identifies the location of the external device 110 will be described later.
  • The processor 290 stores information on the location of the identified external device 110 in the storage unit 260 (S330). The same process according to an embodiment of the present invention can be applied equally to the other external device 120, as sketched below.
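  • The S310-S330 flow can be pictured as a small routine, sketched below under assumed helper names (request_first_sound and measure_direction are hypothetical placeholders, not part of the patent):

```python
# Sketch of the S310-S330 flow: request the first sound, locate its source,
# store the identified location. Helper bodies are illustrative stubs.
from typing import Dict, Tuple

device_locations: Dict[str, Tuple[float, float]] = {}  # name -> (r, theta)

def request_first_sound(device_name: str) -> None:
    """S310: ask the external device, via the communication unit, to emit the first sound."""
    print(f"requesting {device_name} to output the first sound")

def measure_direction() -> Tuple[float, float]:
    """S320: measure the (distance r, azimuth theta) of the received first sound."""
    return (1.0, 30.0)  # placeholder values

def identify_external_device(device_name: str) -> None:
    request_first_sound(device_name)                     # S310
    device_locations[device_name] = measure_direction()  # S320 + S330
```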
  • The electronic device 100 may perform the operation of identifying the location of the external device 110 (hereinafter also referred to as 'external device location identification') at various time points.
  • the storage unit 260 of the electronic device 100 may store information on a time point for performing location identification of an external device.
  • the processor 290 may determine a time point for performing external device location identification by referring to information stored in the storage unit 260 (S410).
  • For example, the information stored in the storage unit 260 of the present embodiment may indicate the following time points as time points for performing external device location identification.
  • the electronic device 100 performs initial setup of voice recognition during the initial installation process.
  • The user connects the power of the electronic device 100, connects it to the home network, and repeatedly reads previously set sentences aloud to complete the initial settings for using the voice recognition function of the electronic device 100.
  • At this time, the location of the external device can also be identified.
  • The time point indicated by the information stored in the storage unit 260 may also correspond to a case in which either the electronic device 100 or the external device 110 has not been connected to the Internet for a long time, or its power has been turned off.
  • When the electronic device 100 or the external device 110 has been disconnected or powered off for a long time, it can be presumed that its location may have changed, so the location of the external device can be identified anew.
  • Likewise, when a new device is connected to the network, this may be detected and the location of the new external device may be identified.
  • When such a time point arrives, the electronic device 100 enters operation S310 of FIG. 3 and runs the same process as described above.
  • Performing location identification at these time points keeps the stored location information current and therefore highly useful.
  • the storage unit 260 of the electronic device 100 stores a predefined characteristic of the received first sound.
  • the characteristic of the first sound may be a waveform of the sound such as amplitude, frequency, and period of the first sound.
  • Alternatively, the characteristic of the first sound may be identification information of the external device 110, such as its name and manufacturer, or information about an utterance list included in the first sound output from the external device 110.
  • When a sound is received, the processor 290 identifies whether it corresponds to the predefined characteristic of the first sound stored in the storage unit 260 (S530). If the received sound is identified as the first sound corresponding to the predefined characteristic (S540), the processor 290 may then perform the operation of identifying the location of the external device 110 that generated the first sound. In another embodiment, the processor 290 may request, through the communication unit 210, that the external device 110 output a first sound having the characteristics of the first sound stored in the storage unit 260.
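  • One plausible way to check a received sound against a stored waveform characteristic is template matching by normalized cross-correlation, sketched below; the template, threshold, and function name are illustrative assumptions, not the patent's specified method.

```python
# Sketch: decide whether the stored first-sound template occurs in the input.
import numpy as np

def matches_first_sound(received: np.ndarray, template: np.ndarray,
                        threshold: float = 0.7) -> bool:
    corr = np.correlate(received, template, mode="valid")
    # Normalize so a perfect match scores ~1.0 regardless of amplitude.
    window_energy = np.convolve(received ** 2, np.ones(len(template)), mode="valid")
    norm = np.linalg.norm(template) * np.sqrt(np.maximum(window_energy, 1e-12))
    return bool(np.max(corr / norm) >= threshold)
```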
  • An utterance list may be stored as one of the predefined characteristics described in FIG. 5.
  • The utterance list consists of sentences with which the external device 110 informs the electronic device 100 of its location; the electronic device 100 can listen to the sound of the external device 110 and identify its location.
  • The utterance list may also serve as phrases that inform the user that an operation of identifying the location between devices is being performed. As illustrated in FIG. 4, even though information on the time points for performing external device location identification is stored, the user may not be aware of this, which is why an utterance list composed of such sentences may be stored.
  • The electronic device 100 may listen to the exemplified sound of the external device 110 and identify the location of the external device in consideration of its amplitude, frequency, and period.
  • FIG. 7 is a diagram illustrating communication between a server and a device according to an embodiment of the present invention, and FIG. 8 is a flowchart illustrating the corresponding operation of the electronic device.
  • The processor 290 receives information on the characteristics of the external device 110 from another device, such as the server 710, through the communication unit 210 (S810), and may store the received information in the storage unit 260.
  • information on the characteristics may be received through a server or the like.
  • the electronic device 100 may more easily identify the location of the external device 110 based on the received and stored information (S820).
  • FIG. 9 is a diagram illustrating a state of identifying a location of an external device according to an embodiment of the present invention.
  • the location of the sound source can be identified through a difference in time when the sound reaches a specific region.
  • In the present embodiment, when sound is generated from the external device 110, there is a difference in the time it takes the generated sound to reach two points A and B of the electronic device. The distance from the external device to each of A and B can then be obtained using the speed of sound and the time it takes the sound to reach each point.
  • The electronic device 100 includes a plurality of microphones 270 spaced apart from each other, for example at A and B, and can identify the location of the external device 110 as a distance r and an angle θ from the external device 110.
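  • The far-field version of this geometry gives a closed form for the angle: the extra path to the farther microphone is d·sin(θ), so θ = arcsin(c·Δt/d). The sketch below works this out with example numbers (the spacing and time difference are illustrative, not values from the disclosure).

```python
# Direction of a distant source from the inter-microphone arrival-time
# difference, using theta = arcsin(c * delta_t / mic_spacing).
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, assumed

def direction_from_tdoa(delta_t: float, mic_spacing: float) -> float:
    ratio = np.clip(SPEED_OF_SOUND * delta_t / mic_spacing, -1.0, 1.0)
    return float(np.degrees(np.arcsin(ratio)))

# Example: 0.1 m spacing, 150 microsecond arrival difference -> ~31 degrees.
print(direction_from_tdoa(150e-6, 0.1))
```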
  • the method of identifying the location of the external device 110 illustrated in FIG. 9 is only an example, and methods of identifying the location of the external device 110 may be various according to the present disclosure.
  • The processor 290 may identify the location of the external device 110 and store information on the identified location in the storage unit 260 in the form of a table 1010.
  • For example, the processor 290 may store the information on the location of the external device 110 by mapping together the name, distance, direction, and connection status of the external device 110. In the case of external device 1, the distance to the electronic device is r1, and it is located at an azimuth angle θ1 with respect to the reference direction of the electronic device 100; in the case of external device 2, the distance from the electronic device 100 is r2, and it is located at an azimuth angle θ2.
  • In the present embodiment, the location of the external device 110 is indicated by an azimuth with respect to the reference direction of the electronic device 100, but this is only an example, and the information indicating the location of the external device 110 according to the present disclosure may take various forms.
  • information about whether the external device 110 is connected to a network or power source may also be stored.
  • Other information such as the name of the external device 110 may be received from the external device 110 through the communication unit 210.
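  • A minimal in-memory version of the table 1010 might look like the sketch below; the field names are illustrative assumptions for the name, distance, direction, and connection status the text describes.

```python
# Sketch of the table 1010: one entry per external device.
from dataclasses import dataclass

@dataclass
class ExternalDeviceEntry:
    name: str           # e.g., "external device 1"
    distance_m: float   # r: distance from the electronic device
    azimuth_deg: float  # theta: azimuth from the reference direction
    connected: bool     # network/power connection status

table_1010 = [
    ExternalDeviceEntry("external device 1", 1.2, 35.0, True),   # r1, theta1
    ExternalDeviceEntry("external device 2", 2.5, 120.0, True),  # r2, theta2
]

# Only devices marked as connected are treated as potential noise sources.
noise_sources = [entry for entry in table_1010 if entry.connected]
```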
  • FIG. 11 is a diagram illustrating a situation in which an electronic device processes a received sound according to an embodiment of the present invention, FIG. 12 is a flowchart of the operation performed by the electronic device in the present embodiment, and FIG. 13 shows the noise removal block with which the device processes the sound.
  • the electronic device 100 receives a sound (hereinafter, also referred to as “second sound”) through a microphone 270 (S1210).
  • As shown in FIG. 11, the electronic device 100 obtains from the microphone 270 a second sound S in which the utterance S1 of the user 130 and the sounds S2 and S3 of the external devices 110 and 120 are combined.
  • Next, the processor 290 of the electronic device 100 removes, from the signal of the received second sound S, the noise components corresponding to the sounds received from the locations of the external devices 110 and 120 (S1220). At this time, based on the information on the locations of the external devices 110 and 120 identified by the electronic device 100 (see 1010 of FIG. 10), the processor 290 can determine the locations of the external devices 110 and 120 from which the noise components S2 and S3 included in the signal of the second sound S originate. Accordingly, the processor 290 may separate and remove the noise components S2 and S3 of the external devices 110 and 120 from the obtained signal of the second sound S.
  • To this end, the processor 290 of the electronic device 100 may include a noise removal block 1310.
  • the noise removal block 1310 may be implemented by a combination of hardware and/or software.
  • The noise removal block 1310 of the processor 290 separates the noise components S2 and S3 of the external devices 110 and 120 from the signal of the second sound S using beamforming technology, so that the user's utterance S1 can be extracted.
  • More specifically, the noise removal block 1310 divides the signal of the second sound S into certain frequency ranges using a local (short-time) Fourier transform, and then removes the overlapping frequency ranges among the signals coming from the different directions.
  • To do so, the processor 290 refers to the table 1010 illustrated in FIG. 10 to check whether external devices 110 and 120 that may generate noise exist. For example, from the table 1010, the processor 290 confirms that external devices 1 and 2 (110, 120), which are connected to the network and to power, exist. Subsequently, as shown in FIG. 13, the processor 290 uses the location information (θ1, θ2) of external devices 1 and 2 (110, 120) to remove, from the signal of the second sound S, the frequency ranges corresponding to the noise components S2 and S3 of the external devices 110 and 120, so that the user's speech S1 may be extracted from the signal of the second sound S. Finally, referring again to FIG. 12, the processor 290 recognizes the user's utterance based on the signal S1 from which the noise components S2 and S3 have been removed (S1230). A simplified sketch of this time-frequency masking step follows.
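  • The sketch below shows the general shape of such time-frequency masking: transform the second sound S to the time-frequency domain, attenuate the bins attributed to the known noise directions, and reconstruct. How the mask is derived from the stored directions (θ1, θ2) is left abstract here; the mask is assumed given, and this is not the patent's exact algorithm.

```python
# Suppress time-frequency bins attributed to the noise sources S2/S3, then
# reconstruct the remaining signal (approximately S1).
import numpy as np
from scipy.signal import stft, istft

def remove_directional_noise(s: np.ndarray, fs: int,
                             noise_mask: np.ndarray) -> np.ndarray:
    # noise_mask must be a boolean array with the same shape as the STFT of s.
    f, t, S = stft(s, fs=fs, nperseg=512)
    S_clean = np.where(noise_mask, 0.0, S)  # zero out bins flagged as noise
    _, s1 = istft(S_clean, fs=fs, nperseg=512)
    return s1
```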
  • In this way, since the electronic device 100 identifies the presence and location of each external device and classifies the sound arriving from the direction of that external device as noise, it can discriminate and remove the noise signal from the obtained second sound S and then obtain the user's spoken voice S1. That is, compared with simply using the difference in loudness to decide which part of the acquired second sound S is the user's spoken voice and which is sound generated by an external device, the accuracy in recognizing the user's spoken voice can be improved. Moreover, because the location of each external device is identified in advance, the voice processing is fast.
  • FIG. 14 is a diagram illustrating an entire system according to an embodiment of the present invention, and FIG. 15 is a flowchart illustrating the operation of an electronic device in that system.
  • In the embodiments described so far, the electronic device 100 received the sound generated by the external devices 110 and 120 and identified their locations. In this drawing, however, not only the electronic device 100 but also the external devices 110 and 120 each identify the locations of the remaining devices from their own positions, so that every device knows the locations of all the others. If this is performed at the external device location identification time points described in FIG. 4, the location of each device can be identified.
  • the processor 290 of the electronic device 100 may receive a list of external devices connected to the network through the communication unit 210 and store the list in the storage unit 260.
  • The processor may identify and store the location of each external device in the list (S1520). After completing this process, the processor 290 checks, by referring to the previously stored list of external devices connected to the network, whether the locations of all external devices in the list have been identified (S1530). If the locations of all external devices in the list have been identified (Yes in S1530), the processor 290 ends the operation. If there is an external device whose location has not been identified (No in S1530), the processor 290 repeats the process of identifying the location of that external device. When this process is completed for the electronic device, it is performed in the same way for each external device. Accordingly, all devices that exist in a limited space and are connected to the network can identify the locations of the devices other than themselves, and the present invention can be applied. The loop below sketches this check-and-repeat flow.
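  • The S1520-S1530 flow can be sketched as the loop below; identify_location is a hypothetical stand-in for the sound-based identification described above, and the device names are examples.

```python
# Repeat identification until every device in the stored network list has a
# location (S1530); each pass locates the missing devices (S1520).
from typing import Dict, List, Tuple

def identify_location(name: str) -> Tuple[float, float]:
    """Placeholder: would request the first sound and measure its direction."""
    return (1.0, 0.0)  # (distance r, azimuth theta)

def identify_all(network_list: List[str]) -> Dict[str, Tuple[float, float]]:
    locations: Dict[str, Tuple[float, float]] = {}
    while True:
        remaining = [d for d in network_list if d not in locations]
        if not remaining:       # Yes in S1530: all located, end the operation
            return locations
        for name in remaining:  # No in S1530: identify the missing devices
            locations[name] = identify_location(name)

print(identify_all(["external device 1", "external device 2"]))
```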
  • As an example, the communication unit of the electronic device 100 may be a Bluetooth module that connects the devices. Using it, the electronic device 100 transmits a control command to the communication unit (1610) of the external device to adjust the volume of the external devices 110 and 120, and the controller 1620 of the external device adjusts the volume of the external device accordingly. When the voice processing is finished, the volume of the other device is automatically restored to its original state. Although a Bluetooth module is described here, it can easily be replaced with Wi-Fi.
  • The present embodiment is applicable not only to speech obtained through the voice preprocessing process shown in FIG. 13, but also to the case where it is confirmed that the electronic device is not affected by noise from an external device; it is not limited to either one.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Multimedia (AREA)
  • Acoustics & Sound (AREA)
  • Computational Linguistics (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • General Engineering & Computer Science (AREA)
  • Quality & Reliability (AREA)
  • Radar, Positioning & Navigation (AREA)
  • Remote Sensing (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Chemical & Material Sciences (AREA)
  • Analytical Chemistry (AREA)
  • Biochemistry (AREA)
  • Immunology (AREA)
  • Pathology (AREA)
  • Telephone Function (AREA)

Abstract

An electronic device according to one embodiment of the present invention may comprise: a microphone; a storage unit; a communication unit that communicates with an external device; and a processor that requests to output a first sound to the external device through the communication unit, identifies the location of the external device on the basis of the direction in which the first sound is received by the microphone, stores, in the storage unit, information on the identified location of the external device, removes, on the basis of the stored information, a noise component, corresponding to a sound received from the location of the external device, from a signal of a second sound received by the microphone, and recognizes user utterances on the basis of the signal from which the noise component is removed.

Description

Electronic device and control method thereof
The present invention relates to an electronic device and a control method thereof, and more particularly, to an electronic device that performs a voice recognition function by removing ambient noise, and a control method thereof.
Due to the recent development of voice recognition technology, most electronic devices are equipped with voice recognition, which facilitates interaction between devices. Accordingly, a plurality of electronic devices used in the same space are subject to interference from the audio signals each of them generates during voice recognition. In this case, when the user speaks in a noisy environment, an electronic device capable of voice recognition uses a beamforming technique to extract the user's speech. Beamforming creates a spatial filter by extracting the audio signal arriving from a specific direction and removing audio components arriving from other directions. By extracting the signal arriving from the direction of the user's utterance out of the entire input audio signal and filtering out signals from other directions, only the user's speech passes to the speech recognition system.
However, while the beamforming technology currently used for speech recognition works well under ideal conditions, when multiple electronic devices are used in a limited space, each device has no prior information about the signals generated by the other devices, so the rate of voice recognition errors during user utterances increases.
In this case, the user must either lower the output level of every other device they do not wish to use, or raise their voice so that the signal-to-noise ratio (SNR) rises above the threshold the device can recognize. This increases the user's fatigue in the short term and reduces use of the voice recognition function in the long term.
An object of the present invention is to improve the accuracy of recognizing a user's utterance by removing ambient noise from the sound acquired by an electronic device in a given environment.
An electronic device according to an embodiment of the present invention includes: a microphone; a storage unit; a communication unit that communicates with an external device; and a processor that requests, through the communication unit, that the external device output a first sound, identifies the location of the external device based on the direction in which the first sound is received by the microphone, stores information on the identified location of the external device in the storage unit, removes, based on the stored information, a noise component corresponding to sound received from the location of the external device from a signal of a second sound received by the microphone, and recognizes a user utterance based on the signal from which the noise component has been removed.
The processor may identify whether the first sound is received based on a characteristic predefined for the first sound.
The electronic device according to an embodiment of the present invention further includes a storage unit, and the processor identifies the location of the external device based on the direction in which the first sound is received by the microphone and stores information on the identified location in the storage unit.
The characteristic may include information related to guidance on the location identification operation of the external device.
The characteristic may include an inaudible frequency band.
The processor may request the external device to output a first sound having the characteristic.
The processor may receive information on the characteristic from a server through the communication unit and store the received information in the storage unit.
The electronic device according to an embodiment of the present invention further includes a user input unit, and the processor may identify the location of the external device based on a user command input to the user input unit.
The storage unit stores information on a time point at which location identification of the external device is to be performed, and the processor may identify the location of the external device at that time point based on the stored information.
The processor may receive information on the external device from a server through the communication unit and identify the location of the external device based on the received information.
The electronic device according to an embodiment of the present invention further includes a speaker, and the processor receives, through the communication unit, a request from the external device to output a third sound for identifying the location of the electronic device, and controls the speaker to output the third sound.
A method of controlling an electronic device according to an embodiment of the present invention includes: communicating with an external device through a communication unit to request that the external device output a first sound; identifying the location of the external device based on the direction in which the first sound is received by a microphone; storing information on the identified location of the external device in a storage unit; removing, based on the stored information, a noise component corresponding to sound received from the location of the external device from a signal of a second sound received by the microphone; and recognizing a user utterance based on the signal from which the noise component has been removed.
The method may include storing information on a predefined characteristic of the first sound, and identifying whether the first sound is received based on the predefined characteristic.
The method may include identifying the location of the external device based on the direction in which the first sound is received by the microphone, and storing information on the identified location of the external device in a storage unit.
The method may include requesting the external device to output a first sound having the characteristic.
The method may include receiving information on the characteristic from a server through the communication unit, and storing the received information in the storage unit.
The method may include identifying the location of the external device based on a user command input to a user input unit.
The method may include storing information on a time point at which location identification of the external device is to be performed, and identifying the location of the external device at that time point based on the stored information.
The method may include receiving information on the external device from a server through the communication unit, and identifying the location of the external device based on the received information.
The method may include receiving, through the communication unit, a request from the external device to output a third sound for identifying the location of the electronic device, and controlling a speaker to output the third sound.
According to the present invention, the accuracy of voice recognition of a user utterance can be improved even when a plurality of electronic devices are in use at the same time.
In addition, cumbersome steps such as lowering and then restoring the volume of other electronic devices for voice recognition can be avoided, which is efficient.
FIG. 1 is a diagram illustrating an entire system according to an embodiment of the present invention.
FIG. 2 is a block diagram illustrating the configuration of an electronic device according to an embodiment of the present invention.
FIG. 3 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention.
FIG. 6 is a diagram illustrating an utterance list according to an embodiment of the present invention.
FIG. 7 is a diagram illustrating communication between a server and devices according to an embodiment of the present invention.
FIG. 8 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention.
FIG. 9 is a diagram illustrating identification of the location of an external device according to an embodiment of the present invention.
FIG. 10 is a diagram illustrating information on external devices according to an embodiment of the present invention.
FIG. 11 is a diagram illustrating a situation in which an electronic device processes received sound according to an embodiment of the present invention.
FIG. 12 is a flowchart illustrating an operation performed by the electronic device of this embodiment.
FIG. 13 is a diagram illustrating a noise removal block with which the electronic device of this embodiment processes sound.
FIG. 14 is a diagram illustrating an entire system according to an embodiment of the present invention.
FIG. 15 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention.
FIG. 16 is a diagram illustrating the system after voice processing according to an embodiment of the present invention.
Hereinafter, embodiments of the present invention will be described in detail with reference to the accompanying drawings. In the drawings, the same reference numerals or symbols refer to components that perform substantially the same function, and the size of each component in the drawings may be exaggerated for clarity and convenience of description. However, the technical idea of the present invention and its core configuration and operation are not limited to the configurations or operations described in the following embodiments. In describing the present invention, if it is determined that a detailed description of a known technology or configuration related to the present invention may unnecessarily obscure the subject matter of the present invention, the detailed description thereof will be omitted.
In the embodiments of the present invention, terms including ordinal numbers such as 'first' and 'second' are used only for the purpose of distinguishing one component from another, and a singular expression includes a plural expression unless the context clearly indicates otherwise. In addition, in the embodiments of the present invention, terms such as 'comprise', 'include', and 'have' should be understood as not precluding the presence or possible addition of one or more other features, numbers, steps, operations, components, parts, or combinations thereof. In addition, in the embodiments of the present invention, a 'module' or 'unit' performs at least one function or operation, may be implemented as hardware, software, or a combination of hardware and software, and may be integrated into at least one module. In addition, in the embodiments of the present invention, 'at least one' of a plurality of elements refers not only to all of the plurality of elements, but also to each one of the plurality of elements excluding the rest, and to any combination thereof.
FIG. 1 is a diagram illustrating an entire system according to an embodiment of the present invention. As illustrated in FIG. 1, the electronic devices 100, 110, and 120 may be implemented as display devices capable of displaying an image, or as devices without a display. For example, the electronic devices 100, 110, and 120 may include a TV, an AI assistant device (e.g., an AI speaker), a computer, a smartphone, a tablet, a portable media player, a wearable device, a video wall, an electronic picture frame, and the like. The electronic devices 100, 110, and 120 may also be implemented as various other types of devices, such as an image processing device like a set-top box without a display, home appliances such as a refrigerator, a Bluetooth speaker, or a washing machine, and an information processing device such as a computer body. Hereinafter, for convenience of description, it is assumed that the electronic device 100 is implemented as a TV and the external devices 110 and 120 are implemented as a speaker and a refrigerator, respectively. However, the electronic device and the external devices of the present invention are not limited thereto, and the present invention still holds even if the roles of any one external device and the electronic device are exchanged.
According to an embodiment of the present invention, as illustrated in FIG. 1, the electronic device 100 and a plurality of external devices 110 and 120 are placed in a use space. When the user 130 tries to use the voice recognition function of the electronic device 100, the uttered voice of the user 130 may be mixed with sound coming from the external devices 110 and 120. In that case, when the electronic device 100 processes the acquired sound, it is difficult to distinguish which sound signal results from the user's utterance. Therefore, in the present invention, when the user 130 speaks to use the voice recognition function of the electronic device 100, the electronic device 100 identifies the locations of the external devices 110 and 120 in order to obtain the signal resulting from the user's utterance. The electronic device 100 then removes, from the input sound signal, the sound signal coming from the identified locations. In this case, since the sound signals generated by the devices 110 and 120 other than the electronic device 100 for which voice recognition is required can be removed, the electronic device 100 recognizes only the uttered voice of the user 130, enabling more accurate voice recognition.
FIG. 2 is a block diagram illustrating the configuration of an electronic device according to an embodiment of the present invention. As illustrated in FIG. 2, the electronic device 100 may include a communication unit 210, a signal input/output unit 220, a broadcast receiving unit 230, a display unit 240, a user input unit 250, a storage unit 260, a microphone 270, a speaker 280, and a processor 290. The electronic device 100 illustrated in FIG. 2 shows an example in which the communication unit 210, the signal input/output unit 220, the broadcast receiving unit 230, and so on are implemented separately, but this is only an example; in some cases, the broadcast receiving unit 230 may be implemented as part of the communication unit 210 or the signal input/output unit 220. The electronic device 100 may be implemented with all the components shown in FIG. 2, or, as another example, with one or more of these components omitted. For example, as a device without a network function, an implementation without the communication unit 210 is also possible. More specific configurations are described in detail below.
Hereinafter, the configuration of the electronic device 100 will be described. Although this embodiment describes the case where the electronic device 100 is a TV, the electronic device 100 may be implemented as various types of devices, so this embodiment does not limit the configuration of the electronic device 100. The electronic device 100 may also not be implemented as a display device, in which case the electronic device 100 may not include components for displaying an image, such as the display unit 240. For example, when the electronic device 100 is implemented as a set-top box, the electronic device 100 may output an image signal and the like to an external display device such as a TV through the signal input/output unit 220.
The communication unit 210 is a two-way communication circuit that includes at least one of components such as communication modules and communication chips corresponding to various types of wired and wireless communication protocols. For example, the communication unit 210 may be implemented as a LAN card wired to a router or gateway via Ethernet, a wireless communication module that performs wireless communication with an AP according to the Wi-Fi standard, or a wireless communication module that performs one-to-one direct wireless communication such as Bluetooth. The communication unit 210 may communicate with a server on a network to exchange data packets with the server. As another embodiment, the communication unit 210 may be connected to external devices 110 and 120 other than the server, and may receive various data including video/audio data from them, or transmit various data including video/audio data to them. When voice or sound is received through the microphone 270 provided in the electronic device 100, the analog voice signal (or sound signal) is digitized and transmitted to the processor 290; when a voice signal is received from an external device, the external device digitizes the analog voice signal and transmits it to the communication unit 210 using a data transmission protocol such as Bluetooth or Wi-Fi.
The signal input/output unit 220 is wired, in a 1:1 or 1:N (N is a natural number) manner, to an external device such as a set-top box or optical media player, or to an external display device, a speaker, or the like, thereby receiving video/audio signals from the corresponding external device or outputting video/audio signals to it. The signal input/output unit 220 includes connectors or ports conforming to predetermined transmission standards, such as an HDMI port, DisplayPort, DVI port, Thunderbolt, and USB port. Here, for example, HDMI, DP, and Thunderbolt are connectors or ports capable of transmitting video and audio signals simultaneously; as another embodiment, the signal input/output unit 220 may include connectors or ports that transmit video and audio signals separately.
The broadcast receiving unit 230 may be implemented in various ways corresponding to the standard of the received video signal and the implementation form of the electronic device 100. When the video signal is a broadcast signal, the broadcast receiving unit 230 includes a tuner that tunes the broadcast signal channel by channel. The input signal may be input from an external device, for example, a PC, an AV device, a TV, a smartphone, or a smart pad. The input signal may also be derived from data received through a network such as the Internet. In this case, the broadcast receiving unit 230 may include a network communication unit that communicates with the external device.
The broadcast receiving unit 230 may use wired or wireless communication as its communication method. According to this embodiment, the broadcast receiving unit 230 is built into the electronic device 100, but it may also be implemented in the form of a dongle or module and attached to or detached from a connector of the electronic device 100. When the broadcast receiving unit 230 includes a wired communication unit, it receives a wired digital signal including a clock signal of a preset frequency (clock frequency); when it includes a wireless communication unit, it receives a wireless digital signal of a preset frequency (carrier frequency). Among the input signals received through the broadcast receiving unit 230, a preset frequency signal (clock signal or carrier frequency signal) may be processed by passing through a filter unit. The type of input signal received by the broadcast receiving unit 230 is not limited; for example, at least one of a wired digital signal, a wireless digital signal, and an analog signal may be received. When the broadcast receiving unit 230 receives an analog signal, it may receive an input signal to which a preset frequency signal has been added.
The display unit 240 includes a display panel capable of displaying an image on a screen. The display panel is provided with a light-receiving structure such as a liquid crystal type or a self-luminous structure such as an OLED type. The display unit 240 may further include additional components according to the structure of the display panel; for example, if the display panel is of a liquid crystal type, the display unit 240 includes a liquid crystal display panel, a backlight unit that supplies light, and a panel driving substrate that drives the liquid crystal of the liquid crystal display panel.
The user input unit 250 includes various types of input interface circuits provided to receive user input. The user input unit 250 may be configured in various forms according to the type of the electronic device 100, for example, a mechanical or electronic button unit of the electronic device 100, a remote controller separate from the electronic device 100, a touch pad, or a touch screen installed on the display unit 240.
The storage unit 260 stores digitized data. The storage unit 260 includes non-volatile storage, which preserves data regardless of whether power is supplied, and volatile memory, into which data to be processed by the processor 290 is loaded and which cannot preserve data when power is not supplied. Storage includes flash memory, hard-disc drives (HDD), solid-state drives (SSD), read-only memory (ROM), and the like, while memory includes buffers, random access memory (RAM), and the like.
The microphone 270 collects sounds from the external environment, including user utterances. The microphone 270 transmits the collected sound signal to the processor 290. The electronic device 100 may include the microphone 270 that collects the user's voice, or may receive a voice signal from an external apparatus, such as a remote controller or smartphone, equipped with a microphone. A remote controller application may be installed on the external apparatus to control the electronic device 100 or to perform functions such as voice recognition. An external apparatus with such an application installed can receive the user's voice, and can exchange data with and control the electronic device 100 using Wi-Fi/BT, infrared, or the like; accordingly, a plurality of communication units 210 capable of implementing these communication methods may exist in the electronic device.
The speaker 280 outputs the audio data processed by the processor 290 as sound. The speaker 280 includes a unit speaker corresponding to the audio data of one audio channel, and may include a plurality of unit speakers respectively corresponding to the audio data of a plurality of audio channels. In the present invention, when the electronic device 100 serves as an external device for another device, the speaker 280 serves to output sound to inform the other device of its location. As another embodiment, the speaker 280 may be provided separately from the electronic device 100, in which case the electronic device 100 may transmit audio data to the speaker 280 through the signal input/output unit 220.
The processor 290 includes one or more hardware processors implemented as CPUs, chipsets, buffers, circuits, and the like mounted on a printed circuit board, and may be implemented as a system on chip (SoC) depending on the design. When the electronic device 100 is implemented as a display device, the processor 290 includes modules corresponding to various processes, such as a demultiplexer, a decoder, a scaler, an audio digital signal processor (DSP), and an amplifier. Here, some or all of these modules may be implemented as an SoC. For example, modules related to image processing, such as the demultiplexer, decoder, and scaler, may be implemented as an image processing SoC, while the audio DSP may be implemented as a chipset separate from the SoC.
The processor 290 converts the voice signal acquired by the microphone 270 or the like into voice data and processes the converted voice data. The processor 290 then performs voice recognition based on the processed voice data, identifies the command indicated by the voice data, and performs an operation according to the identified command. The voice data may be text data obtained through a speech-to-text (STT) process that converts the voice signal into text data. When the STT process is involved, a server other than the STT server, or a server that also serves as the STT server, may process the data, and the electronic device may perform a specific function based on the information/data that the server transmits to it. Both the voice data processing and the command identification and execution may be performed in the electronic device 100. In that case, however, the system load and storage capacity required of the electronic device 100 become relatively large, so at least part of the process may be performed by at least one server communicatively connected to the electronic device 100 through a network.
According to an embodiment of the present invention, the processor 290 receives the uttered voice of the user 130 through the microphone 270 or the like. However, when receiving the user's uttered voice, the electronic device 100 of the present invention may also receive, in addition to that voice, sound coming from the other external devices 110 and 120 installed around the electronic device 100, that is, noise. The processor 290 removes this noise while processing the received sound and controls the electronic device to perform the operation corresponding to the user's uttered voice. The noise removal process is described in detail later.
The processor 290 according to the present invention may call and execute at least one instruction of software stored in a storage medium readable by a machine such as the electronic device 100. This enables a device such as the electronic device 100 to be operated to perform at least one function according to the called instruction. The one or more instructions may include code generated by a compiler or code executable by an interpreter. The machine-readable storage medium may be provided in the form of a non-transitory storage medium. Here, 'non-transitory' only means that the storage medium is a tangible device and does not contain a signal (e.g., an electromagnetic wave); this term does not distinguish between a case where data is stored semi-permanently in the storage medium and a case where it is stored temporarily.
Meanwhile, the processor 290 receives, through the microphone 270 or the like, the sound of other external devices, that is, noise, together with the uttered voice of the user 130. The processor 290 may perform at least part of the data analysis, processing, and result-information generation needed to remove this noise from the received sound and to perform the operation corresponding to the user's uttered voice, using at least one of machine learning, a neural network, or a deep learning algorithm as a rule-based or artificial intelligence algorithm.
For example, the processor 290 may perform the functions of both a learning unit and a recognition unit. The learning unit may perform the function of generating a trained neural network, and the recognition unit may perform the function of recognizing (or inferring, predicting, estimating, or judging) data using the trained neural network. The learning unit may create or update the neural network. The learning unit may acquire training data to create the neural network. For example, the learning unit may acquire the training data from the storage unit 260 or from outside. The training data may be data used for training the neural network, and the neural network may be trained using data resulting from the above-described operations as training data.
Before training the neural network with the training data, the learning unit may perform preprocessing on the acquired training data, or may select the data to be used for training from among a plurality of pieces of training data. For example, the learning unit may process the training data into a preset format, filter it, or add/remove noise to process it into a form suitable for training. The learning unit may use the preprocessed training data to generate a neural network configured to perform the above-described operations.
The trained neural network may be composed of a plurality of neural networks (or layers). The nodes of the plurality of neural networks have weights, and the plurality of neural networks may be connected to one another so that an output value of one neural network is used as an input value of another neural network. Examples of neural networks include models such as a convolutional neural network (CNN), a deep neural network (DNN), a recurrent neural network (RNN), a restricted Boltzmann machine (RBM), a deep belief network (DBN), a bidirectional recurrent deep neural network (BRDNN), and deep Q-networks.
Meanwhile, the recognition unit may acquire target data to perform the above-described operations. The target data may be acquired from the storage unit 260 or from outside. The target data may be data to be recognized by the neural network. Before applying the target data to the trained neural network, the recognition unit may perform preprocessing on the acquired target data, or may select the data to be used for recognition from among a plurality of pieces of target data. For example, the recognition unit may process the target data into a preset format, filter it, or add/remove noise to process it into a form suitable for recognition. The recognition unit may obtain an output value from the neural network by applying the preprocessed target data to the neural network. The recognition unit may obtain a probability value or a reliability value together with the output value.
For example, the control method of the electronic device 100 according to the present invention may be provided as part of a computer program product. The computer program product may be traded between a seller and a buyer as a commodity. The computer program product may be distributed in the form of a machine-readable storage medium (e.g., a CD-ROM), or distributed online (e.g., downloaded or uploaded) through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones). In the case of online distribution, at least part of the computer program product may be at least temporarily stored, or temporarily created, in a machine-readable storage medium such as the server of a manufacturer, the server of an application store, or the memory of a relay server.
FIG. 3 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention. This embodiment shows a flowchart in which the electronic device 100 identifies the locations of the external devices 110 and 120 in order to remove the sound they generate and thereby recognize the user's utterance more accurately. First, the electronic device 100 may request, through the communication unit 210, that the external device 110 output a sound (first sound) announcing its location (S310). Here, the sound announcing its location may be, for example, 'I am the AI speaker Galaxy Home.' or similar, if the external device 110 is an AI speaker. Such a sound may be a sentence composed to inform the user that the operation of identifying the locations of the devices is currently being performed, but for the user's convenience it may instead consist of an inaudible frequency; it is not limited to either form. The request of operation S310 may be initiated based on a user command input through the user input unit 250. The user command for identifying the locations of the devices may be given, for example, by pressing a button on a remote controller or touching the display screen. When the external device 110 outputs the first sound, the electronic device 100 identifies the location of the external device 110 based on the direction in which the first sound is received (S320). The process by which the electronic device 100 identifies the location of the external device 110 is described later. After identifying the location of the external device 110, the processor 290 stores information on the identified location of the external device 110 in the storage unit 260 (S330). The process according to this embodiment of the present invention may equally be applied to the other external device 120.
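By way of non-limiting illustration, the following Python sketch models operations S310 to S330 of FIG. 3. The class and method names (LocationIdentifier, estimate_direction, and so on) are hypothetical and are not part of the disclosed embodiment; they only stand in for the communication unit 210, the microphone 270, and the storage unit 260.

```python
# A minimal sketch of the flow of FIG. 3 (S310-S330). All names here are
# hypothetical illustrations, not part of the disclosed embodiment.
from dataclasses import dataclass

@dataclass
class DeviceLocation:
    name: str           # e.g. "AI speaker"
    distance_m: float   # estimated distance r
    azimuth_deg: float  # estimated azimuth theta
    connected: bool

class LocationIdentifier:
    def __init__(self, comm, mic_array, storage):
        self.comm = comm            # stands in for the communication unit 210
        self.mic_array = mic_array  # stands in for the microphone(s) 270
        self.storage = storage      # stands in for the storage unit 260

    def identify(self, device_id: str) -> DeviceLocation:
        # S310: request the external device to output the first sound
        self.comm.send(device_id, {"cmd": "output_first_sound"})
        # S320: estimate direction/distance from the received first sound
        distance_m, azimuth_deg = self.mic_array.estimate_direction()
        loc = DeviceLocation(device_id, distance_m, azimuth_deg, True)
        # S330: persist the identified location for later noise removal
        self.storage.save(device_id, loc)
        return loc
```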
FIG. 4 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention. According to an embodiment of the present invention, the electronic device 100 may perform the operation of identifying the location of the external device 110 (hereinafter also referred to as 'external device location identification') at various points in time. The storage unit 260 of the electronic device 100 may store information on the time point at which external device location identification is to be performed. The processor 290 may determine the time point for performing external device location identification by referring to the information stored in the storage unit 260 (S410). The information stored in the storage unit 260 of this embodiment may indicate the following time points for performing external device location identification. For example, the electronic device 100 performs initial voice recognition setup during the first installation process. In this process, the user connects the power of the electronic device 100, connects it to the home network, and repeatedly reads preset sentences aloud to complete the initial setup for using the voice recognition function of the electronic device 100. It is assumed here that most home appliances, such as TVs and refrigerators, are rarely moved once installed, due to their size and weight. Therefore, the electronic device 100 may execute external device location identification when it is first installed and connected to the network. As another embodiment, the time point indicated by the information stored in the storage unit 260 may correspond to a case where either the electronic device 100 or the external device 110 has not been connected to the Internet, or has been powered off, for a long time. That is, if the electronic device 100 or the external device 110 has been disconnected or powered off for a long time, it can be predicted that its location may have changed, so external device location identification may be executed anew. As another example, when a new device is connected to the network, this may be detected and the location of the new external device may be identified. Referring again to FIG. 4, when the time point indicated by the information stored in the storage unit 260 arrives (Yes in S420), the electronic device 100 proceeds to operation S310 of FIG. 3 and executes the same process described above. According to this embodiment, the electronic device 100 can identify the location of the external device 110 in various environments, which increases its usefulness.
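As a rough, assumed sketch of how the trigger conditions above might be evaluated, the following Python function checks the three example time points (first installation, a long offline period, a new device on the network). The threshold value and field names are illustrative assumptions, not part of the disclosure.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: how long a device may be offline before its
# stored location is treated as stale (an assumption, not from the patent).
OFFLINE_THRESHOLD = timedelta(days=7)

def should_identify_locations(state) -> bool:
    """Return True when external device location identification should run."""
    # Case 1: first installation / initial voice recognition setup
    if state.first_setup_completed and not state.locations_identified:
        return True
    # Case 2: a device was offline or powered off long enough that its
    # location may have changed
    now = datetime.now()
    for dev in state.known_devices:
        if now - dev.last_seen > OFFLINE_THRESHOLD:
            return True
    # Case 3: a new device joined the home network and has no stored location
    if any(dev.location is None for dev in state.known_devices):
        return True
    return False
```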
FIG. 5 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention. This embodiment is a more specific example of the operation (S320), described with reference to FIG. 3, of identifying the location of the external device 110 based on the direction in which the first sound is received. According to this embodiment, the storage unit 260 of the electronic device 100 stores predefined characteristics of the first sound to be received. Here, a characteristic of the first sound may be a property of the sound waveform, such as the amplitude, frequency, or period of the first sound. As another embodiment, a characteristic of the first sound may be identification information of the external device 110, such as its name or manufacturer, or information such as a list of utterances included in the first sound output by the external device 110. Accordingly, when the electronic device 100 receives a sound (S520), the processor 290 identifies whether the received sound matches the predefined characteristics of the first sound stored in the storage unit 260 (S530). If the processor 290 identifies the received sound as the first sound matching the predefined characteristics (S540), it may then perform the operation of identifying the location of the external device 110 that generated the first sound. As yet another embodiment, the processor 290 may, based on the characteristics of the first sound stored in the storage unit 260, request the external device 110 through the communication unit 210 to output a first sound having those characteristics.
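The matching step (S520-S540) could be realized in many ways; one minimal, assumed approach is to compare the dominant frequency and level of the captured signal against the stored characteristic, as sketched below with NumPy. The tolerance parameters are illustrative assumptions.

```python
import numpy as np

def matches_first_sound(samples: np.ndarray, sample_rate: int,
                        expected_freq_hz: float, expected_rms: float,
                        freq_tol_hz: float = 20.0, rms_tol: float = 0.2) -> bool:
    """Illustrative check (S530): does the received sound match the stored
    characteristic (dominant frequency and amplitude) of the first sound?"""
    # Dominant frequency via the FFT peak of the captured block
    spectrum = np.abs(np.fft.rfft(samples))
    freqs = np.fft.rfftfreq(len(samples), d=1.0 / sample_rate)
    dominant = freqs[np.argmax(spectrum)]
    # RMS amplitude of the captured block
    rms = float(np.sqrt(np.mean(samples ** 2)))
    return (abs(dominant - expected_freq_hz) <= freq_tol_hz
            and abs(rms - expected_rms) <= rms_tol * expected_rms)
```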
FIG. 6 is a diagram illustrating an utterance list according to an embodiment of the present invention. In this embodiment, an utterance list may be stored as one of the predefined characteristics described in FIG. 5. The utterance list is a list of sentences with which the external device 110 announces its location to the electronic device 100; the electronic device 100 listens to the sound of the external device 110 and identifies the location of the external device 110. The utterance list may also serve as phrases that inform the user that the operation of identifying the locations of the devices is in progress. As described with reference to FIG. 4, even if information on the time point for performing external device location identification is stored, the user may not be aware of it, so an utterance list composed of full sentences may be stored. For example, if the external device 110 is a speaker, it may repeatedly output sounds such as 'I am the AI speaker Galaxy Home. Call me when you want to listen to music.' or 'I am now announcing my location to the Samsung Smart TV for initial voice recognition setup.' The electronic device 100 may listen to these example sounds from the external device 110 and identify the location of the external device in consideration of their amplitude, frequency, period, and so on.
FIG. 7 is a diagram illustrating communication between a server and devices according to an embodiment of the present invention, and FIG. 8 is a flowchart illustrating an operation of an electronic device according to an embodiment of the present invention. According to this embodiment, the processor 290 may receive information on the characteristics of the external device 110 from another apparatus, such as the server 710, through the communication unit 210 (S810), and store the received information in the storage unit 260. In cases where the devices come from different manufacturers and it is therefore difficult to obtain information on one another's characteristics, the information on the characteristics can be received through a server or the like. The electronic device 100 can then identify the location of the external device 110 more easily based on the received and stored information (S820).
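A minimal sketch of S810, assuming a hypothetical REST endpoint on the server 710 that returns the first-sound characteristics per device model; the URL scheme and JSON fields are invented for illustration only.

```python
import json
import urllib.request

def fetch_device_characteristics(server_url: str, model_id: str) -> dict:
    """S810: fetch the first-sound characteristics of an external device
    from a server. The endpoint path and response fields are assumptions."""
    with urllib.request.urlopen(f"{server_url}/characteristics/{model_id}") as resp:
        return json.load(resp)  # e.g. {"freq_hz": 19000, "rms": 0.5, ...}
```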
FIG. 9 is a diagram illustrating identification of the location of an external device according to an embodiment of the present invention. There are several methods of estimating the direction of a sound source. Among them, according to FIG. 9, the location of the sound source can be identified through the difference in the times at which the sound reaches specific areas. According to this embodiment, when the external device 110 produces a sound, there is a difference between the times at which the sound reaches two points A and B of the electronic device. Using the speed of sound and the time taken to reach each point, the distances from the external device to A and B can be obtained. Assuming that the distance d between A and B is very small compared to the distance r between the electronic device 100 and the external device 110 (d ≪ r), the difference Δl between the distances from the external device to points A and B can be obtained, and from this, the angle θ between the electronic device 100 and the external device 110 can be obtained. Accordingly, the electronic device 100 may include a plurality of microphones 270 spaced apart from each other, such as at A and B, and may identify the location of the external device 110 through the distance r and the angle θ to the external device 110. However, the location identification method of FIG. 9 is only an example, and various methods of identifying the location of the external device 110 may be used according to the present disclosure.
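Under the far-field assumption d ≪ r stated above, the path difference is Δl = c·Δt and the bearing follows from sin θ = Δl/d. The sketch below, an illustration rather than the claimed implementation, estimates Δt by cross-correlating the two microphone signals.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s, at roughly room temperature

def estimate_azimuth(sig_a: np.ndarray, sig_b: np.ndarray,
                     sample_rate: int, mic_spacing_m: float) -> float:
    """Estimate the azimuth (degrees) of a sound source from the time
    difference of arrival between two microphones A and B (far field,
    d << r, as in FIG. 9)."""
    # Cross-correlate to find the delay (in samples) between the channels
    corr = np.correlate(sig_a, sig_b, mode="full")
    delay_samples = np.argmax(corr) - (len(sig_b) - 1)
    dt = delay_samples / sample_rate          # time difference Δt
    dl = SPEED_OF_SOUND * dt                  # path difference Δl = c·Δt
    # sin(θ) = Δl / d; clip to guard against noise pushing |Δl| beyond d
    sin_theta = np.clip(dl / mic_spacing_m, -1.0, 1.0)
    return float(np.degrees(np.arcsin(sin_theta)))
```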
FIG. 10 is a diagram illustrating information on external devices according to an embodiment of the present invention. According to an embodiment, the processor 290 may identify the location of the external device 110 and store the information on the identified location of the external device 110 in the storage unit 260 in the form of a table 1010. In this case, the processor 290 may store the information on the location of the external device 110 mapped to the name of the external device 110, its distance, direction, connection status, and so on. For example, external device 1 is at a distance r1 from the electronic device and is located at an azimuth θ1 with respect to the reference direction of the electronic device 100. External device 2 is at a distance r2 from the electronic device 100 and is located at an azimuth θ2. In this embodiment, the location of the external device 110 is expressed as an azimuth with respect to the reference direction of the electronic device 100, but this is only an example, and the information indicating the location of the external device 110 according to the present disclosure may take various forms. In addition, when the external device 110 is not connected, the sound heard from the direction in which it is located is not subject to removal, so information on whether the external device 110 is connected to the network or to power may also be stored. Other information, such as the name of the external device 110, may be received from the external device 110 through the communication unit 210.
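The table 1010 might be kept as a simple keyed structure, as in the sketch below; the field names and example values are assumptions made for illustration.

```python
from dataclasses import dataclass

@dataclass
class ExternalDeviceEntry:
    """One row of the table 1010 of FIG. 10 (field names are illustrative)."""
    name: str           # e.g. "AI speaker", received via the communication unit
    distance_m: float   # r1, r2, ...
    azimuth_deg: float  # θ1, θ2, ... relative to the device's reference direction
    connected: bool     # network/power status; disconnected devices are skipped

# Example table: only connected devices contribute directions to suppress.
device_table = {
    "device_1": ExternalDeviceEntry("AI speaker", 2.0, 30.0, True),
    "device_2": ExternalDeviceEntry("refrigerator", 3.5, -45.0, True),
}

def directions_to_suppress(table: dict) -> list:
    return [e.azimuth_deg for e in table.values() if e.connected]
```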
FIG. 11 illustrates a situation in which an electronic device according to an embodiment of the present invention processes received sound, FIG. 12 is a flowchart of an operation performed by the electronic device of this embodiment, and FIG. 13 illustrates a noise removal block with which the electronic device of this embodiment processes sound. Referring to FIG. 11, assume that when the user 130 wants to use the voice recognition function of the electronic device 100, sounds S2 and S3 generated by the external devices 110 and 120 exist in addition to the voice S1 uttered by the user 130. First, referring to FIG. 12, the electronic device 100 receives sound (hereinafter also referred to as the 'second sound') through the microphone 270 (S1210). In this case, as illustrated in FIGS. 11 and 13, the electronic device 100 obtains from the microphone 270 a second sound S in which the uttered voice S1 of the user 130 and the sounds S2 and S3 of the external devices 110 and 120 are combined.
Next, referring to FIG. 12, the processor 290 of the electronic device 100 removes, from the signal of the received second sound S, the noise components corresponding to the sounds received from the locations of the external devices 110 and 120 (S1220). At this time, based on the information on the locations of the external devices 110 and 120 that the electronic device 100 identified in advance (see 1010 of FIG. 10), the processor 290 of the electronic device 100 can determine the locations of the external devices 110 and 120 from which the noise components S2 and S3 included in the signal of the acquired second sound S originate. Accordingly, the processor 290 can separate and remove the noise components S2 and S3 of the external devices 110 and 120 from the acquired signal of the second sound S.
To separate and remove the noise components S2 and S3 of the external devices 110 and 120 from the signal of the second sound S, as illustrated in FIG. 13, the processor 290 of the electronic device 100 may include a noise removal block 1310. The noise removal block 1310 may be implemented as a combination of hardware and/or software. The noise removal block 1310 of the processor 290 may use beamforming technology to separate the noise components S2 and S3 of the external devices 110 and 120 from the signal of the second sound S and extract the user's uttered voice S1. Specifically, the noise removal block 1310 divides the signal of the second sound S into certain frequency ranges using a short-time Fourier transform in the frequency domain, and then separates the signals by removing the overlapping frequency ranges among the signals coming from different directions. The processor 290 refers to the table 1010 illustrated in FIG. 10 to check whether there are external devices 110 and 120 that may generate noise. For example, the processor 290 confirms from the table 1010 that external devices 1 and 2 (110, 120), which are connected to the network and to power, exist. Then, as illustrated in FIG. 13, the processor 290 uses the location information (θ1, θ2) of external devices 1 and 2 (110, 120) to remove, from the signal of the second sound S, the frequency ranges corresponding to the noise components S2 and S3 of the external devices 110 and 120, thereby extracting the user's uttered voice S1. Finally, referring again to FIG. 12, the processor 290 recognizes the user's uttered voice based on the signal S1 from which the noise components S2 and S3 have been removed (S1230).
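One way to realize the noise removal block 1310 in software, sketched under the assumption of a two-microphone array and a simple phase-based direction estimate per time-frequency bin, is a directional mask over the short-time Fourier transform. This is an illustration of the general beamforming/masking idea described above, not the patented implementation itself.

```python
import numpy as np
from scipy.signal import stft, istft

SPEED_OF_SOUND = 343.0  # m/s

def suppress_directions(sig_a, sig_b, fs, mic_spacing_m,
                        noise_azimuths_deg, tol_deg=10.0, nperseg=512):
    """Directional masking over the STFT (illustrative): per time-frequency
    bin, estimate the arrival angle from the inter-microphone phase
    difference and zero out bins pointing at stored noise directions."""
    f, _, A = stft(sig_a, fs, nperseg=nperseg)
    _, _, B = stft(sig_b, fs, nperseg=nperseg)
    # Phase difference between the two channels per bin
    phase = np.angle(A * np.conj(B))
    # Path difference Δl = phase * c / (2π f); avoid the DC bin (f = 0)
    f_safe = np.where(f > 0, f, np.inf)[:, None]
    dl = phase * SPEED_OF_SOUND / (2 * np.pi * f_safe)
    sin_theta = np.clip(dl / mic_spacing_m, -1.0, 1.0)
    angle_deg = np.degrees(np.arcsin(sin_theta))
    # Build a mask that keeps every bin except those near θ1, θ2, ...
    mask = np.ones_like(A, dtype=float)
    for az in noise_azimuths_deg:
        mask[np.abs(angle_deg - az) < tol_deg] = 0.0
    _, cleaned = istft(A * mask, fs, nperseg=nperseg)
    return cleaned
```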
According to an embodiment of the present invention, the electronic device 100 identifies the presence and locations of external devices and classifies sounds arriving from those directions as noise, so that the user's spoken voice S1 can be obtained after the noise signals in the acquired second sound S are distinguished and removed. That is, by distinguishing which part of the acquired signal of the second sound S is the user's spoken voice and which part is sound generated by an external device, an embodiment can recognize the user's spoken voice more accurately than the existing technique, which simply treats the direction of the loudest sound as the valid direction and focuses on it to isolate the user's voice. Moreover, because the locations of the external devices are identified in advance, the voice processing is faster.
FIG. 14 illustrates an entire system according to an embodiment of the present invention, and FIG. 15 is a flowchart of the operation of an electronic device in that system. In the previous embodiment, the electronic device 100 received the sounds generated by the external devices 110 and 120 and identified their locations; in this embodiment, not only the electronic device 100 but also the external devices 110 and 120 each identify the locations of the other devices from their own positions, so that every device knows the locations of all the others. When the time point for executing the external-device location identification described with reference to FIG. 4 arrives, each device can thus determine the locations of the others. According to an embodiment of the present invention, the processor 290 of the electronic device 100 may receive, through the communication unit 210, a list of the external devices connected to the network and store it in the storage unit 260 (S1510). The processor may identify and store the location of each external device in the list (S1520). After this, the processor 290 refers to the stored list of networked external devices and determines whether the locations of all the external devices in the list have been identified (S1530). If the locations of all the external devices in the list have been identified (Yes in S1530), the processor 290 ends the operation. If an external device whose location has not been identified remains (No in S1530), the processor 290 repeats the location identification process for that device. Once this process is completed for the electronic device, it is performed in the same way for each external device. As a result, every device that exists in the confined space and is connected to the network can identify the locations of the other devices, and the present invention is applicable to all of them.
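For illustration, a minimal sketch of the S1510-S1530 loop follows. The device list and angle estimates are simulated; all names are assumptions rather than APIs from the disclosure.

    from typing import Dict, Optional

    # Simulated stand-in for the network query and sound-based estimation.
    SIMULATED_ANGLES = {"external_device_1": 45.0, "external_device_2": -30.0}

    def fetch_device_list() -> list:
        # S1510: query the network for the connected external devices.
        return list(SIMULATED_ANGLES)

    def locate_by_requested_sound(device_id: str) -> Optional[float]:
        # S1520: request the device to emit the first sound, then estimate
        # its direction of arrival; simulated here by a table lookup.
        return SIMULATED_ANGLES.get(device_id)

    def identify_all_locations() -> Dict[str, float]:
        pending = set(fetch_device_list())        # S1510
        locations: Dict[str, float] = {}
        while pending:                            # S1530: loop until resolved
            for device_id in list(pending):
                angle = locate_by_requested_sound(device_id)   # S1520
                if angle is not None:
                    locations[device_id] = angle
                    pending.discard(device_id)
        return locations

    print(identify_all_locations())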
FIG. 16 illustrates the system after speech processing according to an embodiment of the present invention. According to this embodiment, after the voice pre-processing of FIG. 13 in which the noise removal block 1310 obtains the user's spoken voice S1, the communication unit of the electronic device 100 transmits, through the Bluetooth module 1610 connecting the devices, a control command to the communication units of the external devices 110 and 120 to adjust their volume. The control unit 1620 of each external device adjusts its volume accordingly. When the recognition of the user's voice in the electronic device 100 is completed, the volume of the other devices is automatically restored to its original level. The Bluetooth module can readily be replaced with Wi-Fi. This embodiment is applicable not only to the spoken voice obtained through the voice pre-processing of FIG. 13, but also, for example, to the case where the electronic device confirms that there is no noise influence from an external device; it is not limited to any one case.
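A minimal sketch of this volume-ducking behavior follows. The control-command function and the device objects are illustrative stand-ins for the Bluetooth module 1610 and the external device's control unit 1620, not an API from the disclosure.

    from contextlib import contextmanager

    class ExternalDevice:
        def __init__(self, name: str, volume: int):
            self.name, self.volume = name, volume

    def send_volume_command(device: ExternalDevice, volume: int) -> None:
        # Placeholder for the control command sent over Bluetooth (or Wi-Fi)
        # to the device's communication unit; here it only updates local state.
        device.volume = volume
        print(f"{device.name} volume -> {volume}")

    @contextmanager
    def ducked(devices, duck_level: int = 0):
        original = {d: d.volume for d in devices}
        for d in devices:
            send_volume_command(d, duck_level)   # lower volume for recognition
        try:
            yield
        finally:
            for d, v in original.items():
                send_volume_command(d, v)        # restore once recognition ends

    devices = [ExternalDevice("external_device_1", 7),
               ExternalDevice("external_device_2", 5)]
    with ducked(devices):
        pass  # run speech recognition on the noise-removed signal S1 here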

Claims (15)

  1. An electronic device comprising:
    a microphone;
    a communication unit configured to communicate with an external device; and
    a processor configured to:
    request the external device, through the communication unit, to output a first sound,
    remove, based on information on a location of the external device identified based on a direction in which the first sound is received by the microphone, a noise component corresponding to a sound received from the location of the external device from a signal of a second sound received by the microphone, and
    recognize a user's utterance based on the signal from which the noise component has been removed.
  2. The electronic device of claim 1, wherein the processor identifies whether the first sound is received based on a predefined characteristic of the first sound.
  3. The electronic device of claim 1, further comprising a storage unit,
    wherein the processor identifies the location of the external device based on the direction in which the first sound is received by the microphone, and
    stores information on the identified location of the external device in the storage unit.
  4. The electronic device of claim 2, wherein the characteristic includes guidance-related information on a location identification operation of the external device.
  5. The electronic device of claim 2, wherein the characteristic includes an inaudible frequency band.
  6. The electronic device of claim 2, wherein the processor requests the external device to output the first sound having the characteristic.
  7. The electronic device of claim 2, wherein the processor receives the characteristic from a server through the communication unit.
  8. The electronic device of claim 1, further comprising a user input unit,
    wherein the processor identifies the location of the external device based on a user command input to the user input unit.
  9. The electronic device of claim 1, wherein the storage unit stores information on a time point at which location identification of the external device is to be executed, and
    the processor identifies the location of the external device at the time point based on the stored information.
  10. The electronic device of claim 1, wherein the processor:
    receives information on the external device from a server through the communication unit, and
    identifies the location of the external device based on the received information.
  11. The electronic device of claim 1, further comprising a speaker,
    wherein the processor:
    receives, from the external device through the communication unit, a request to output a third sound for identifying a location of the electronic device, and
    controls the speaker to output the third sound.
  12. A control method of an electronic device, the method comprising:
    communicating with an external device through a communication unit and requesting the external device to output a first sound;
    removing, based on information on a location of the external device identified based on a direction in which the first sound is received by a microphone, a noise component corresponding to a sound received from the location of the external device from a signal of a second sound received by the microphone; and
    recognizing a user's utterance based on the signal from which the noise component has been removed.
  13. The method of claim 12, further comprising:
    storing a predefined characteristic of the first sound; and
    identifying whether the first sound is received based on the predefined characteristic.
  14. The method of claim 12, further comprising:
    identifying the location of the external device based on the direction in which the first sound is received by the microphone; and
    storing information on the identified location of the external device in a storage unit.
  15. The method of claim 12, wherein the characteristic includes guidance-related information on a location identification operation of the external device.
PCT/KR2020/011937 2019-11-05 2020-09-04 Electronic device and control method thereof WO2021091063A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020190140145A KR20210054246A (en) 2019-11-05 2019-11-05 Electorinc apparatus and control method thereof
KR10-2019-0140145 2019-11-05

Publications (1)

Publication Number Publication Date
WO2021091063A1 true WO2021091063A1 (en) 2021-05-14

Family

ID=75848890

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/011937 WO2021091063A1 (en) 2019-11-05 2020-09-04 Electronic device and control method thereof

Country Status (2)

Country Link
KR (1) KR20210054246A (en)
WO (1) WO2021091063A1 (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN117040940B (en) * 2023-10-10 2023-12-19 成都运荔枝科技有限公司 Equipment data encryption method based on Internet of things

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101082839B1 (en) * 2008-12-22 2011-11-11 한국전자통신연구원 Method and apparatus for multi channel noise reduction
JP2011118124A (en) * 2009-12-02 2011-06-16 Murata Machinery Ltd Speech recognition system and recognition method
KR20180107637A (en) * 2017-03-22 2018-10-02 삼성전자주식회사 Electronic device and controlling method thereof
JP2019176332A (en) * 2018-03-28 2019-10-10 株式会社フュートレック Speech extracting device and speech extracting method
KR20190096305A (en) * 2019-07-29 2019-08-19 엘지전자 주식회사 Intelligent voice recognizing method, voice recognizing apparatus, intelligent computing device and server

Also Published As

Publication number Publication date
KR20210054246A (en) 2021-05-13

Legal Events

Date Code Title Description
121  Ep: the epo has been informed by wipo that ep was designated in this application
     Ref document number: 20884432; Country of ref document: EP; Kind code of ref document: A1
NENP Non-entry into the national phase
     Ref country code: DE
122  Ep: pct application non-entry in european phase
     Ref document number: 20884432; Country of ref document: EP; Kind code of ref document: A1