WO2021112391A1 - Electronic device and control method therefor - Google Patents
Electronic device and control method therefor
- Publication number
- WO2021112391A1 PCT/KR2020/013895 KR2020013895W
- Authority
- WO
- WIPO (PCT)
- Prior art keywords
- sound
- user
- electronic device
- component
- processor
- Prior art date
Links
- 238000000034 method Methods 0.000 title claims description 36
- 230000005236 sound signal Effects 0.000 claims description 24
- 238000004590 computer program Methods 0.000 claims description 7
- 238000012545 processing Methods 0.000 claims description 7
- 238000013528 artificial neural network Methods 0.000 description 23
- 238000010586 diagram Methods 0.000 description 20
- 230000006870 function Effects 0.000 description 13
- 238000005516 engineering process Methods 0.000 description 9
- 238000000926 separation method Methods 0.000 description 9
- 238000004891 communication Methods 0.000 description 8
- 239000004973 liquid crystal related substance Substances 0.000 description 5
- 230000005540 biological transmission Effects 0.000 description 4
- 238000012549 training Methods 0.000 description 4
- 230000003287 optical effect Effects 0.000 description 3
- 238000013527 convolutional neural network Methods 0.000 description 2
- 238000007781 pre-processing Methods 0.000 description 2
- 230000000306 recurrent effect Effects 0.000 description 2
- 230000001133 acceleration Effects 0.000 description 1
- 238000013473 artificial intelligence Methods 0.000 description 1
- 230000002457 bidirectional effect Effects 0.000 description 1
- 239000002131 composite material Substances 0.000 description 1
- 238000007405 data analysis Methods 0.000 description 1
- 238000013135 deep learning Methods 0.000 description 1
- 238000013461 design Methods 0.000 description 1
- 230000010365 information processing Effects 0.000 description 1
- 239000004615 ingredient Substances 0.000 description 1
- 238000010801 machine learning Methods 0.000 description 1
- 230000005855 radiation Effects 0.000 description 1
- 230000004044 response Effects 0.000 description 1
- 239000000758 substrate Substances 0.000 description 1
- 238000005406 washing Methods 0.000 description 1
Classifications
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L21/00—Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
- G10L21/02—Speech enhancement, e.g. noise reduction or echo cancellation
- G10L21/0208—Noise filtering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/011—Arrangements for interaction with the human body, e.g. for user immersion in virtual reality
- G06F3/014—Hand-worn input/output arrangements, e.g. data gloves
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/017—Gesture based interaction, e.g. based on a set of recognized hand gestures
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F3/00—Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
- G06F3/01—Input arrangements or combined input and output arrangements for interaction between user and computer
- G06F3/048—Interaction techniques based on graphical user interfaces [GUI]
- G06F3/0484—Interaction techniques based on graphical user interfaces [GUI] for the control of specific functions or operations, e.g. selecting or manipulating an object, an image or a displayed text element, setting a parameter value or selecting a range
-
- G—PHYSICS
- G10—MUSICAL INSTRUMENTS; ACOUSTICS
- G10L—SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
- G10L15/00—Speech recognition
- G10L15/22—Procedures used during a speech recognition process, e.g. man-machine dialogue
-
- H—ELECTRICITY
- H04—ELECTRIC COMMUNICATION TECHNIQUE
- H04R—LOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
- H04R5/00—Stereophonic arrangements
- H04R5/027—Spatial or constructional arrangements of microphones, e.g. in dummy heads
Definitions
- the present invention relates to an electronic device for performing a voice recognition function and a method for controlling the same.
- an electronic device capable of voice recognition uses beamforming technology to extract the user's voice. Beamforming creates a spatial filter by extracting the audio signal arriving from one direction while suppressing the audio components arriving from other directions. The directions of multiple sound sources are automatically tracked with a microphone array system composed of multiple microphones, and on this basis a direction-based sound source separation technology such as GSS (Geometric Source Separation) is applied to separate a specific sound from the surrounding noise.
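As an illustrative sketch (not part of the disclosure), the spatial filter created by beamforming can be demonstrated with a toy delay-and-sum beamformer; the two-microphone geometry, sample rate, and integer sample delays below are assumptions chosen for the example:

```python
import numpy as np

def delay_and_sum(mic_signals, delays):
    # Align each microphone channel by its steering delay (in samples) and
    # average: sound from the steered direction adds coherently, while sound
    # from other directions partially cancels -- a simple spatial filter.
    aligned = [np.roll(sig, -d) for sig, d in zip(mic_signals, delays)]
    return np.mean(aligned, axis=0)

# Hypothetical scene: a 440 Hz "voice" reaches microphone 1 three samples
# after microphone 0, i.e. it arrives from microphone 0's side.
fs = 8000
t = np.arange(fs) / fs
voice = np.sin(2 * np.pi * 440 * t)
mic0 = voice
mic1 = np.roll(voice, 3)  # 3-sample inter-microphone delay

steered = delay_and_sum([mic0, mic1], delays=[0, 3])      # steer toward the voice
mis_steered = delay_and_sum([mic0, mic1], delays=[3, 0])  # steer away from it

# Steering toward the source preserves more signal energy than steering away.
print(np.sum(steered**2) > np.sum(mis_steered**2))  # prints True
```

Real beamformers work with fractional, frequency-dependent delays; this integer-delay, single-tone version only illustrates the direction-dependent gain.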
- GSS Geometric Source Separation
- a plurality of sound components having different directions are obtained from a sound signal received through a microphone, and a processor identifies a sound component in a noise direction from among the obtained sound components based on a user direction designation, and performs user voice recognition on the received sound based on the identified sound component in the noise direction.
- the processor may remove the identified sound component in the noise direction from the plurality of acquired sound components, and perform the user voice recognition based on the sound component from which the sound component in the noise direction is removed.
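A minimal sketch of this removal step, assuming the separated components already carry direction estimates in degrees (the component values, directions, and tolerance below are illustrative, not from the patent):

```python
import numpy as np

def remove_noise_direction(components, directions_deg, noise_direction_deg,
                           tolerance_deg=15.0):
    # Keep only separated components whose estimated direction lies farther
    # than `tolerance_deg` from the user-designated noise direction, then
    # recombine the survivors as the input to voice recognition.
    kept = [c for c, d in zip(components, directions_deg)
            if abs(d - noise_direction_deg) > tolerance_deg]
    return np.sum(kept, axis=0) if kept else np.zeros_like(components[0])

# Hypothetical separated components: user voice at 0 degrees, a loudspeaker
# at 90 degrees, and background chatter at 95 degrees.
voice = np.array([0.5, -0.5, 0.5, -0.5])
tv = np.array([1.0, 1.0, 1.0, 1.0])
chatter = np.array([0.2, 0.2, 0.2, 0.2])

cleaned = remove_noise_direction([voice, tv, chatter], [0.0, 90.0, 95.0],
                                 noise_direction_deg=90.0)
print(cleaned)  # only the 0-degree (voice) component survives
```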
- the user direction designation may include designation of the noise direction.
- the electronic device may further include a user input unit for receiving a user input for designating the user direction.
- the user input may include at least one of a button input, a touch input, and a gesture input.
- the user input unit may have a receiving area for the user input, the microphone may include a plurality of sub-microphones disposed in different directions with respect to the receiving area, and the processor may identify the sound component in the noise direction based on the direction of the sub-microphone, among the plurality of sub-microphones, corresponding to the position of the user input received in the receiving area.
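One hypothetical way to map a touch position in the receiving area to a sub-microphone direction; the three-region split, the area dimensions, and the direction labels are assumptions made for this sketch:

```python
def sub_mic_for_touch(x, y, width, height):
    # Map a touch position in the receiving area to the sub-microphone whose
    # direction it corresponds to; that direction is then treated as the
    # noise direction. Region boundaries and labels are illustrative.
    if x < width / 3:
        return "left"    # sub-microphone facing left
    if x > 2 * width / 3:
        return "right"   # sub-microphone facing right
    return "front" if y < height / 2 else "rear"

# Touching the left third of a 300x100 receiving area selects the left
# sub-microphone.
print(sub_mic_for_touch(10, 40, 300, 100))  # prints left
```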
- the electronic device may further include a display, wherein the processor displays a GUI for designating the user direction on the display, and can identify the sound component in the noise direction based on the user input made through the GUI.
- the electronic device may further include an interface unit, wherein the processor receives information regarding the user direction designation from an external device through the interface unit, and can identify the sound component in the noise direction based on the received information.
- the processor may recognize, as a user voice component, a sound component having a predefined length from among the plurality of sound components, excluding the sound component in the noise direction.
- among the sound components having the predefined length, the processor may recognize a second sound component that is shorter than a first sound component as the user voice component.
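A sketch of this length-based selection, under the assumption that each separated component carries a duration; the length range, field names, and example values are illustrative:

```python
def pick_user_voice(components, min_len_s, max_len_s):
    # Among the non-noise components, keep those whose duration falls within
    # a predefined range, then prefer the shortest one (spoken commands tend
    # to be brief, while playback and chatter run longer).
    candidates = [c for c in components
                  if min_len_s <= c["duration"] <= max_len_s]
    return min(candidates, key=lambda c: c["duration"]) if candidates else None

components = [{"name": "tv_audio", "duration": 30.0},  # continuous playback
              {"name": "command", "duration": 1.2},    # short user utterance
              {"name": "chatter", "duration": 2.5}]

print(pick_user_voice(components, 0.5, 5.0)["name"])  # prints command
```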
- the processor may identify a positional movement of the microphone, and identify a sound component in the noise direction based on the positional movement of the microphone.
- the processor may identify a positional movement of the microphone and guide the user to re-enter the user direction designation.
- the processor obtains a sound component from a sound signal received through the microphone and identifies, based on the user direction designation, whether the obtained sound component is a user voice component or a sound component in the noise direction; user voice recognition may be performed on the identified user voice component, and may not be performed on the identified sound component in the noise direction.
- the electronic device may further include a voice recognition unit that performs a processing operation related to the user's voice recognition, wherein the processor transmits the user voice component to the voice recognition unit and does not transmit the sound component in the noise direction to the voice recognition unit.
- the method comprising: acquiring a plurality of sound components having different directions from a sound signal received through a microphone; identifying a sound component in a noise direction from among the plurality of acquired sound components based on user direction designation; and performing user voice recognition on the received sound based on the identified sound component in the noise direction.
- the performing of the user voice recognition may include: removing a sound component in the identified noise direction from the plurality of acquired sound components; and performing the user's voice recognition based on the sound component from which the sound component in the noise direction is removed.
- the user direction designation may include designation of the noise direction.
- the step of identifying the sound component in the noise direction may include identifying the sound component in the noise direction based on the direction of the sub-microphone, among a plurality of sub-microphones arranged in different directions, corresponding to the position of the user input received in a receiving area for the user input.
- a method of controlling an electronic device includes: acquiring a sound component from a sound signal received through the microphone; identifying, based on the user direction designation, whether the obtained sound component is a user voice component or a sound component in the noise direction; and performing user voice recognition only when the obtained sound component is identified as a user voice component.
- the control method of the electronic device includes: obtaining a plurality of sound components having different directions from a sound signal received through a microphone; identifying a sound component in a noise direction from among the plurality of acquired sound components based on user direction designation; and performing user voice recognition on the received sound based on the identified sound component in the noise direction.
- FIG. 1 is a diagram illustrating an entire system according to an embodiment of the present invention.
- FIG. 2 is a block diagram illustrating the configuration of an electronic device according to an embodiment of the present invention.
- FIG. 3 is a diagram illustrating an operation flowchart of an electronic device according to an embodiment of the present invention.
- FIG. 4 is a diagram illustrating an operation of an electronic device according to an embodiment of the present invention.
- FIG. 5 is a diagram illustrating an operation of an electronic device according to an embodiment of the present invention.
- FIG. 6 is a diagram illustrating an operation of an electronic device according to an embodiment of the present invention.
- FIG. 7 is a diagram illustrating an operation state according to movement of a microphone according to an embodiment of the present invention.
- FIG. 8 is a diagram illustrating a flowchart of an operation performed by the electronic device according to the present embodiment.
- FIG. 9 is a diagram illustrating a state of recognizing a sound component according to an embodiment of the present invention.
- FIG. 10 is a diagram illustrating an operation flowchart of an electronic device according to an embodiment of the present invention.
- a 'module' or 'unit' performs at least one function or operation, may be implemented as hardware, software, or a combination of the two, and may be integrated into and implemented as at least one module.
- 'at least one of' a plurality of elements refers not only to all of the plurality of elements, but also to each one of them (excluding the rest) or to any combination thereof.
- the electronic device 100 may be implemented as a display device capable of displaying an image.
- the electronic device 100 may include a TV, a computer, a smart phone, a tablet, a portable media player, a wearable device, a video wall, an electronic picture frame, and the like.
- the electronic device 100 may also be implemented as various other types of devices, such as an AI assistant device without a display (e.g., an AI speaker), an audio device such as a Bluetooth speaker, an image processing device such as a set-top box, a household appliance such as a refrigerator or a washing machine, or an information processing device such as a computer body.
- the electronic device 100, the user 110, the speaker 121, and a plurality of people 122 exist in the use space.
- the sound received by the electronic device 100 may be a mixture of the voice of the user 110, the sound from the speaker 121, and/or the voices of the plurality of people 122.
- when the electronic device 100 processes the received sound, it is difficult to distinguish which sound signal was generated by the user's utterance.
- the user 110 designates a direction to the electronic device 100 so that the sound emitted from the speaker 121 or the plurality of people 122 is treated as noise, and the electronic device 100 performs recognition after removing, from the received sound signals, the signals coming from the direction designated as noise.
- when the user 110 designates the direction in which the speaker 121 and the plurality of people 122 are located as the noise direction and then speaks to use the voice recognition function of the electronic device 100, the sound signals emitted from the speaker 121 and the plurality of people 122 can be removed, so the electronic device 100 recognizes only the voice uttered by the user 110 and more accurate voice recognition is possible.
- alternatively, the user 110 may designate the direction of his or her own utterance for the electronic device 100 to recognize, rather than the direction of the noise, but the embodiment is not limited thereto.
- hereinafter, it is assumed that the direction designated by the user is the direction of noise.
- FIG. 2 is a block diagram illustrating the configuration of an electronic device according to an embodiment of the present invention.
- the electronic device 100 may include an interface unit 210 .
- the interface unit 210 may include a wired interface unit 211 .
- the wired interface unit 211 includes a connector or port to which an antenna capable of receiving a broadcast signal according to a broadcasting standard such as terrestrial/satellite broadcasting, or a cable capable of receiving a broadcast signal according to the cable broadcasting standard, can be connected.
- the electronic device 100 may have a built-in antenna capable of receiving a broadcast signal.
- the wired interface unit 211 may include connectors or ports conforming to video and/or audio transmission standards, such as an HDMI port, DisplayPort, a DVI port, Thunderbolt, composite video, component video, super video, or SCART.
- the wired interface unit 211 may include a connector or port according to a universal data transmission standard such as a USB port.
- the wired interface unit 211 may include a connector or a port to which an optical cable can be connected according to an optical transmission standard.
- the wired interface unit 211 is connected to an external microphone or an external audio device having a microphone, and may include a connector or a port capable of receiving or inputting an audio signal from the audio device.
- the wired interface unit 211 is connected to an audio device such as a headset, earphone, or external speaker, and may include a connector or port capable of transmitting or outputting an audio signal to the audio device.
- the wired interface unit 211 may include a connector or port according to a network transmission standard such as Ethernet.
- the wired interface unit 211 may be implemented as a LAN card connected to a router or a gateway by wire.
- the wired interface unit 211 is connected by wire, through the connector or port, to external devices such as a set-top box, an optical media playback device, an external display device, a speaker, or a server, in a 1:1 or 1:N (N is a natural number) manner, and thereby receives a video/audio signal from the corresponding external device or transmits a video/audio signal to it.
- the wired interface unit 211 may include a connector or a port for separately transmitting video/audio signals.
- the wired interface unit 211 may be embedded in the electronic device 100, or may be implemented in the form of a dongle or a module detachably attached to a connector of the electronic device 100.
- the interface unit 210 may include a wireless interface unit 212 .
- the wireless interface unit 212 may be implemented in various ways corresponding to the implementation form of the electronic device 100 .
- the wireless interface unit 212 may use wireless communication methods such as RF (radio frequency), Zigbee, Bluetooth, Wi-Fi, UWB (Ultra WideBand), and NFC (Near Field Communication).
- the wireless interface unit 212 may be implemented as a wireless communication module that performs wireless communication with an AP according to a Wi-Fi method, or a wireless communication module that performs one-to-one direct wireless communication such as Bluetooth.
- the wireless interface unit 212 may transmit/receive data packets to/from the server by wirelessly communicating with the server on the network.
- the wireless interface unit 212 may include an IR transmitter and/or an IR receiver capable of transmitting and/or receiving an IR (Infrared) signal according to an infrared communication standard.
- the wireless interface unit 212 may receive or input a remote control signal from a remote control or other external device through an IR transmitter and/or an IR receiver, or transmit or output a remote control signal to another external device.
- the electronic device 100 may transmit/receive a remote control signal to and from the remote control or other external device through the wireless interface unit 212 of another method such as Wi-Fi or Bluetooth.
- the electronic device 100 may further include a tuner for tuning the received broadcast signal for each channel.
- the electronic device 100 may include a display unit 220 .
- the display unit 220 includes a display panel capable of displaying an image on the screen.
- the display panel is provided with a light-receiving structure such as a liquid crystal type or a self-luminous structure such as an OLED type.
- the display unit 220 may further include additional components according to the structure of the display panel. For example, if the display panel is a liquid crystal type, the display unit 220 includes a liquid crystal display panel, a backlight unit for supplying light, and a panel driving substrate for driving the liquid crystal of the liquid crystal display panel.
- the electronic device 100 may include a user input unit 230 .
- the user input unit 230 includes various types of input interface related circuits provided to perform user input.
- the user input unit 230 may be configured in various forms depending on the type of the electronic device 100, for example, a mechanical or electronic button unit of the electronic device 100, a remote controller separated from the electronic device 100, and so on.
- the electronic device 100 may include a storage unit 240 .
- the storage unit 240 stores digitized data.
- the storage unit 240 includes non-volatile storage, which preserves data regardless of whether power is provided, and volatile memory, into which data to be processed by the processor 270 is loaded and which cannot preserve data when power is not provided. Storage includes flash memory, a hard disk drive (HDD), a solid-state drive (SSD), and read-only memory (ROM); memory includes a buffer and random access memory (RAM).
- the electronic device 100 may include a microphone 250 .
- the microphone 250 collects sounds of the external environment including the user's voice.
- the microphone 250 transmits the collected sound signal to the processor 270 .
- the electronic device 100 may include a microphone 250 for collecting user voices or may receive a voice signal from an external device such as a remote controller having a microphone or a smart phone through the interface unit 210 .
- a remote controller application may be installed in an external device to control the electronic device 100 or to perform functions such as voice recognition. An external device with such an application installed can receive a user's voice, and can exchange and control data with the electronic device 100 using Wi-Fi/BT, infrared, or the like.
- accordingly, a plurality of interface units 210 implementing such communication methods may exist in the electronic device 100.
- the electronic device 100 may include a speaker 260 .
- the speaker 260 outputs audio data processed by the processor 270 as sound.
- the speaker 260 may include a unit speaker provided to correspond to audio data of one audio channel, and may include a plurality of unit speakers to respectively correspond to audio data of a plurality of audio channels.
- the speaker 260 may be provided separately from the electronic device 100 . In this case, the electronic device 100 may transmit audio data to the speaker 260 through the interface unit 210 .
- the electronic device 100 may include a processor 270 .
- the processor 270 includes one or more hardware processors implemented with a CPU, a chipset, a buffer, a circuit, etc. mounted on a printed circuit board, and may be implemented as a system on chip (SOC) depending on a design method.
- the processor 270 includes modules corresponding to various processes such as a demultiplexer, a decoder, a scaler, an audio digital signal processor (DSP), and an amplifier.
- some or all of these modules may be implemented as SOC.
- a module related to image processing, such as a demultiplexer, a decoder, and a scaler, may be implemented as an image processing SOC, and an audio DSP may be implemented as a chipset separate from the SOC.
- the processor 270 may convert the voice signal into voice data.
- the voice data may be text data obtained through a speech-to-text (STT) process for converting a voice signal into text data.
- the processor 270 identifies a command indicated by the voice data, and performs an operation according to the identified command.
- the voice data processing process and the command identification and execution process may all be executed in the electronic device 100 .
- alternatively, at least a part of the process may be performed by at least one server communicatively connected to the electronic device 100 through a network.
- the processor 270 may call and execute at least one of the software instructions stored in a storage medium readable by a machine such as the electronic device 100. This enables a device such as the electronic device 100 to be operated to perform at least one function according to the called instruction.
- the one or more instructions may include code generated by a compiler or code executable by an interpreter.
- the device-readable storage medium may be provided in the form of a non-transitory storage medium.
- 'non-transitory' only means that the storage medium is a tangible device and does not include a signal (e.g., an electromagnetic wave); this term does not distinguish between a case in which data is semi-permanently stored in the storage medium and a case in which data is temporarily stored therein.
- the processor 270 obtains a plurality of sound components having different directions from the sound signal received through the microphone 250, and performs user voice recognition on the received sound based on the sound component in the noise direction designated by the user 110. At least a part of the data analysis, processing, and result-information generation for performing this user voice recognition may be carried out using a rule-based algorithm or an artificial intelligence algorithm employing at least one of machine learning, a neural network, or deep learning.
- the processor 270 may perform the functions of the learning unit and the recognition unit together.
- the learning unit may perform a function of generating a learned neural network
- the recognition unit may perform a function of recognizing (or inferring, predicting, estimating, and judging) data using the learned neural network.
- the learning unit may generate or update the neural network.
- the learning unit may acquire learning data to generate a neural network.
- the learning unit may acquire the learning data from the storage unit 240 or the outside.
- the learning data may be data used for learning of the neural network, and the neural network may be trained by using the data obtained by performing the above-described operation as learning data.
- before training the neural network with the training data, the learning unit may perform a preprocessing operation on the acquired training data, or may select the data to be used for learning from among a plurality of training data. For example, the learning unit may process the training data into a preset format, filter it, or add/remove noise, so as to process it into a form suitable for learning.
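Such a preprocessing step might look like the following sketch; the trim/pad/normalize choices, the preset length, and the example clips are assumptions, not the patent's method:

```python
import numpy as np

def preprocess(clips, preset_len=4):
    # Illustrative preprocessing: filter out empty clips (data selection),
    # trim or pad each clip to a preset length (preset format), and
    # normalize amplitude to make the data suitable for training.
    out = []
    for clip in clips:
        clip = np.asarray(clip, dtype=float)
        if clip.size == 0:                         # select usable data only
            continue
        clip = clip[:preset_len]                   # trim to the preset format
        clip = np.pad(clip, (0, preset_len - clip.size))  # pad short clips
        peak = np.max(np.abs(clip))
        out.append(clip / peak if peak > 0 else clip)     # normalize
    return out

batch = preprocess([[2.0, -4.0], [], [1.0, 1.0, 1.0, 1.0, 1.0]])
print(len(batch))  # prints 2 (the empty clip was filtered out)
```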
- the learning unit may generate a neural network set to perform the above-described operation by using the preprocessed training data.
- the learned neural network may be composed of a plurality of neural networks (or layers). Nodes of the plurality of neural networks have weights, and the plurality of neural networks may be connected to each other so that an output value of one neural network is used as an input value of another neural network.
- examples of neural networks include models such as Convolutional Neural Network (CNN), Deep Neural Network (DNN), Recurrent Neural Network (RNN), Restricted Boltzmann Machine (RBM), Deep Belief Network (DBN), Bidirectional Recurrent Deep Neural Network (BRDNN), and Deep Q-Networks.
- the recognizer may acquire target data to perform the above-described operation.
- the target data may be obtained from the storage unit 240 or the outside.
- the target data may be data to be recognized by the neural network.
- the recognizer may perform preprocessing on the acquired target data before applying the target data to the learned neural network, or select data to be used for recognition from among a plurality of target data.
- the recognition unit may process the target data into a preset format, filter, or add/remove noise to process the target data into a form suitable for recognition.
- the recognizer may obtain an output value output from the neural network by applying the preprocessed target data to the neural network.
- the recognition unit may obtain a probability value or a reliability value together with the output value.
- control method of the electronic device 100 may be provided by being included in a computer program product.
- the computer program product may include instructions in software executed by the processor 270, as described above.
- Computer program products may be traded between sellers and buyers as commodities.
- the computer program product may be distributed in the form of a machine-readable storage medium (e.g., a CD-ROM), or distributed online (e.g., downloaded or uploaded) through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones). In the case of online distribution, at least a part of the computer program product may be temporarily stored or temporarily created in a machine-readable storage medium such as the memory of a manufacturer's server, an application store's server, or a relay server.
- FIG. 3 is a diagram illustrating an operation flowchart of an electronic device according to an embodiment of the present invention.
- a flowchart is shown in which the electronic device 100 identifies, from the sound it receives, the sound generated by the speaker 121 or the plurality of people 122, and performs user voice recognition.
- the processor 270 acquires a plurality of sound components having different directions from the sound signal received through the microphone (S310).
- the microphone may be a microphone 250 built into the electronic device 100 or an external microphone connected to the electronic device 100 .
- the processor 270 may separate the sound signal received through the microphone into a plurality of sound components using a sound source separation technology such as geometric source separation (GSS).
- GSS is a spatial sound source separation technology; among sound source separation technologies, it refers to one that separates sound source signals using the spatial information that can be obtained with a microphone array.
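As a toy illustration of geometry-informed separation (a drastic simplification of GSS, assuming instantaneous gain-only mixing and known steering vectors; real GSS operates on time-frequency frames with delays):

```python
import numpy as np

# With instantaneous mixing and *known* steering vectors (derived from the
# source directions, i.e. the spatial information a microphone array gives),
# separation reduces to inverting the mixing matrix.
voice = np.array([1.0, -1.0, 1.0, -1.0])
noise = np.array([0.5, 0.5, 0.5, 0.5])

A = np.array([[1.0, 0.3],   # how strongly each source reaches microphone 0
              [0.4, 1.0]])  # and microphone 1, set by the source directions
mics = A @ np.vstack([voice, noise])  # the observed microphone mixtures

separated = np.linalg.inv(A) @ mics   # geometry-informed demixing
print(np.allclose(separated[0], voice))  # prints True
```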
- the user can designate the directions of the speaker 121 and the plurality of people 122 so that the electronic device 100 can identify a component in the noise direction among the plurality of differently directed sound components obtained using the sound source separation technology.
- the time point at which the user designates the direction is not limited to any particular moment.
- the user may designate the direction of an external device installed in the vicinity as noise during initial setting for voice recognition of the electronic device 100 , or may arbitrarily designate the direction of noise before performing voice recognition.
- the processor 270 may store the location where the user 110 performs voice recognition and the designated direction in the storage unit 250, and use them when performing voice recognition.
- when voice recognition is performed in a place such as a restaurant, the direction of noise may be temporarily designated; the embodiment is not limited thereto.
- the processor 270 identifies a noise-direction sound component among the plurality of acquired sound components based on the user direction designation (S320). According to an embodiment of the present invention, the processor 270 may classify and identify, from the plurality of sound components and based on the user direction designation, the sound components in the directions of the speaker 121 and the plurality of people 122 specified by the user 110. The classification and identification of sound components based on user direction designation will be described in detail later. Then, the processor 270 performs user voice recognition on the received sound based on the identified noise-direction sound component (S330). When the electronic device 100 simultaneously acquires a plurality of sounds, a component of a specific sound can be effectively separated or extracted if the directions of the surrounding sound sources are recognized.
- when there is noise around the electronic device and the user knows of it in advance, the electronic device can effectively recognize the sound to be recognized, based on the designated direction, by the user simply designating the corresponding direction.
- FIG. 4 is a diagram illustrating an operation of an electronic device according to an embodiment of the present invention. FIG. 4 illustrates a user's direction designation for identifying the noise-direction sound component among a plurality of acquired sound components.
- the disclosed microphone 400 may be implemented in various forms: as an independent device, or as one component of any device. Also, the microphone 400 may be provided in the electronic device 100 or outside the electronic device 100. In the latter case, the electronic device 100 may receive the sound information collected by the microphone 400 through the interface unit 210.
- assume that the microphone 400 shown in FIG. 4 is located in a space where the user 110 can operate the electronic device 100, and that the speaker 121 and the plurality of people 122 are located around the microphone 400.
- the microphone 400 includes, for example, a plurality of sub-microphones 410 arranged in a circle, and a touch panel 420 for receiving direction designation of the user 110 .
- the arrangement of the plurality of sub-microphones 410 shown in FIG. 4 is only one example of the present disclosure; the sub-microphones 410 may be arranged in various other forms, such as a polygonal shape or a straight line, so as to collect sounds that may be generated in various directions.
- the touch panel 420 has a receiving area capable of receiving a user's touch input. The receiving area of the touch panel 420 is provided to correspond to the arrangement of the plurality of sub-microphones 410 .
- since the receiving area of the touch panel 420 is provided to cover the positions where the plurality of sub-microphones 410 are arranged, the user can recognize the direction of a generated sound and touch the portion of the receiving area corresponding to that direction.
- the touch panel 420 is only one input means for receiving the user's direction-designating input; it can be replaced with various other input means, such as a plurality of buttons arranged at positions in the reception area corresponding to the plurality of sub-microphones 410.
- the processor 270 of the electronic device 100 receives information of a touch input corresponding to user direction designation through the touch panel 420 of the microphone 400 .
- the processor 270 identifies a sound component in the noise direction based on the received touch input information.
- the processor 270 separates and recognizes sound components in specific directions from the sound received by the microphone 400. Specifically, considering that the microphone 400 is disposed so that the plurality of sub-microphones 410 face different directions, the processor 270 can identify that the characteristics of the sound signal received at the position of each sub-microphone 410 differ depending on the direction in which, for example, the plurality of people 122 are present.
- the processor 270 can obtain direction information of a sound using various methods, such as Direction Of Arrival (DOA), which is calculated from the time difference between the sounds input to at least two sub-microphones 410.
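The time-difference principle above can be sketched as follows: cross-correlate the two sub-microphone signals to find the lag, then apply the plane-wave relation between delay, microphone spacing, and angle. This is a generic TDOA sketch, not the patented method; the function name and parameters are assumptions for illustration.

```python
import numpy as np

def estimate_doa(sig_a, sig_b, fs, mic_distance, c=343.0):
    """Estimate the direction of arrival (degrees from broadside) from
    the time difference between two sub-microphone signals."""
    corr = np.correlate(sig_a, sig_b, mode="full")
    # Positive lag means sig_a is delayed relative to sig_b.
    lag = corr.argmax() - (len(sig_b) - 1)      # samples
    tau = lag / fs                              # seconds
    # Plane-wave model: tau = mic_distance * sin(angle) / c
    sin_angle = np.clip(tau * c / mic_distance, -1.0, 1.0)
    return np.degrees(np.arcsin(sin_angle))
```

With more than two sub-microphones, pairwise estimates of this kind can be combined to resolve direction over the full circle.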
- the processor 270 may analyze the sound received from each sub-microphone 410 by separating it into sound components in different directions, and identify that a specific sound component is prominent in a similar direction.
- the processor 270 may recognize that a specific sound component is generated from a specific direction when that component is prominent in a similar direction across the plurality of sub-microphones 410.
- the user 110 recognizes the directions of the speaker 121 and the plurality of people 122, and touches the two positions (see 411 and 412) of the reception area of the touch panel 420 corresponding to those directions.
- the processor 270 receives, through the microphone 400, information of the touch inputs at the two positions (refer to 411 and 412) of the reception area of the touch panel 420. Based on the received information and the predefined positional correspondence between the plurality of sub-microphones 410 and the reception area of the touch panel 420, the processor 270 identifies the sub-microphones 411 and 412 corresponding to the positions of the two touch inputs among the plurality of sub-microphones 410.
- the processor 270 compares the positions of the identified sub-microphones 411 and 412 with the plurality of direction-specific sound components mentioned above. That is, the processor 270 identifies, among the plurality of sound components, the component in the direction corresponding to the positions of the sub-microphones 411 and 412, and may identify the sound component generated from that direction as the noise-direction sound component.
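The mapping from a touch position to a noise direction can be sketched as below: compute the angle of the touch point relative to the center of the circular receiving area, then snap it to the angle of the nearest sub-microphone. The eight-microphone layout, angle table, and function name are hypothetical assumptions, not details of the disclosed device.

```python
import math

# Assumed layout: 8 sub-microphones in a circle, one every 45 degrees;
# each is predefined to correspond to a point of the receiving area.
SUB_MIC_ANGLES = [i * 45.0 for i in range(8)]   # degrees (hypothetical)

def touch_to_noise_direction(touch_x, touch_y, center=(0.5, 0.5)):
    """Map a touch position in the receiving area to the angle of the
    nearest sub-microphone, treated as the designated noise direction."""
    angle = math.degrees(
        math.atan2(touch_y - center[1], touch_x - center[0])) % 360.0
    # Snap to the closest predefined sub-microphone angle (circular distance).
    return min(SUB_MIC_ANGLES,
               key=lambda a: min(abs(a - angle), 360.0 - abs(a - angle)))
```

Each touch (e.g., positions 411 and 412) would yield one such angle, and the sound component separated in that direction would then be labeled noise.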
- direction designation can be performed more accurately according to the arrangement and number of the plurality of sub-microphones, thereby increasing user convenience.
- FIG. 5 is a diagram illustrating an operation of an electronic device according to an embodiment of the present invention.
- FIG. 5 shows a state in which, when a user input unit such as the touch panel 420 of FIG. 4 is not separately provided on the microphone 400, the electronic device 100 connected to the microphone 400 displays an interface 510 on the display so that the user can designate a direction.
- the GUI 510 displayed on the display includes a receiving area capable of receiving a user input related to direction designation.
- the GUI 510 according to an embodiment of the present invention has the same shape as that of the microphone 400 , and the receiving area of the GUI 510 is provided to correspond to the arrangement of the plurality of sub-microphones 410 .
- the user 110 can designate the direction of the noise by selecting the positions 521 and 522 of the reception area of the GUI 510 corresponding to the directions of the speaker 121 and the plurality of people 122 .
- the processor 270 may receive information of a user input corresponding to user direction designation through the GUI 510 .
- a user input for designating a direction using the GUI 510 includes, for example, a pointer designation using a remote control, a button input, or a gesture input, but is not limited thereto. Since the positions and directions of the plurality of sub-microphones displayed on the GUI 510 correspond to the actual positions and directions of the plurality of sub-microphones 410, the processor 270 can identify the noise-direction sound component based on the received user input information.
- even when the microphone 400 is not provided with a user input unit such as a touch panel, it can be connected to the electronic device 100 having a user input unit to receive a user input, so that the user can conveniently designate a direction.
- FIG. 6 is a diagram illustrating an operation of an electronic device according to an embodiment of the present invention.
- FIG. 6 shows a case in which a microphone is built in the electronic device 100 unlike FIGS. 4 and 5 .
- in FIG. 6, two microphones 250 are built into the lower part of the electronic device 100.
- the electronic device 100 may further include a user input unit for receiving a user input for designating a user direction.
- the user 110, the speaker 121, and the plurality of people 122 exist in the use space, and sound is received by the two microphones 250 of the electronic device 100.
- FIG. 6 shows the sound-receiving range 610 of the microphones and a touch panel 620 provided along the lower horizontal edge of the electronic device 100.
- the touch panel 620 has a receiving area capable of receiving a user's touch input. In this embodiment, the receiving area of the touch panel 620 is provided to correspond to the direction of the noise.
- the processor 270 of the electronic device 100 may receive information of a touch input corresponding to user direction designation through the touch panel 620 of the electronic device 100 .
- the user 110 may designate the direction of the noise by touching the portions 621 and 622 of the touch panel 620 corresponding to that direction.
- the method of recognizing an effective sound, that is, the user's voice, follows the same principle as in FIGS. 4 and 5.
- the user input unit may be implemented as a physical button such as a jog dial, in addition to the GUI 510 displayed on the display of FIG. 5 or the touch panel 620 of FIG. 6.
- the electronic device 100 may further include an interface unit 210, and the processor 270 may receive, through the interface unit 210, information on a user direction designation input through an input unit of an external device separate from the electronic device 100, for example, a remote controller connected to the electronic device 100.
- since the user 110 can designate the direction of the sound in various ways, user convenience can be improved.
- FIG. 7 is a diagram illustrating an operation state according to movement of a microphone according to an embodiment of the present invention. It is assumed that the microphone is built into the electronic device 700.
- when the electronic device 700 is a mobile electronic device such as a smartphone, as shown in FIG. 7, the noise direction designated by the user must also be updated in response to the movement of the electronic device 700.
- the electronic device 700 may further include a sensor, for example, an acceleration sensor and a gyro sensor, and through these, a moving direction, a moving distance, a moving speed, etc. may be identified when the electronic device 700 moves.
- the processor 270 may identify a positional movement of the microphone, and may identify the noise-direction sound component based on that movement. Alternatively, the processor 270 may identify the positional movement of the microphone and guide the user to re-enter the user direction designation. This is also applicable when an external microphone connected to the electronic device is movable, but is not limited thereto. According to an embodiment of the present invention, even when the noise remains in the same place and only the user moves, the designation can still be applied, so that user convenience can be improved.
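One simple way to keep a designated direction valid after the device rotates is to shift the designated angle by the opposite of the rotation measured by the gyro sensor. The following is a minimal sketch under that assumption; the function name and angle convention are illustrative, not taken from the disclosure.

```python
def compensate_rotation(designated_angles_deg, yaw_change_deg):
    """After the device (and its built-in microphones) rotates by
    `yaw_change_deg`, shift each user-designated noise direction the
    opposite way so it still points at the stationary noise source."""
    return [(a - yaw_change_deg) % 360.0 for a in designated_angles_deg]
```

For translational movement, the designated direction would need recomputation from the moved position, or the user could be guided to designate it again, as described above.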
- the electronic device 100 may consider the length of the acquired voice component in addition to the user's designation of the noise direction.
- the electronic device may include a recognition engine for recognizing Wake-Up Word (WUW) as a trigger signal for performing a voice recognition function, and a server voice recognition engine for recognizing a user's utterance after being called.
- the processor 270 may acquire K sound components having different directions from the received N sound signals ( S820 ). In this case, the processor 270 may use a sound source separation technology such as GSS to obtain sound components having different directions as described above.
- the user may designate the direction of Q noises.
- when the user specifies at least one noise direction (Yes in S830; Q > 0), a target direction identification (TDI) algorithm capable of sorting the remaining sound components may be applied.
- the remaining M sound components may be only one, depending on the use environment. In that case, the processor 270 may skip the length comparison process using the TDI algorithm (No in S850).
- the processor 270 may identify sound components longer than a predefined length Ls among the plurality of sound components, and arrange the identified components in order of length through the TDI algorithm (S860).
- the predefined length Ls is the minimum length that the continuity of the detected sound direction must satisfy to be classified as natural human speech.
- the processor 270 may recognize the shortest sound component as the user's voice component (S870).
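Steps S860 and S870 above can be sketched as a simple filter-sort-select pass: keep components whose duration exceeds Ls, sort ascending, and take the shortest as the likely user utterance, since continuous noise such as a TV runs far longer than a spoken command. The data representation and function name are assumptions for illustration, not the claimed implementation.

```python
def select_user_voice(components, min_len_s):
    """`components` is a list of (direction_deg, duration_s) pairs.
    Keep components longer than the predefined length Ls (`min_len_s`),
    sort by duration, and return the shortest as the user's voice."""
    candidates = [c for c in components if c[1] > min_len_s]
    if not candidates:
        return None
    candidates.sort(key=lambda c: c[1])   # ascending duration (S860)
    return candidates[0]                  # shortest = likely user voice (S870)
```

For example, a TV playing for minutes and a two-second command would both pass the Ls filter, and the two-second component would be selected.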
- the user's voice component can be separated more effectively by extracting sound through the user's designation of the noise direction and by additionally considering the length of each component.
- FIG. 9 is a diagram illustrating recognition of a sound component according to an embodiment of the present invention.
- FIG. 9 shows the volume and frequency of sound over time in an environment in which a continuous sound, such as TV sound, is generated.
- the processor 270 automatically tracks the surrounding TV noise and the user's voice using the TDI algorithm and ranks them by length; the result of extracting only the user's voice using GSS is shown in the lower graph 920. Since only the TV sound is detected before the user speaks, and a short voice command is detected after 5 seconds, the processor 270 recognizes the latter as the sound to be recognized and can separate only the user's voice.
- when the noise is continuous, like TV sound, the recognition priority can be identified by the TDI algorithm alone. Non-continuous noise may also be present, for example a nearby phone call or conversation; in that case, recognition may be performed based on a TDI algorithm that includes a technique for separating such noise from the user's voice.
- the processor 270 obtains a sound component from the sound signal received through the microphone 250 (S1010). In this case, the processor 270 may identify, based on the user direction designation, whether the obtained sound component is a user voice component or a noise-direction sound component (S1020). If the acquired sound component is the user's voice component (Yes in S1020), the processor may transmit it to the voice recognition unit and perform voice recognition.
- the processor may include the voice recognition unit, or in some cases a separate voice recognition unit may be provided, but the configuration is not limited thereto. If the acquired sound component is a noise-direction sound component (No in S1020), it may not be transmitted to the voice recognition unit.
- when the acquired sound component is not the user's voice component, not transmitting it to the voice recognition unit is efficient because the unnecessary recognition process is eliminated.
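The routing decision of steps S1020 onward can be sketched as a small dispatch function: only user-voice components reach the recognizer, and noise-direction components are dropped without any recognition work. The function and parameter names are illustrative assumptions.

```python
def route_component(component, is_user_voice, recognizer):
    """Forward only user-voice components to the voice recognizer;
    noise-direction components are dropped, eliminating unnecessary
    recognition work (a paraphrase of S1020 in FIG. 10)."""
    if is_user_voice:
        return recognizer(component)
    return None   # noise-direction component: not transmitted
```

In practice `recognizer` could be an on-device wake-up-word engine or a call to a server-side recognition engine, as described above.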
Landscapes
- Engineering & Computer Science (AREA)
- Physics & Mathematics (AREA)
- Theoretical Computer Science (AREA)
- General Engineering & Computer Science (AREA)
- Human Computer Interaction (AREA)
- Acoustics & Sound (AREA)
- General Physics & Mathematics (AREA)
- Signal Processing (AREA)
- Computational Linguistics (AREA)
- Health & Medical Sciences (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Multimedia (AREA)
- Quality & Reliability (AREA)
- Circuit For Audible Band Transducer (AREA)
- User Interface Of Digital Computer (AREA)
Abstract
An electronic device according to an embodiment of the present invention comprises a processor configured to obtain, from a sound signal received through a microphone, a plurality of sound components having different directions; identify, based on a user direction designation, a sound component in a noise direction among the plurality of obtained sound components; and perform user voice recognition on the received sound based on the identified noise-direction sound component.
Applications Claiming Priority (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
KR10-2019-0162062 | 2019-12-06 | ||
KR1020190162062A KR20210071664A (ko) | 2019-12-06 | 2019-12-06 | 전자장치 및 그 제어방법 |
Publications (1)
Publication Number | Publication Date |
---|---|
WO2021112391A1 true WO2021112391A1 (fr) | 2021-06-10 |
Family
ID=76222100
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
PCT/KR2020/013895 WO2021112391A1 (fr) | 2019-12-06 | 2020-10-13 | Dispositif électronique et son procédé de commande |
Country Status (2)
Country | Link |
---|---|
KR (1) | KR20210071664A (fr) |
WO (1) | WO2021112391A1 (fr) |
Families Citing this family (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113873325B (zh) * | 2021-10-29 | 2024-05-17 | 深圳市兆驰股份有限公司 | 声音处理方法、装置、设备和计算机可读存储介质 |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20090083112A (ko) * | 2008-01-29 | 2009-08-03 | 한국과학기술원 | 잡음 제거 장치 및 방법 |
US20090240495A1 (en) * | 2008-03-18 | 2009-09-24 | Qualcomm Incorporated | Methods and apparatus for suppressing ambient noise using multiple audio signals |
KR101475864B1 (ko) * | 2008-11-13 | 2014-12-23 | 삼성전자 주식회사 | 잡음 제거 장치 및 잡음 제거 방법 |
KR20160001964A (ko) * | 2014-06-30 | 2016-01-07 | 삼성전자주식회사 | 마이크 운용 방법 및 이를 지원하는 전자 장치 |
KR101827276B1 (ko) * | 2016-05-13 | 2018-03-22 | 엘지전자 주식회사 | 전자 장치 및 그 제어 방법 |
-
2019
- 2019-12-06 KR KR1020190162062A patent/KR20210071664A/ko not_active Application Discontinuation
-
2020
- 2020-10-13 WO PCT/KR2020/013895 patent/WO2021112391A1/fr active Application Filing
Patent Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
KR20090083112A (ko) * | 2008-01-29 | 2009-08-03 | 한국과학기술원 | 잡음 제거 장치 및 방법 |
US20090240495A1 (en) * | 2008-03-18 | 2009-09-24 | Qualcomm Incorporated | Methods and apparatus for suppressing ambient noise using multiple audio signals |
KR101475864B1 (ko) * | 2008-11-13 | 2014-12-23 | 삼성전자 주식회사 | 잡음 제거 장치 및 잡음 제거 방법 |
KR20160001964A (ko) * | 2014-06-30 | 2016-01-07 | 삼성전자주식회사 | 마이크 운용 방법 및 이를 지원하는 전자 장치 |
KR101827276B1 (ko) * | 2016-05-13 | 2018-03-22 | 엘지전자 주식회사 | 전자 장치 및 그 제어 방법 |
Also Published As
Publication number | Publication date |
---|---|
KR20210071664A (ko) | 2021-06-16 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
KR20150112337A (ko) | 디스플레이 장치 및 그 사용자 인터랙션 방법 | |
WO2021091145A1 (fr) | Appareil électronique et procédé associé | |
EP3971887A1 (fr) | Appareil et procédé de reconnaissance d'une pluralité de mots de réveil | |
WO2020054980A1 (fr) | Procédé et dispositif d'adaptation de modèle de locuteur basée sur des phonèmes | |
WO2021085812A1 (fr) | Appareil électronique et son procédé de commande | |
WO2021112391A1 (fr) | Dispositif électronique et son procédé de commande | |
US20240121501A1 (en) | Electronic apparatus and method of controlling the same | |
WO2021256760A1 (fr) | Dispositif électronique mobile et son procédé de commande | |
WO2021091063A1 (fr) | Dispositif électronique et procédé de commande associé | |
US11942089B2 (en) | Electronic apparatus for recognizing voice and method of controlling the same | |
WO2021251780A1 (fr) | Systèmes et procédés de conversation en direct au moyen d'appareils auditifs | |
KR20220015306A (ko) | 전자장치 및 그 제어방법 | |
WO2021141332A1 (fr) | Dispositif électronique et son procédé de commande | |
WO2021141330A1 (fr) | Dispositif électronique et son procédé de commande | |
KR20220065370A (ko) | 전자장치 및 그 제어방법 | |
WO2022092530A1 (fr) | Dispositif électronique et son procédé de commande | |
WO2013100368A1 (fr) | Appareil électronique et procédé de commande de celui-ci | |
WO2022114482A1 (fr) | Dispositif électronique et son procédé de commande | |
WO2021107371A1 (fr) | Dispositif électronique et son procédé de commande | |
US20220165263A1 (en) | Electronic apparatus and method of controlling the same | |
EP4216211A1 (fr) | Dispositif électronique et son procédé de commande | |
US20220165298A1 (en) | Electronic apparatus and control method thereof | |
WO2022124640A1 (fr) | Dispositif électronique et procédé associé de commande | |
WO2021107464A1 (fr) | Dispositif électronique et procédé de commande associé | |
WO2022019458A1 (fr) | Dispositif électronique et procédé de commande associé |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
121 | Ep: the epo has been informed by wipo that ep was designated in this application |
Ref document number: 20896875 Country of ref document: EP Kind code of ref document: A1 |
|
NENP | Non-entry into the national phase |
Ref country code: DE |
|
122 | Ep: pct application non-entry in european phase |
Ref document number: 20896875 Country of ref document: EP Kind code of ref document: A1 |