WO2021172641A1 - Device for generating control information based on a user's utterance state, and control method therefor - Google Patents

Device for generating control information based on a user's utterance state, and control method therefor Download PDF

Info

Publication number
WO2021172641A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
information
movement
voice input
control information
Prior art date
Application number
PCT/KR2020/003007
Other languages
English (en)
Korean (ko)
Inventor
파벨 그르제시아크그르제고르츠
Original Assignee
Samsung Electronics Co., Ltd.
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co., Ltd.
Publication of WO2021172641A1

Classifications

    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/22 Procedures used during a speech recognition process, e.g. man-machine dialogue
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/01 Input arrangements or combined input and output arrangements for interaction between user and computer
    • G06F3/017 Gesture based interaction, e.g. based on a set of recognized hand gestures
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F3/00 Input arrangements for transferring data to be processed into a form capable of being handled by the computer; Output arrangements for transferring data from processing unit to output unit, e.g. interface arrangements
    • G06F3/16 Sound input; Sound output
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/02 Feature extraction for speech recognition; Selection of recognition unit
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L15/00 Speech recognition
    • G10L15/04 Segmentation; Word boundary detection
    • G PHYSICS
    • G10 MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L25/00 Speech or voice analysis techniques not restricted to a single one of groups G10L15/00 - G10L21/00
    • G10L25/93 Discriminating between voiced and unvoiced parts of speech signals

Definitions

  • The present disclosure relates to a device that performs voice recognition based on a user's utterance state and generates a control command according to the voice recognition, and to a control method of the device. More particularly, the device recognizes a voice command by performing voice recognition according to the user's actual utterance and generates a control command for controlling the device according to the recognized voice command, or generates a control command for the device using motion information according to the movement of a user's body part together with the recognized voice command.
  • a contact-type interface by a user's direct touch through a predetermined input means has been mainly used.
  • a user input interface using a keyboard or mouse is used in a PC
  • an interface in which a user intuitively touches a screen with a finger is mainly used in a smart phone.
  • speech recognition is a technology that converts an acoustic speech signal obtained through a sound sensor such as a microphone into words or sentences.
  • As voice recognition technology develops, a user inputs a voice into a device, and it becomes possible to control an operation of the device according to the voice input.
  • the conventional voice recognition technology is vulnerable to external noise caused by a conversation of an external user other than the user of the device, and thus it may be difficult to accurately recognize the user's voice.
  • an operation according to the user's voice input is performed after the user's voice input is completed, and proper feedback is not made during the user's voice input. Accordingly, there is a need for a technology capable of more accurately recognizing a user's voice input and performing a device control operation intended by the user even during the user's voice input.
  • An embodiment of the present disclosure can provide a device, and a method for controlling the same, that analyzes a signal generated by the user's speech or uses vibration information to identify, among the audio signals input to the device, the signal generated by the user's speech, thereby more accurately recognizing the user's voice input and performing a control operation based on the voice input.
  • An embodiment of the present disclosure can provide a device, and a control method, that, when generating a control command related to a user's voice input, uses the user's motion information to generate the control command and performs a control operation based on the generated control command.
  • FIG. 1 is a diagram for explaining a method of controlling a device according to an embodiment of the present disclosure.
  • FIG. 2 is a block diagram of a device according to an embodiment of the present disclosure.
  • FIG. 3 is a diagram for explaining a voice recognition process of a device according to an embodiment of the present disclosure.
  • FIG. 4 is a diagram for explaining user movement information according to an embodiment of the present disclosure.
  • FIG. 5 is a diagram for explaining a voice recognition process and generation of control information using user's motion information according to an embodiment of the present disclosure.
  • FIGS. 6 and 7 are diagrams for explaining a process of determining an attribute value of control information based on motion information according to an embodiment.
  • FIG. 8 is a diagram for explaining a process of determining an attribute value of control information based on motion information according to another embodiment of the present disclosure.
  • FIG. 9A is a diagram illustrating a control system including a device generating control information and an external device to be controlled according to an exemplary embodiment.
  • FIG. 9B is a diagram illustrating a control system including a device for generating control information and an external device to be controlled according to another exemplary embodiment.
  • FIG. 10 is a flowchart of a method for a device to provide control information according to an embodiment.
  • FIG. 11 is a detailed flowchart of a method for a device to provide control information according to an embodiment of the present disclosure.
  • FIG. 12 is a flowchart illustrating a method of controlling an external device according to an embodiment of the present disclosure.
  • FIG. 13 is a flowchart illustrating a method of controlling an external device according to another embodiment of the present disclosure.
  • FIG. 14 is a block diagram illustrating a configuration of an electronic device according to an embodiment of the present disclosure.
  • a device includes a memory in which at least one program is stored; a microphone for receiving an audio signal; a sensor module for acquiring vibration information according to the user's utterance state; and at least one processor configured to generate control information corresponding to the user's voice input identified from the audio signal by executing the at least one program, wherein the at least one program includes, based on the vibration information, identifying the user's voice input from the audio signal received through the microphone; and generating control information corresponding to the user's voice input based on the identified user's voice input.
  • a method for a device to provide control information for a user's voice input includes: receiving an audio signal; acquiring vibration information according to the user's utterance state; identifying the user's voice input included in the audio signal based on the vibration information; and generating control information corresponding to the user's voice input based on the identified user's voice input.
  • The voice assistant service is a service that provides a response to a user's voice command by applying automated speech recognition (ASR) processing, natural language understanding (NLU) processing, dialogue management (DM: Dialogue Manager) processing, natural language generation (NLG) processing, and text to speech (TTS) processing to an audio signal.
  • the voice assistant service may be a service that recognizes a user's voice command and controls the operation of the device according to the corresponding voice command.
  • The artificial intelligence model is based on an artificial intelligence algorithm, and may be a model trained using at least one of machine learning, neural network, genetic, deep learning, and classification algorithms.
  • the model of the voice assistant service may be an artificial intelligence model in which standards and methods for providing feedback according to a user's voice command in the voice assistant service are learned.
  • The model of the voice assistant service may include, for example, a model for recognizing a user's input voice, a model for interpreting the user's input voice, and a model for generating a control command according to the user's input voice, but is not limited thereto.
  • the models constituting the model of the voice assistant service may be an artificial intelligence model to which an artificial intelligence algorithm is applied.
  • FIG. 1 is a diagram for explaining a method of controlling a device according to an embodiment of the present disclosure.
  • the device 1000 is in contact with the user's body, and detects a voice input by the user's utterance and the user's movement.
  • FIG. 1 illustrates a smart earphone attached to a user's ear as an example of the device 1000.
  • In addition to a function of outputting an audio signal, a smart earphone may provide a voice assistant service capable of acquiring the user's voice input and the user's movement information and performing a control operation according to the user's voice input and the user's movement information.
  • the device 1000 may represent a set of a plurality of devices that are in contact with a plurality of body parts without contacting only one part of the user's body.
  • the device 1000 may refer to an electronic device that is in contact with the user's body and can obtain the user's voice input and movement information.
  • the device 1000 may be a wearable device such as augmented reality (AR) glasses, a smart watch, a smart lens, a smart bracelet, or smart clothing.
  • the device 1000 may be a mobile device such as a smart phone, a smart tablet, a computer, a notebook computer, etc. used by a user.
  • the device 1000 may include various electronic devices capable of detecting a voice input by a user's utterance and user's movement information.
  • the device 1000 generates corresponding control information based on the user's voice input. Also, the device 1000 may generate control information by using the user's movement information in addition to the user's voice input.
  • The control information may be a command for controlling the device 1000 itself. For example, when the device 1000 is a smart earphone, the control information of the device 1000 may include control commands for controlling volume up/down, mute, forward and backward operations for changing the track of music output through the smart earphone, and track play and pause operations.
  • the control information may be a control command for controlling an external device.
  • the control information for controlling the external device may be information for controlling the movement of the external device or information for controlling an output signal output from the external device.
  • the control command of the device 1000 may be changed according to the type and function of the device to be controlled.
  • the speech recognition process performed by the device 1000 may be divided into an embedded method and a non-embedded method according to the subject of the speech recognition process.
  • a control command may be generated based on a user's voice input and movement information by a voice assistant program installed by default in the device 1000 .
  • the generated control information may be used to control the device 1000 itself, or may be transmitted to an external device connected through a network and used to control the operation of the external device.
  • the device 1000 may transmit the user's voice input and motion information to an external device connected through a network, and the external device may generate control information based on the user's voice input and motion information.
  • the control command generated by the external device may be used again to control the device 1000 or another external device.
  • According to an embodiment, the device 1000 may transmit the user's voice input and motion information to an external device, the external device may generate control information based on the user's voice input and motion information, and the device 1000 may receive the control command generated by the external device and perform a control operation based on the received control information.
  • the device 1000 transmits the user's voice input and motion information for controlling the second external device to the first external device, and the first external device generates control information based on the user's voice input and motion information.
  • the second external device may receive a control command generated by the first external device and perform a control operation based on the received control information.
  • A device 1000 receives a user voice input caused by an utterance from a user. Specifically, the device 1000 receives an audio signal through an input means such as a microphone, excludes non-voice sections not caused by the user's utterance from the received audio signal, and identifies the user's voice input caused by the actual user's utterance. The device 1000 may detect vibrations generated in the larynx according to the user's utterance, and determine the audio signal input during the period in which the vibration is sensed, among the input audio signals, as the user's voice input due to the user's utterance.
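  • The following is a minimal sketch, not the disclosed implementation, of this gating idea: audio samples are kept only while an utterance vibration signal is active, so sounds outside the vibration intervals are discarded. The function name and the boolean vibration array are illustrative assumptions.

```python
# Keep only the audio spans that coincide with detected utterance vibration.
import numpy as np

def gate_audio_by_vibration(audio, vibration):
    """audio: 1-D array of samples; vibration: boolean array of the same length,
    True where larynx vibration (i.e. the user's own utterance) is sensed."""
    segments = []
    start = None
    for i, active in enumerate(vibration):
        if active and start is None:
            start = i                        # vibration (utterance) begins
        elif not active and start is not None:
            segments.append(audio[start:i])  # vibration ends; keep this span
            start = None
    if start is not None:                    # signal ended while still uttering
        segments.append(audio[start:])
    return segments

# Example: keep only spans analogous to t1-t2, t3-t4, t5-t6 where vibration was sensed.
audio = np.random.randn(16000)
vibration = np.zeros(16000, dtype=bool)
vibration[1000:3000] = vibration[6000:8000] = vibration[11000:13000] = True
voiced = gate_audio_by_vibration(audio, vibration)
print([len(s) for s in voiced])              # -> [2000, 2000, 2000]
```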
  • the device 1000 may analyze an audio signal input through a microphone to identify an utterance section caused by the user's utterance.
  • The device 1000 may extract a feature vector of the input audio signal using any one of feature vector extraction techniques such as cepstrum, linear predictive coefficient (LPC), mel frequency cepstral coefficient (MFCC), and filter bank energy, and may analyze the feature vector to identify the user's voice input by the user's utterance.
  • the above-described speech feature vector extraction technique is merely an example, and the feature vector extraction technique used in the present disclosure is not limited to the above-described example.
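  • As a brief illustration of one of the options above, the sketch below computes MFCC feature vectors for an audio signal. The use of the librosa library and the parameter values are assumptions for the example only, not part of the disclosure.

```python
# Extract MFCC feature vectors; each column is the feature vector of one short frame.
import numpy as np
import librosa

sr = 16000
audio = np.random.randn(sr).astype(np.float32)     # stand-in for the microphone signal

mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)   # 13 coefficients per frame
print(mfcc.shape)                                  # (13, number_of_frames)
```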
  • the device 1000 may extract a feature vector due to a user's utterance by applying a deep neural network model (DNN) to the feature vector extracted from the audio signal.
  • the user's voice input signal feature may be expressed as a user feature vector.
  • the device 1000 may extract the user's feature vector by applying a deep neural network (DNN) to the speech feature vector extracted from the input audio signal.
  • The device 1000 may obtain a user feature vector by training a deep neural network model (DNN) with a speech feature vector as an input value and a feature value related to the user as an output value.
  • the deep neural network model may include at least one of a convolutional neural network (CNN), a recurrent neural network (RNN), and a generative adversarial network (GAN), but is not limited to the examples listed above.
  • the deep neural network model used by the device 1000 of the present disclosure may include all types of currently known neural network models.
  • the device 1000 may include an Automatic Speech Recognition (ASR) model.
  • the ASR model is a speech recognition model that recognizes speech using an integrated neural network, and may output text from a user's speech input.
  • the ASR model may be, for example, an artificial intelligence model including an acoustic model, a pronunciation dictionary, and a language model.
  • the ASR model may be, for example, an end-to-end speech recognition model having a structure including an integrated neural network without separately including an acoustic model, a pronunciation dictionary, and a language model.
  • the end-to-end ASR model uses an integrated neural network to convert speech into text without a process of converting phonemes into text after recognizing phonemes from speech.
  • the text may include at least one character.
  • Characters refer to symbols used to express and write human language in a visible form.
  • the characters may include Hangul, alphabets, Chinese characters, numbers, diacritics, punctuation marks, and other symbols.
  • the text may include a character string.
  • a character string refers to a sequence of characters.
  • the text may include at least one alphabet.
  • a grapheme is the smallest unit of sound, composed of at least one letter.
  • one letter may be a letter element, and a character string may mean an arrangement of letter elements.
  • text may include morphemes or words.
  • A morpheme is the smallest unit having a meaning, which is composed of at least one grapheme.
  • a word is a basic unit of a language that can be used independently or exhibits a grammatical function, consisting of at least one morpheme.
  • the device 1000 may receive the user's voice input from the audio signal and obtain text from the user's voice input using the ASR model.
  • the device 1000 may analyze the meaning of the acquired user's voice input to generate a corresponding control command.
  • the device 1000 may include a sensor module capable of detecting a user's movement state, and may obtain user movement information.
  • the sensor module provided in the device 1000 includes at least one of a gesture sensor, a gyroscope sensor, and an accelerometer sensor.
  • The device 1000 may detect a user's movement, rotation, etc. through a provided sensor, and may generate an electrical signal or data value related to the sensed user's movement.
  • The device 1000 measures the amount of change in pitch, roll, and yaw based on the three axes x, y, and z to obtain tilt information of the device 1000 and acceleration in each axis direction, and may obtain the user's movement information using the tilt information and acceleration obtained based on the three axes of the device 1000.
  • a roll represents a rotational movement about the x-axis
  • a pitch indicates a rotational movement about the y-axis
  • a yaw indicates a rotational movement about the z-axis.
  • the device 1000 may identify the user's utterance state by detecting changes in pitch, roll, and yaw caused by the movement of the user's jaw.
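  • The sketch below illustrates one common way such pitch, roll, and yaw changes could be derived from accelerometer and gyroscope samples using a complementary filter; the axis conventions, sample rate, and fusion weight are assumptions and are not taken from the disclosure.

```python
# Fuse accelerometer tilt estimates with integrated gyroscope rates (complementary filter).
import math

ALPHA = 0.98          # complementary-filter weight (assumed)
DT = 0.01             # sample period in seconds (assumed 100 Hz)

def update_orientation(pitch, roll, yaw, accel, gyro):
    """accel = (ax, ay, az) in g, gyro = (gx, gy, gz) in deg/s; angles in degrees."""
    ax, ay, az = accel
    gx, gy, gz = gyro

    # Tilt angles implied by gravity alone.
    accel_pitch = math.degrees(math.atan2(-ax, math.sqrt(ay**2 + az**2)))
    accel_roll = math.degrees(math.atan2(ay, az))

    # Blend the integrated gyro rate with the accelerometer estimate.
    pitch = ALPHA * (pitch + gy * DT) + (1 - ALPHA) * accel_pitch
    roll = ALPHA * (roll + gx * DT) + (1 - ALPHA) * accel_roll
    yaw = yaw + gz * DT            # yaw has no gravity reference; gyro only
    return pitch, roll, yaw

# One update while the head tilts slightly upward.
print(update_orientation(0.0, 0.0, 0.0, accel=(0.05, 0.0, 1.0), gyro=(0.0, 5.0, 0.0)))
```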
  • FIG. 2 is a block diagram of a device according to an embodiment of the present disclosure.
  • a device 2000 may include an input unit 2100 , a memory 2200 , a sensing unit 2300 , and a processor 2400 .
  • the device 2000 may further include a communication unit 2500 and an output unit 2600 .
  • the electronic device 2000 may include more components or some components may be excluded.
  • the device 2000 may include an output module such as a speaker for outputting an audio signal.
  • the input unit 2100 receives an external audio signal of the device 2000 .
  • the input unit 2100 may include a microphone.
  • the external audio signal may include a user's voice input.
  • the input unit 2100 may include other means for inputting data for controlling the device 2000 according to the type of the device 2000 .
  • the input unit 2100 includes a key pad, a switch, a touch pad (contact capacitive method, pressure resistance film method, infrared sensing method, surface ultrasonic conduction method, integral tension measurement method, piezo effect method, etc.), It may include, but is not limited to, a jog wheel, a jog switch, and the like.
  • the processor 2400 controls the overall operation of the device 2000 .
  • the processor 2400 may be configured to process instructions of a computer program by performing arithmetic, logic and input/output operations and signal processing.
  • the instructions of the computer program are stored in the memory 2200 , and the instructions may be provided to the processor 2400 from the memory 2200 .
  • Functions and/or operations performed by the processor 2400 may be implemented by the processor 2400 executing an instruction received according to computer program code stored in a recording device such as the memory 2200.
  • The processor 2400 may be configured as at least one of, for example, a central processing unit (CPU), a microprocessor, a graphics processing unit (GPU), application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), and field programmable gate arrays (FPGAs), but is not limited thereto.
  • the processor 2400 may be an application processor (AP) that executes an application.
  • the processor 2400 may obtain the user's voice input from the audio signal received through the input unit 2100 .
  • The processor 2400 may execute an application that performs an operation of the device 2000 or an external device based on the user's voice input, and may receive the user's voice input for additionally controlling the device 2000 or the external device through the executed application. For example, when a voice input for executing a predetermined voice assistant application such as "S Voice" or "Bixby" is received, the processor 2400 may execute the corresponding voice assistant application and generate a control command based on the additionally input user's voice input and the user's movement information.
  • the processor 2400 identifies the user's voice input by the actual user's utterance, except for the non-voice section not caused by the user's utterance from the received audio signal.
  • the processor 2400 may analyze an audio signal input through the input unit 2100 to identify an utterance section by the user's utterance.
  • the processor 2400 extracts a feature vector of the received audio signal and analyzes the feature vector to identify the user's voice input by the user's utterance.
  • the processor 2400 may extract a feature vector due to the user's utterance by applying a deep neural network (DNN) to the feature vector extracted from the audio signal.
  • The processor 2400 compares the feature vector input during the user's speech section with each model using an acoustic model, a language model, and a pronunciation lexicon, and may obtain a word string for the input speech signal based on the scores.
  • The processor 2400 may generate control information for controlling the device 2000 or an external device by combining the user's voice signal input through the input unit 2100 with movement information according to the user's movement obtained from the sensing unit 2300. The processor 2400 may determine the type of the control command based on the user's voice signal, and determine the attribute value of the control command based on the user's motion information. For example, when a voice input such as "volume" is input, the processor 2400 identifies the user's intention to control the volume during a control operation of the device 2000 or an external device, and may determine the attribute value using the user's motion information.
  • the processor 2400 may obtain information on the user's up, down, left, right, or front and rear movement information through the sensing unit 2300 , and may generate a control command according to the user's movement. Specifically, when a user movement in an upper direction or a forward direction is sensed through the sensing unit 2300 , the processor 2400 may increase an attribute value related to a control command. Also, when a user's movement in a lower direction or a backward direction is sensed through the sensing unit 2300 , the processor 2400 may decrease an attribute value related to a control command.
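  • A minimal illustrative mapping of this idea is sketched below: the voice input selects the control-command type and the sign of the sensed movement selects whether the related attribute value is increased or decreased. The command vocabulary, function names, and step size are assumptions, not the disclosed implementation.

```python
# Voice input -> command type; movement direction -> increase/decrease of the attribute.
COMMANDS = {"volume": "set_volume", "lighting": "set_brightness"}   # assumed vocabulary

def build_control_info(voice_text, pitch_delta_deg, step=1):
    """Return (command, attribute_delta) or None if no known command is recognized."""
    for keyword, command in COMMANDS.items():
        if keyword in voice_text.lower():
            # Upward / forward movement -> increase, downward / backward -> decrease.
            delta = step if pitch_delta_deg > 0 else -step
            return command, delta
    return None

print(build_control_info("Bixby volume", pitch_delta_deg=+25.0))   # ('set_volume', 1)
print(build_control_info("Bixby volume", pitch_delta_deg=-25.0))   # ('set_volume', -1)
```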
  • For example, when the device 2000 is a device mounted on the user's ear, such as a smart earphone, the processor 2400 may generate a control command to perform a volume up operation when the user's head moves from the bottom to the top according to the movement of the user's head, and may generate a control command to perform a volume down operation when the voice input "volume" or "volume down" is recognized.
  • the direction in which the attribute value is to be changed according to the user's motion information is not limited thereto and may be changed.
  • The processor 2400 may generate a control command for controlling the movement of the controlled target according to the user's up, down, left, right, or forward and backward motion information obtained through the sensing unit 2300.
  • When the device 2000 or the external device to be controlled includes a display module for visually outputting information, the processor 2400 may generate a control command for controlling the scrolling of an image output through the display module according to the user's up, down, left, right, or forward and backward movement information obtained through the sensing unit 2300.
  • the sensing unit 2300 includes an acceleration sensor 2310 and a gyroscope sensor 2320 , and may acquire user movement information.
  • the sensing unit 2300 may detect a user's motion and generate an electrical signal or data value related to the sensed user's motion.
  • The sensing unit 2300 measures the amount of change in pitch, roll, and yaw based on the three axes x, y, and z to obtain tilt information of the device 2000 and acceleration in each axis direction, and the user's movement information may be obtained using the tilt information and acceleration obtained based on the three axes of the device 2000.
  • The sensing unit 2300 may detect a vibration caused by the user's utterance to identify the user's utterance state.
  • the memory 2200 may store commands set so that the processor 2400 generates a control command for controlling the device 2000 or an external device based on the user's voice input and motion information.
  • The memory 2200 may include, for example, random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), and programmable read-only memory (PROM), but is not limited to the above-described example.
  • the device 2000 may communicate with an external device through a predetermined network using the communication unit 2500 .
  • the communication unit 2500 may include one or more communication processors supporting wired communication or wireless communication.
  • Networks include Local Area Networks (LANs), Wide Area Networks (WANs), Value Added Networks (VANs), mobile radio communication networks, satellite networks, and combinations thereof. It is a data communication network in a comprehensive sense that enables each network constituent entity to communicate smoothly with each other, and may include a wired Internet, a wireless Internet, and a mobile wireless communication network.
  • Wireless communication may include, for example, wireless LAN (Wi-Fi), Bluetooth, Bluetooth Low Energy, Zigbee, Wi-Fi Direct (WFD), ultra wideband (UWB), infrared communication (IrDA, Infrared Data Association), and Near Field Communication (NFC), but is not limited thereto.
  • the output unit 2600 outputs a sound signal or a video signal to the outside.
  • the output unit 2600 may include a speaker or a receiver that outputs a sound signal to the outside, or a display module that visually provides information to the outside.
  • FIG. 3 is a diagram for explaining a voice recognition process of the device 2000 according to an embodiment of the present disclosure.
  • The processor 2400 of the device 2000 detects a vibration signal 310 caused by the user's utterance from the audio signal 330 obtained from the input unit 2100, or detects the vibration signal 310 caused by the user's utterance through the sensing unit 2300.
  • The processor 2400 identifies a time interval T1 from t1 to t2, a time interval T2 from t3 to t4, and a time interval T3 from t5 to t6, which are intervals in which the vibration is detected, as the intervals in which the user actually uttered.
  • the processor 2400 may remove noise in a section in which vibration is not detected among the audio signals 330 and identify the user voice input 320 in word units using only the audio signal during a time section in which vibration is detected.
  • the external noise signal may be removed by applying the ASR algorithm.
  • the processor 2400 may use vibration information generated during the user's speech in order to identify the user's voice input by the actual user's speech from the received audio signal 330 .
  • The vibration generated when the user utters is conducted through the bone and may be detected by the input unit 2100 or the sensing unit 2300 of the device 2000 attached to the ear.
  • the processor 2400 may analyze the vibration signal 310 to identify whether the user is uttering and a utterance section by the user's utterance among the input audio signals 330 .
  • the processor 2400 may determine the audio signal input in T1 , T2 , and T3 , which is a section in which vibration is detected among the audio signal 330 , as the user's voice input by the actual user's utterance.
  • The processor 2400 may determine that the audio signal received in sections other than T1, T2, and T3, in which no vibration is detected, is noise or an audio signal generated by another external user.
  • The processor 2400 analyzes, through the vibration information 310, the voice input received in the sections T1, T2, and T3 in which vibration caused by the user's utterance is sensed, and may recognize a user voice input such as "Bixby" uttered during the time period T1 of t1 to t2, "volume" uttered during the time period T2 of t3 to t4, and "UP" uttered during the time period T3 of t5 to t6.
  • The sensing unit 2300 of the device 2000 acquires the user's motion information 340 after the time t1 at which it is determined that the user's voice input has started, and control information may be generated using the user's voice input and the user's motion information.
  • FIG. 4 is a diagram for explaining user movement information according to an embodiment of the present disclosure.
  • the motion information may be information on roll, pitch, and yaw obtained based on three axes of x, y, and z.
  • a roll represents a rotational movement about the x-axis
  • a pitch indicates a rotational movement about the y-axis
  • a yaw indicates a rotational movement about the z-axis.
  • FIG. 4 shows motion information according to the user's motion in the pitch direction among roll, pitch, and yaw.
  • The device 1000 may acquire user movement information by detecting changes in roll, pitch, and yaw acquired for a predetermined time. For example, if the reference value of the pitch angle when the user is facing the front is 0, the pitch angle may be set to increase from 0 degrees (deg) in the + direction when the user raises his/her head relative to the front, and, conversely, to decrease from 0 degrees in the - direction when the user drops his/her head down with respect to the front.
  • Similarly, the clockwise direction of rolling the head toward the user's right ear may be set to the + direction, and the counterclockwise direction of rolling the head toward the user's left ear may be set to the - direction.
  • Likewise, the direction in which the user turns the head clockwise from the front toward the right ear may be set to the + direction, and the direction in which the user turns the head counterclockwise from the front toward the left ear may be set to the - direction.
  • According to an embodiment, the + direction and the - direction of the pitch angle, the roll angle, and the yaw angle can be changed.
  • the device 1000 may estimate the user's movement after the predetermined reference time by measuring changes in the pitch angle, the roll angle, and the yaw angle obtained after the predetermined reference time based on the predetermined reference time.
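  • A brief sketch of this reference-relative estimation is shown below: the first sample after the reference time is taken as the baseline, and later pitch (or roll/yaw) samples are reported as signed changes. The function name and sample values are illustrative assumptions.

```python
# Report angle samples relative to the value at the predetermined reference time.
def relative_motion(angles_deg):
    """angles_deg: pitch (or roll/yaw) samples acquired after the reference time."""
    baseline = angles_deg[0]
    return [a - baseline for a in angles_deg]

# Head raised by roughly 30 degrees and then returned toward the front.
print(relative_motion([2.0, 10.0, 25.0, 32.0, 18.0, 3.0]))
```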
  • FIG. 5 is a diagram for explaining a voice recognition process and generation of control information using user's motion information according to an embodiment of the present disclosure.
  • The device 2000 determines the user's movement direction and movement size from the user's movement information obtained after the utterance period start point, based on the utterance period start point, and may generate control information based on the determined movement direction and movement size. Specifically, the device 2000 may determine the amount of change in a specific attribute value of the control command determined according to the user's voice input, based on the movement direction and the movement size.
  • The attribute value is a parameter related to the control command, and may be determined according to the type of the device to be controlled and the type of the control command. For example, when the device to be controlled is a speaker or earphone outputting sound and the control command is related to the volume, the attribute value determined based on the user's movement may be the volume level. In addition, when a device to be controlled, such as a robot cleaner, a drone, or a robot pet, can be driven by itself and the control command is related to movement (e.g., "Move"), the attribute value determined based on the user's movement may be a movement direction and a movement speed along the xyz axes of the control target.
  • the processor 2400 may generate a control command to control the front, back, left, right, and movement speed of the drivable device to be controlled according to the user's movement.
  • When the control command is a command related to a change in illuminance, such as "change lighting", the processor 2400 may increase or decrease the brightness of a light bulb based on the user's movement information, similarly to the volume control described above.
  • When the device to be controlled is a display device that outputs an image and the control command is related to movement of the output image, such as "move image" or "scroll image", the attribute value may indicate the movement direction and amount of the output image. In this case, the processor 2400 may generate a control command to move the output image in a direction consistent with the user's head movement.
  • the attribute value may be an on/off value of the controlled device.
  • the processor 2400 may turn on/off the power of the controlled device based on the user's movement information. Which of the xyz axis directions is set as the on/off direction may be changed.
  • According to an embodiment, the device 2000 executes a voice assistant application when a predefined keyword (for example, a keyword such as "Bixby") is recognized, and may then determine the type of control command to be performed through the user's subsequent voice input.
  • For example, when a word such as "Volume" is recognized, the device 2000 may determine that the user's intention is to control the volume.
  • According to another embodiment, the device 2000 may determine the type of the control command by directly identifying a word such as "Volume", excluding the previously input keyword such as "Bixby".
  • the device 2000 may recognize the user's voice input after the vibration is sensed and perform a voice processing process.
  • According to an embodiment, the device 2000 may preset a control command corresponding to a previously specified small voice input of the user, and may generate the control command based on the user's movement information, without a separate voice recognition process, when that small voice input is received.
  • A control command corresponding to a small murmur preset by the user may be set in advance.
  • When a murmur signal is received, the device 2000 determines whether it corresponds to the murmur preset by the user by comparing it with the preset pattern, and when it is determined that the murmur signal preset by the user has been received, a separate voice recognition process may be skipped thereafter, and the preset attribute value of the control command may be determined based on the user's motion information.
  • For example, when the device 2000 receives the user's murmur signal, it determines whether the user's murmur signal has the same pattern as a preset signal for volume control. When it is determined that the user's murmur signal is a signal for volume control, the device 2000 may determine volume control as the control command type, and then determine the volume value based on the acquired user's movement information. As such, when a control command corresponding to a preset small voice signal or murmur is set in advance, the user can control the device 2000 through movement by speaking only the preset small voice signal, without speaking the voice command in a loud voice.
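  • One way such pattern matching could be done without full speech recognition is sketched below, using cosine similarity of averaged MFCC features as a compact signature of the murmur. The feature choice, the similarity threshold, and the use of librosa are assumptions for illustration, not the disclosed method.

```python
# Compare a received murmur against a preset pattern via averaged MFCC signatures.
import numpy as np
import librosa

SIMILARITY_THRESHOLD = 0.9     # assumed

def murmur_signature(audio, sr=16000):
    """Average MFCC vector used as a compact signature of a murmur."""
    mfcc = librosa.feature.mfcc(y=audio, sr=sr, n_mfcc=13)
    return mfcc.mean(axis=1)

def matches_preset(received, preset, sr=16000):
    a, b = murmur_signature(received, sr), murmur_signature(preset, sr)
    cos = float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))
    return cos >= SIMILARITY_THRESHOLD

preset = np.random.randn(16000).astype(np.float32)      # stand-in for the preset "volume" murmur
received = preset + 0.01 * np.random.randn(16000).astype(np.float32)
print(matches_preset(received, preset))                  # likely True for near-identical signals
```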
  • the device 2000 may determine an attribute value related to the user's control command by using the user's movement information input after the user's utterance section.
  • The sensing unit 2300 of the device 2000 determines the user's movement direction and movement size through the user's pitch, roll, and yaw movements acquired based on the xyz axes, and an attribute value related to the control command may be determined based on the movement direction and the movement size.
  • the processor 2400 may determine the user's movement by using the movement direction and movement magnitude information obtained in the xyz axis direction. According to an embodiment, in order to determine the user's movement, the processor 2400 may use the movement direction input in each axis direction or an extreme value of an angle obtained based on the xyz axis.
  • the pitch angle at the reference time at which the user's motion information is acquired is referred to as the reference pitch angle 0 degrees, and when the user raises his head, the pitch angle increases from 0 degrees to the + direction, and conversely, when the user lowers his head Assume that the pitch angle decreases from 0 degrees in the negative direction.
  • a change in which the pitch angle increases or decreases in the + or - direction may be referred to as a movement direction.
  • The processor 2400 may determine a movement in which the user raises or lowers his/her head. In the above example, a pitch movement in the + direction, in which the pitch angle increases, corresponds to the user raising the head upward, and a pitch movement in the - direction, in which the pitch angle decreases, corresponds to the user lowering the head.
  • The processor 2400 may determine the user's movement through motion information in which the pitch angle increases when the user raises the head up after uttering the control command "volume" or "volume up", and may increase the volume of the current device when the pitch angle increases after the user's utterance time. Likewise, when the user bends the head down after uttering a control command such as "volume" or "volume down", the processor 2400 may determine the user's movement through motion information with a decreased pitch angle, and may decrease the volume of the current device when the pitch angle decreases after the user's utterance point.
  • the movement information obtained in the xyz axis direction may be repeatedly increased and decreased.
  • the user's movement may be analyzed using the extreme value.
  • For example, when the change in the pitch angle has two maximum values, the processor 2400 may determine that the user has performed the action of raising his/her head twice. That is, when the pitch angle increases and then decreases, the processor 2400 may determine that the user's intention is to perform an operation of raising the head upward, and the intention of raising the head twice may be expressed in the form of a graph in which the amount of change in the pitch angle has two maximum values. When it is determined that the user raises his/her head twice together with the control command "volume" or "volume up", the processor 2400 may increase the volume of the current device by two steps.
  • the processor 2400 may determine that the user performs the action of bowing the head down twice. When it is determined that the user lowers the head twice together with the control command "volume” or "volume down", the processor 2400 may decrease the volume of the current device by two steps.
  • the processor 2400 may detect the user's movement only when the user's movement is greater than or equal to a predetermined threshold in order to prevent the control operation from reacting sensitively to the minute user's movement.
  • the processor 2400 may determine that the user has raised his head twice only when the pitch angles ⁇ 1 and ⁇ 2 of the two maximum values P1 and P2 are greater than a predetermined upper threshold value. If the pitch value of the extreme value is smaller than the predetermined upper limit threshold, the processor 2400 may determine that the user does not raise his/her head in order to perform the volume up operation.
  • the processor 2400 may similarly determine that the user has lowered the head twice only when the pitch angles ⁇ 3 and ⁇ 4 of the two minimum values P3 and P4 are smaller than a predetermined lower limit threshold value.
  • When the absolute value of the upper limit threshold value or the lower limit threshold value is set to a smaller value, control information may be generated to respond more sensitively to the user's movement. Conversely, in order to prevent the control operation from reacting too sensitively to a small movement of the user, the absolute value of the upper limit threshold value or the lower limit threshold value may be increased.
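  • The sketch below illustrates this thresholded counting of head gestures from a pitch-angle trace: local maxima above an upper threshold count as raise-head gestures, local minima below a lower threshold count as lower-head gestures, and small bumps are ignored. The threshold values and the trace are assumptions.

```python
# Count raise-head / lower-head gestures from a pitch-angle trace using thresholds.
UPPER_DEG = 20.0
LOWER_DEG = -20.0

def count_gestures(pitch_deg):
    ups = downs = 0
    for prev, cur, nxt in zip(pitch_deg, pitch_deg[1:], pitch_deg[2:]):
        if prev < cur > nxt and cur > UPPER_DEG:       # local maximum above threshold
            ups += 1
        elif prev > cur < nxt and cur < LOWER_DEG:     # local minimum below threshold
            downs += 1
    return ups, downs

# Two clear upward nods and one small bump that is ignored.
trace = [0, 15, 30, 10, 5, 12, 5, 8, 35, 12, 0]
print(count_gestures(trace))   # -> (2, 0); e.g. "volume up" raised by two steps
```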
  • The device 2000 may determine an attribute value related to a control command based on the size of the movement as well as the direction of the movement. Specifically, the device 2000 may determine an attribute value related to the control command to be linearly proportional or inversely proportional to the movement size. For example, the processor 2400 may increase the volume of the current device by one step when motion information of the user with a pitch angle of 30 degrees is obtained along with a control command of "volume" or "volume up", and may increase the volume by two steps when motion information with a pitch angle of 60 degrees is obtained.
  • According to an embodiment, motion information acquired before a predetermined threshold time elapses from the end of the user's utterance may be used.
  • The processor 2400 may generate control information using the motion information 530 acquired within that threshold time.
  • For example, the processor 2400 may increase the volume of the current device when the motion information 530 in the pitch direction having the maximum value P5 is obtained within the predetermined threshold time after the voice command "Bixby volume" ends.
  • the processor 2400 may generate a control command by using motion information obtained before a preset end keyword voice is input.
  • For example, the processor 2400 may generate control commands using the user's movement information acquired before a preset end keyword such as "OK", "Finished", "Thanks", "Done", "Stop", or "End" is recognized.
  • That is, the processor 2400 may continuously increase the volume level of the current device as many times as the number of motions of raising the user's head, from the time of the user's utterance until the predetermined end keyword is recognized.
  • the processor 2400 may analyze the user's movement information without using the end keyword, and determine that the control operation is stopped when the user's movement returns to the initial position at the start of the utterance.
  • For example, the processor 2400 controls the volume value according to the movement of the user's head after the user's voice command "Bixby volume" or "volume", and may stop the volume control operation when the user's head returns to the initial position.
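  • An illustrative control loop for the two stop conditions described above, a preset end keyword or the head returning near its position at the start of the utterance, is sketched below. The event format, tolerance, and keyword set are assumptions.

```python
# Adjust the volume per raise-head gesture until an end keyword or a return to the start position.
END_KEYWORDS = {"ok", "finished", "thanks", "done", "stop", "end"}
RETURN_TOLERANCE_DEG = 5.0     # assumed

def run_volume_control(events, initial_pitch_deg):
    """events: sequence of ('word', text) or ('pitch', angle_deg) items."""
    volume_steps = 0
    raised = False
    for kind, value in events:
        if kind == "word" and value.lower() in END_KEYWORDS:
            break                                             # end keyword spoken
        if kind == "pitch":
            if abs(value - initial_pitch_deg) <= RETURN_TOLERANCE_DEG and volume_steps:
                break                                         # head returned to start position
            if value > initial_pitch_deg + 20.0 and not raised:
                raised = True
                volume_steps += 1                             # one raise-head gesture -> +1 step
            elif value <= initial_pitch_deg + 20.0:
                raised = False
    return volume_steps

events = [("pitch", 0.0), ("pitch", 30.0), ("pitch", 28.0), ("pitch", 10.0),
          ("pitch", 32.0), ("pitch", 2.0), ("word", "done")]
print(run_volume_control(events, initial_pitch_deg=0.0))   # -> 2 (two gestures before return)
```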
  • FIGS. 6 and 7 are diagrams for explaining a process of determining an attribute value of control information based on motion information according to an embodiment.
  • the processor 2400 may determine an attribute value related to a control command by using the motion information detected by the sensing unit 2300 .
  • the processor 2400 may change the attribute value in stages in consideration of the direction and magnitude of the movement, or may change the attribute value to be linearly proportional or inversely proportional to the movement.
  • the processor 2400 determines whether to increase or decrease the attribute value related to the control command in consideration of the detected movement direction.
  • The processor 2400 may determine an attribute value related to a control command by using motion information in at least one of the xyz axis directions. For example, using a pitch movement rotating about the y-axis, the processor 2400 may determine to increase the attribute value related to the control command when the pitch angle has a + value, and to decrease the attribute value when the pitch angle has a - value.
  • As shown in FIG. 6, the processor 2400 may change the attribute value stepwise during the time period T1 to T2 (620).
  • Alternatively, as shown in FIG. 7, the processor 2400 may change the attribute value linearly during the time period T1 to T2 (720). For example, the processor 2400 may increase the volume of the device stepwise or linearly while the pitch angle remains greater than a predetermined threshold after a control command of "volume" or "volume up".
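  • The sketch below contrasts these two update styles while the pitch angle stays above a threshold: a stepwise change once per sampling tick versus a change that is linearly proportional to how far the angle exceeds the threshold. The numeric values are assumptions.

```python
# Stepwise vs. linear attribute updates while the pitch angle is held above a threshold.
THRESHOLD_DEG = 20.0
STEP = 1
GAIN = 0.1          # attribute units per degree above the threshold (assumed)

def stepwise_update(value, pitch_deg):
    return value + STEP if pitch_deg > THRESHOLD_DEG else value

def linear_update(value, pitch_deg):
    return value + GAIN * max(0.0, pitch_deg - THRESHOLD_DEG)

volume_a = volume_b = 10.0
for pitch in [25.0, 30.0, 30.0, 15.0]:          # head held up for three ticks (T1..T2)
    volume_a = stepwise_update(volume_a, pitch)
    volume_b = linear_update(volume_b, pitch)
print(volume_a, volume_b)                        # 13.0 and 12.5
```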
  • FIG. 8 is a diagram for explaining a process of determining an attribute value of control information based on motion information according to another embodiment of the present disclosure.
  • The processor 2400 may determine the amount of change in motion information based on a threshold value in order to prevent the control operation from reacting sensitively to a minute movement of the user. Specifically, the processor 2400 determines that a movement in the + direction has occurred when motion information on any one of the xyz axes changes from a value smaller than the threshold value Th to a larger value, and that a movement in the - direction has occurred when the motion information changes from a value larger than Th to a smaller value, and may determine the user's movement by analyzing the number of movements in the + direction and the number of movements in the - direction.
  • The processor 2400 analyzes the motion information 800 obtained from the sensing unit 2300, and may determine that a movement in the + direction has occurred when a motion having a larger value from a value smaller than the threshold Th1 is obtained, and that a movement in the - direction has occurred when a motion having a smaller value from a value greater than the threshold Th1 is obtained.
  • For example, the processor 2400 may determine three + direction movements having the maximum values P1, P3, and P4, and two - direction movements. Since the change in motion between the P1 maximum value and the P2 maximum value stays within a range larger than the Th1 threshold value, it is not determined that an additional + direction movement has occurred, and it may be ignored.
  • According to an embodiment, the processor 2400 may set the threshold value for determining movement in the + direction and the threshold value for determining movement in the - direction differently. Assuming that the threshold value for determining movement in the + direction is Th1 and the threshold value for determining movement in the - direction is Th2, the processor 2400 may determine that a movement in the + direction has occurred when a motion having a larger value from a value smaller than the threshold value Th1 is obtained, and that a movement in the - direction has occurred when a motion having a smaller value from a value greater than the threshold value Th2 is obtained.
  • In FIG. 8, when the movements in the + direction and the - direction are determined based on the two threshold values Th1 and Th2, the processor 2400 may determine three + direction movements having the maximum values P1, P3, and P4, and one - direction movement. Since the movement change between the P2 maximum value and the P3 maximum value remains within a range larger than the Th2 threshold value, it may be determined that no - direction movement occurs between the P2 maximum value and the P3 maximum value. As such, when the movements in the + direction and the - direction are determined based on the two threshold values, the processor 2400 may prevent the control operation from reacting sensitively to a minute movement of the user.
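  • A hedged sketch of this two-threshold counting is shown below: a + movement is counted when the signal rises above Th1 and a - movement when it falls below Th2 (with Th2 < Th1), so dips that stay above Th2 are ignored. The threshold values and the sample trace are assumptions.

```python
# Count + and - direction movements with separate upper (Th1) and lower (Th2) thresholds.
TH1 = 20.0      # threshold for detecting a + direction movement
TH2 = 5.0       # threshold for detecting a - direction movement (Th2 < Th1)

def count_directional_moves(samples):
    plus = minus = 0
    above_th1 = samples[0] > TH1
    above_th2 = samples[0] > TH2
    for s in samples[1:]:
        if not above_th1 and s > TH1:     # rose above Th1 -> one + movement
            plus += 1
        if above_th2 and s < TH2:         # fell below Th2 -> one - movement
            minus += 1
        above_th1 = s > TH1
        above_th2 = s > TH2
    return plus, minus

# Four peaks: the dip after the first peak stays above Th1 (no extra +),
# the next dip stays above Th2 (no extra -), the last dip drops below Th2 (one -).
trace = [0, 25, 22, 27, 12, 26, 2, 24, 10]
print(count_directional_moves(trace))     # -> (3, 1)
```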
  • the processor 2400 may determine an attribute value related to control information. For example, when the control information is the volume, the volume level may be increased according to the number of movements in the + direction, or the volume level may be decreased according to the number of movements in the - direction.
  • FIG. 9A is a diagram illustrating a control system including a device generating control information and an external device to be controlled according to an exemplary embodiment.
  • The device 910 generates control information based on the user's voice input and motion information, and transmits the control information to the external device 930 connected through the network 920.
  • The device 910 may generate a control command based on the user's voice input and motion information by a voice assistant program installed by default, and may use the control command to control the device 910 itself.
  • The device 910 may be a wearable device such as AR (Augmented Reality) glasses, a smart watch, a smart lens, a smart bracelet, or smart clothing, or a mobile device such as a smart phone, a smart tablet, a computer, or a notebook computer, but is not limited thereto.
  • The external device 930 may be a smart light bulb, a smart pet, a robot cleaner, a display device, a smart phone, a tablet PC, a PC, a smart TV, a mobile phone, a personal digital assistant (PDA), a laptop, a media player, a server, a micro server, a global positioning system (GPS) device, an e-book terminal, a digital broadcast terminal, a navigation device, a kiosk, an MP3 player, a digital camera, a home appliance, or another mobile or non-mobile computing device, but is not limited thereto.
  • the network 920 includes a local area network (LAN), a wide area network (WAN), a value added network (VAN), a mobile radio communication network, a satellite communication network, and a mutual network thereof. It is a data communication network in a comprehensive sense that includes a combination and enables each network constituent entity shown in FIG. 9A to communicate smoothly with each other, and may include a wired Internet, a wireless Internet, and a mobile wireless communication network.
  • FIG. 9B is a diagram illustrating a control system including a device for generating control information and an external device to be controlled according to another exemplary embodiment.
  • The device 910 transmits the user's voice input and motion information to the first external device 940 connected through the network 920, and the first external device 940 may generate control information based on the user's voice input and motion information.
  • the control command generated by the first external device 940 may be used again to control the device 910 or another second external device 950 , or may be used to control the first external device 940 .
  • the device 910 transmits the user's voice input and motion information to the first external device 940, and the first external device 940 generates control information based on the user's voice input and motion information, and , the device 910 may receive a control command generated by the first external device 940 and perform a control operation based on the received control information.
  • Alternatively, the device 910 transmits the user's voice input and motion information for controlling the second external device 950 to the first external device 940, the first external device 940 may generate control information based on the user's voice input and motion information, and the second external device 950 may receive the control information generated by the first external device 940 and perform a control operation based on the received control information.
  • FIG. 10 is a flowchart of a method for a device to provide control information according to an embodiment.
  • the input unit 2100 receives an external audio signal.
  • The processor 2400 may execute the corresponding voice assistant application and control the input unit 2100 to receive an external audio signal including the user's voice input.
  • the input unit 2100 or the sensing unit 2300 acquires vibration information caused by the user's utterance.
  • the processor 2400 may identify the utterance section either by analyzing the audio signal input through the input unit 2100 to identify a speech signal pattern produced by the user's utterance, or by detecting, through the sensing unit 2300, the vibration generated by the user's utterance, thereby obtaining vibration information for identifying the user's utterance section.
  • the processor 2400 identifies the user's voice input produced by the user's actual utterance by excluding, from the received audio signal, non-voice sections not caused by the user's utterance.
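  • As an illustration only, the following Python sketch shows one way the utterance section could be identified from vibration information and the non-voice sections excluded from the audio signal; the frame length, the RMS threshold, and the assumption that the audio and vibration streams share the same frame grid are assumptions made for this sketch, not part of the disclosure.

```python
import numpy as np

def detect_utterance_sections(vibration, frame_len=256, rms_threshold=0.02):
    """Mark frames whose vibration (accelerometer) energy exceeds a threshold
    as belonging to the user's utterance section (illustrative heuristic)."""
    n_frames = len(vibration) // frame_len
    mask = np.zeros(n_frames, dtype=bool)
    for i in range(n_frames):
        frame = vibration[i * frame_len:(i + 1) * frame_len]
        mask[i] = np.sqrt(np.mean(frame ** 2)) > rms_threshold  # RMS energy gate
    return mask

def keep_voice_only(audio, mask, frame_len=256):
    """Zero out the audio frames outside the detected utterance sections,
    leaving only the voice input produced by the user's actual utterance."""
    out = np.array(audio, dtype=float, copy=True)
    for i, speaking in enumerate(mask):
        if not speaking:
            out[i * frame_len:(i + 1) * frame_len] = 0.0
    return out
```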
  • the processor 2400 may generate control information for controlling the device 2000 or an external device by combining the user's voice signal input through the input unit 2100 with movement information according to the user's movement obtained from the sensing unit 2300. The processor 2400 may determine the type of the control command based on the user's voice signal, and determine the attribute value of the control command based on the user's motion information. To determine the attribute value, the processor 2400 may determine the user's movement direction and movement size from the movement information obtained after the start point of the utterance section, using that start point as the reference, and determine the attribute value of the control information based on the determined movement direction and movement size.
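  • The split between command type (taken from the voice signal) and attribute value (taken from the motion information) can be pictured with the hedged sketch below; the command keywords, the "up"/"down" direction labels, and the scaling of movement size into a delta are illustrative assumptions only.

```python
def build_control_info(voice_text, movement_direction, movement_size):
    """Combine the recognized voice text (command type) with the measured
    movement (attribute value) into a single control-information record."""
    # Hypothetical keyword-to-command map; the real mapping is not specified.
    command_types = {"volume": "SET_VOLUME", "brightness": "SET_BRIGHTNESS"}
    command = next((cmd for word, cmd in command_types.items()
                    if word in voice_text.lower()), None)
    if command is None:
        return None  # no recognized command type in the voice input
    sign = 1 if movement_direction == "up" else -1
    # Assumed scaling of the movement size into an attribute delta.
    return {"type": command, "delta": sign * round(movement_size * 10)}

# e.g. build_control_info("volume", "up", 1.5) -> {"type": "SET_VOLUME", "delta": 15}
```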
  • the processor 2400 may determine the user's movement direction and movement size from the user's pitch, roll, and yaw movements obtained from the sensing unit 2300 with respect to the x-, y-, and z-axes, and determine an attribute value related to the control command based on the movement direction and movement size.
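  • A minimal sketch of deriving a movement direction and movement size from pitch/roll/yaw samples, using the orientation at the utterance-start point as the baseline, might look as follows; treating the largest orientation change as the movement and labelling its sign "up"/"down" is a simplification made purely for illustration.

```python
import numpy as np

def movement_from_orientation(samples, start_index):
    """Estimate the user's movement direction and size from (pitch, roll, yaw)
    samples, relative to the orientation at the utterance-start point."""
    baseline = np.asarray(samples[start_index], dtype=float)   # orientation at start point
    after = np.asarray(samples[start_index:], dtype=float)     # samples from the start onward
    delta = after - baseline                                    # change per sample
    peak = delta[np.argmax(np.linalg.norm(delta, axis=1))]      # largest overall change
    dominant = int(np.argmax(np.abs(peak)))                     # axis that moved the most
    return {
        "axis": ("pitch", "roll", "yaw")[dominant],
        "direction": "up" if peak[dominant] > 0 else "down",    # sign label for illustration
        "size": float(np.linalg.norm(peak)),
    }

# e.g. movement_from_orientation([(0, 0, 0), (5, 0, 1), (12, 1, 2)], 0)
# -> {"axis": "pitch", "direction": "up", "size": ~12.2}
```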
  • the processor 2400 may use the generated control command to control the device 2000 itself, or transmit the generated control command to an external device connected through a network to control the external device.
  • FIG. 11 is a detailed flowchart of a method for a device to provide control information according to an embodiment of the present disclosure.
  • the input unit 2100 receives an audio signal.
  • the input unit 2100 or the sensing unit 2300 acquires vibration information produced by the user's utterance, and the sensing unit 2300, which includes an acceleration sensor 2310 and a gyroscope sensor 2320, can obtain the user's motion information.
  • the sensing unit 2300 may detect a user's motion and generate an electrical signal or data value related to the sensed user's motion.
  • the processor 2400 determines the user's utterance state either by analyzing the audio signal input through the input unit 2100 to identify a speech signal pattern produced by the user's utterance and thereby identify the utterance section, or by detecting, through the sensing unit 2300, the vibration generated by the user's utterance.
  • the processor 2400 identifies the user's voice input produced by the user's actual utterance by excluding, from the received audio signal, non-voice sections not caused by the user's utterance.
  • the processor 2400 may generate control information for controlling the device 2000 or an external device by combining the user's voice signal input through the input unit 2100 with movement information according to the user's movement obtained from the sensing unit 2300. The processor 2400 may determine the type of the control command based on the user's voice signal, and determine the attribute value of the control command based on the user's motion information. To determine the attribute value, the processor 2400 may determine the user's movement direction and movement size from the movement information obtained after the start point of the utterance section, using that start point as the reference, and determine the attribute value of the control information based on the determined movement direction and movement size.
  • FIG. 12 is a flowchart illustrating a method of controlling an external device according to an embodiment of the present disclosure.
  • the device 1200 may receive an audio signal, and in operation S1220, the device 1200 may obtain vibration information due to the user's utterance and the user's movement information.
  • the device 1200 identifies a section of the input audio signal due to the user's utterance, and identifies the user's voice input due to the actual user's utterance.
  • the device 1200 generates control information for controlling the external device 1250 by combining the user's voice signal with movement information according to the user's movement, and in operation S1240, the device 1200 transmits the generated control information to the external device 1250 connected through a predetermined network. In operation S1260, the processor included in the external device 1250 changes the state of the external device by performing a control operation according to the received control information.
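  • Purely as an example of how the generated control information might be carried to the external device 1250 over a network, a JSON-over-TCP exchange such as the one below could be used; the host address, port, and message fields are assumptions, and the actual transport is not specified by the disclosure.

```python
import json
import socket

def send_control_info(control_info, host="192.168.0.42", port=5000):
    """Serialize the control information as JSON and push it to the controlled
    device over a plain TCP connection (host/port are placeholder values)."""
    with socket.create_connection((host, port), timeout=2.0) as conn:
        conn.sendall(json.dumps(control_info).encode("utf-8"))

def handle_control_info(raw_bytes):
    """Receiving side (external device 1250): perform the control operation
    described by the message and report the resulting state change."""
    info = json.loads(raw_bytes.decode("utf-8"))
    if info.get("type") == "SET_VOLUME":
        return {"state": "volume_changed", "delta": info.get("delta")}
    return {"state": "ignored"}
```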
  • FIG. 13 is a flowchart illustrating a method of controlling an external device according to another embodiment of the present disclosure.
  • the device 1300 receives an audio signal, and in operation S1311 , the device 1300 acquires vibration information and user movement information due to a user's utterance.
  • the device 1300 transmits the acquired audio signal, vibration information, and user movement information to the external device 1350 .
  • the external device 1350 identifies the section of the input audio signal due to the user's utterance, and identifies the user's voice input produced by the user's actual utterance.
  • the external device 1350 generates control information by combining the user's voice signal and movement information according to the user's movement.
  • the control information may be control information for controlling the device 1300 .
  • the external device 1350 transmits the generated control information to the device 1300, and the device 1300, which receives the control information in operation S1331, may change its state by performing a control operation according to the received control information.
  • the control information may be control information for controlling the external device 1380 .
  • the control information generated by the external device 1350 may be transmitted directly to another external device 1380 in operation S1332, or the device 1300 that receives the control information generated by the external device 1350 may, in operation S1333, transmit the control information on to the external device 1380.
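  • The possible destinations of the control information in FIG. 13 (back to the device 1300, directly to the external device 1380, or to the external device 1380 via the device 1300) can be sketched as a simple routing table of transmission callbacks; the target names and the reuse of the send_control_info helper from the earlier sketch are assumptions for illustration.

```python
def route_control_info(control_info, target, senders):
    """Route control information generated on the first external device 1350:
    back to the device 1300, directly to the external device 1380 (S1332),
    or to the external device 1380 via the device 1300 (S1333)."""
    if target not in senders:
        raise ValueError(f"no transmission path registered for {target!r}")
    senders[target](control_info)  # each entry wraps a network send

# Illustrative wiring only (names and addresses are assumptions):
# senders = {
#     "device_1300":   lambda info: send_control_info(info, host="10.0.0.2"),
#     "external_1380": lambda info: send_control_info(info, host="10.0.0.3"),
# }
# route_control_info({"type": "SET_VOLUME", "delta": 10}, "external_1380", senders)
```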
  • the processor included in the external device 1380 may perform a control operation according to the received control information.
  • FIG. 14 is a block diagram illustrating a configuration of an electronic device 2000 according to an embodiment of the present disclosure.
  • the electronic device 2000 illustrated in FIG. 14 may include the same components as the devices described with reference to FIGS. 1 to 13 , and the same components may perform all of the operations and functions described with reference to FIGS. 1 to 13 . Accordingly, components of the electronic device 2000 that have not been described so far will be described below.
  • the electronic device 2000 may include a user input unit 1100, an output unit 1200, a control unit 1300, a sensing unit 1400, a communication unit 1500, an A/V input unit 1600, and a memory 1700.
  • the user input unit 1100 is a means by which a user inputs data for controlling the electronic device 2000.
  • the user input unit 1100 may include a key pad, a dome switch, a touch pad (contact capacitive type, pressure resistive film type, infrared sensing type, surface ultrasonic conduction type, integral tension measurement type, piezo effect type, etc.), a jog wheel, a jog switch, and the like, but is not limited thereto.
  • the user input unit 1100 may receive a user input necessary to generate conversation information to be provided to the user.
  • the output unit 1200 may output an audio signal, a video signal, or a vibration signal, and may include a display unit 1210, a sound output unit 1220, and a vibration motor 1230.
  • the vibration motor 1230 may output a vibration signal.
  • the vibration motor 1230 may output a vibration signal corresponding to the output of audio data or video data (eg, a call signal reception sound, a message reception sound, etc.).
  • the sensing unit 1400 may detect a state of the electronic device 2000 or a state around the electronic device 2000 , and transmit the sensed information to the controller 1300 .
  • the sensing unit 1400 may include at least one of a magnetic sensor 1410, an acceleration sensor 1420, a temperature/humidity sensor 1430, an infrared sensor 1440, a gyroscope sensor 1450, a position sensor (eg, GPS) 1460, a barometric pressure sensor 1470, a proximity sensor 1480, and an illuminance sensor 1490, but is not limited thereto. Since the function of each sensor can be intuitively inferred by a person skilled in the art from its name, a detailed description thereof will be omitted.
  • the communication unit 1500 may include components for performing communication with other devices.
  • the communication unit 1500 may include a short-range communication unit 1510 , a mobile communication unit 1520 , and a broadcast receiving unit 1530 .
  • the short-range wireless communication unit 1510 may include a Bluetooth communication unit, a BLE (Bluetooth Low Energy) communication unit, a near field communication (NFC) unit, a WLAN (Wi-Fi) communication unit, a Zigbee communication unit, an infrared (IrDA, Infrared Data Association) communication unit, a Wi-Fi Direct (WFD) communication unit, an ultra wideband (UWB) communication unit, an Ant+ communication unit, and the like, but is not limited thereto.
  • the mobile communication unit 1520 transmits/receives a radio signal to and from at least one of a base station, an external terminal, and a server on a mobile communication network.
  • the wireless signal may include various types of data according to transmission/reception of a voice call signal, a video call signal, or a text/multimedia message.
  • the broadcast receiver 1530 receives a broadcast signal and/or broadcast-related information from the outside through a broadcast channel.
  • the broadcast channel may include a satellite channel and a terrestrial channel.
  • the electronic device 2000 may not include the broadcast receiver 1530 .
  • the communication unit 1500 may exchange, with a second interactive electronic device 3000, other devices, and a server, information necessary to generate conversation information to be provided to the first user.
  • the A/V (Audio/Video) input unit 1600 is for inputting an audio signal or a video signal, and may include a camera 1610 , a microphone 1620 , and the like.
  • the camera 1610 may obtain an image frame such as a still image or a moving picture through an image sensor in a video call mode or a shooting mode.
  • the image captured through the image sensor may be processed through the processor 1300 or a separate image processing unit (not shown).
  • the image frame processed by the camera 1610 may be stored in the memory 1700 or transmitted to the outside through the communication unit 1500 .
  • Two or more cameras 1610 may be provided according to the configuration of the terminal.
  • the microphone 1620 receives an external sound signal and processes it as electrical voice data.
  • the microphone 1620 may receive an acoustic signal from an external device or a speaker.
  • the microphone 1620 may use various noise removal algorithms for removing noise generated in the process of receiving an external sound signal.
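  • As one hedged example of such a noise removal algorithm (not necessarily the one used by the microphone 1620), a basic spectral subtraction pass over framed audio could look like the sketch below; the frame length and the availability of a separate noise-only recording are assumptions.

```python
import numpy as np

def spectral_subtraction(audio, noise_profile, frame_len=512):
    """Subtract an estimated noise magnitude spectrum from each audio frame
    (one common noise-removal approach, shown here only as an illustration)."""
    noise_mag = np.abs(np.fft.rfft(noise_profile[:frame_len]))
    cleaned = np.zeros(len(audio), dtype=float)
    for start in range(0, len(audio) - frame_len + 1, frame_len):
        spec = np.fft.rfft(audio[start:start + frame_len])
        mag = np.maximum(np.abs(spec) - noise_mag, 0.0)           # floor at zero
        phase = np.exp(1j * np.angle(spec))                        # keep original phase
        cleaned[start:start + frame_len] = np.fft.irfft(mag * phase, n=frame_len)
    return cleaned
```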
  • the memory 1700 may store a program for processing and control of the controller 1300 , and may store data input to or output from the electronic device 2000 .
  • the memory 1700 may include at least one type of storage medium among a flash memory type, a hard disk type, a multimedia card micro type, a card type memory (eg, SD or XD memory), RAM (Random Access Memory), SRAM (Static Random Access Memory), ROM (Read-Only Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), PROM (Programmable Read-Only Memory), magnetic memory, a magnetic disk, and an optical disk.
  • Programs stored in the memory 1700 may be classified into a plurality of modules according to their functions, for example, into a UI module 1710, a touch screen module 1720, a notification module 1730, and the like.
  • the UI module 1710 may provide a specialized UI, GUI, or the like that interworks with the electronic device 2000 for each application.
  • the touch screen module 1720 may detect a touch gesture on the user's touch screen and transmit information about the touch gesture to the controller 1300 .
  • the touch screen module 1720 according to some embodiments may recognize and analyze a touch code.
  • the touch screen module 1720 may be configured as separate hardware including a controller.
  • the notification module 1730 may generate a signal for notifying the occurrence of an event in the electronic device 2000 .
  • Examples of events generated in the electronic device 2000 include call signal reception, message reception, key signal input, schedule notification, and the like.
  • the notification module 1730 may output a notification signal in the form of a video signal through the display unit 1210, in the form of an audio signal through the sound output unit 1220, or in the form of a vibration signal through the vibration motor 1230.
  • the electronic device 2000 described in the present disclosure may be implemented as a hardware component, a software component, and/or a combination of a hardware component and a software component.
  • the electronic device 2000 may be implemented using one or more general-purpose or special-purpose computers, such as a processor, an arithmetic logic unit (ALU), application specific integrated circuits (ASICs), digital signal processors (DSPs), digital signal processing devices (DSPDs), programmable logic devices (PLDs), microcontrollers, a microprocessor, or any other device capable of executing and responding to instructions.
  • Software may comprise a computer program, code, instructions, or a combination of one or more of these, and may configure a processing device to operate as desired or may, independently or collectively, instruct the processing device.
  • the software may be implemented as a computer program including instructions stored in a computer-readable storage medium.
  • the computer-readable recording medium includes, for example, a magnetic storage medium (eg, read-only memory (ROM), random-access memory (RAM), a floppy disk, a hard disk, etc.) and an optically readable medium (eg, a CD-ROM (Compact Disc Read-Only Memory), a DVD (Digital Versatile Disc), etc.).
  • the computer-readable recording medium is distributed among computer systems connected through a network, so that the computer-readable code can be stored and executed in a distributed manner.
  • the medium may be readable by a computer, stored in a memory, and executed on a processor.
  • the computer is an apparatus capable of calling a stored instruction from a storage medium and operating, according to the called instruction, in accordance with the disclosed embodiments, and may include the electronic devices 1000 and 2000 according to the disclosed embodiments.
  • the computer-readable storage medium may be provided in the form of a non-transitory storage medium.
  • 'non-transitory' means that the storage medium is tangible and does not include a signal, and does not distinguish between data being stored semi-permanently or temporarily in the storage medium.
  • the electronic devices 1000 and 2000 or the method according to the disclosed embodiments may be provided as included in a computer program product.
  • Computer program products may be traded between sellers and buyers as commodities.
  • the computer program product may include a software program and a computer-readable storage medium in which the software program is stored.
  • the computer program product may include a product in the form of a software program (eg, a downloadable application) distributed electronically through a manufacturer of the electronic device 1000 or 2000 or through an electronic market (eg, Google Play Store, App Store).
  • the storage medium may be a server of a manufacturer, a server of an electronic market, or a storage medium of a relay server temporarily storing a software program.
  • in a system consisting of a server and a terminal, the computer program product may include a storage medium of the server or a storage medium of the terminal.
  • when there is a third device (eg, a smart phone), the computer program product may include a storage medium of the third device.
  • the computer program product may include the software program itself transmitted from the server to the terminal or third device, or transmitted from the third device to the terminal.
  • one of the server, the terminal, and the third device may execute the computer program product to perform the method according to the disclosed embodiments.
  • two or more of the server, the terminal, and the third device may execute the computer program product to perform the method according to the disclosed embodiments in a distributed manner.
  • a server may execute a computer program product stored in the server, and may control a terminal communicatively connected with the server to perform the method according to the disclosed embodiments.
  • the third device may execute a computer program product to control the terminal communicatively connected to the third device to perform the method according to the disclosed embodiment.
  • the third device may download the computer program product from the server and execute the downloaded computer program product.
  • the third device may execute the computer program product provided in a preloaded state to perform the method according to the disclosed embodiments.

Landscapes

  • Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Human Computer Interaction (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • Acoustics & Sound (AREA)
  • Multimedia (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Signal Processing (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • General Health & Medical Sciences (AREA)
  • User Interface Of Digital Computer (AREA)

Abstract

The present disclosure relates to: a device that more accurately recognizes a user's voice input and performs a control operation using the user's voice input, the voice input being recognized by analyzing a signal generated when the user makes an utterance, or by using vibration information to identify the signal generated by the user's utterance among the audio signals input to the device; and a control method therefor.
PCT/KR2020/003007 2020-02-27 2020-03-03 Dispositif pour générer des informations de commande sur la base d'un état d'énoncé d'un utilisateur, et procédé de commande associé WO2021172641A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020200024554A KR20210109722A (ko) 2020-02-27 2020-02-27 사용자의 발화 상태에 기초하여 제어 정보를 생성하는 디바이스 및 그 제어 방법
KR10-2020-0024554 2020-02-27

Publications (1)

Publication Number Publication Date
WO2021172641A1 true WO2021172641A1 (fr) 2021-09-02

Family

ID=77491689

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2020/003007 WO2021172641A1 (fr) 2020-02-27 2020-03-03 Dispositif pour générer des informations de commande sur la base d'un état d'énoncé d'un utilisateur, et procédé de commande associé

Country Status (2)

Country Link
KR (1) KR20210109722A (fr)
WO (1) WO2021172641A1 (fr)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20100119250A (ko) * 2009-04-30 2010-11-09 삼성전자주식회사 모션 정보를 이용하는 음성 검출 장치 및 방법
KR101625668B1 (ko) * 2009-04-20 2016-05-30 삼성전자 주식회사 전자기기 및 전자기기의 음성인식방법
US9459176B2 (en) * 2012-10-26 2016-10-04 Azima Holdings, Inc. Voice controlled vibration data analyzer systems and methods
KR20180098079A (ko) * 2017-02-24 2018-09-03 삼성전자주식회사 비전 기반의 사물 인식 장치 및 그 제어 방법
KR20190099988A (ko) * 2018-02-19 2019-08-28 주식회사 셀바스에이아이 기준 화자 모델을 이용한 음성 인식 장치 및 이를 이용한 음성 인식 방법

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR101625668B1 (ko) * 2009-04-20 2016-05-30 삼성전자 주식회사 전자기기 및 전자기기의 음성인식방법
KR20100119250A (ko) * 2009-04-30 2010-11-09 삼성전자주식회사 모션 정보를 이용하는 음성 검출 장치 및 방법
US9459176B2 (en) * 2012-10-26 2016-10-04 Azima Holdings, Inc. Voice controlled vibration data analyzer systems and methods
KR20180098079A (ko) * 2017-02-24 2018-09-03 삼성전자주식회사 비전 기반의 사물 인식 장치 및 그 제어 방법
KR20190099988A (ko) * 2018-02-19 2019-08-28 주식회사 셀바스에이아이 기준 화자 모델을 이용한 음성 인식 장치 및 이를 이용한 음성 인식 방법

Also Published As

Publication number Publication date
KR20210109722A (ko) 2021-09-07

Similar Documents

Publication Publication Date Title
WO2021036644A1 (fr) Procédé et appareil d'animation à commande vocale basés sur l'intelligence artificielle
WO2020189850A1 (fr) Dispositif électronique et procédé de commande de reconnaissance vocale par ledit dispositif électronique
WO2020013428A1 (fr) Dispositif électronique pour générer un modèle asr personnalisé et son procédé de fonctionnement
KR102623272B1 (ko) 전자 장치 및 이의 제어 방법
US9613618B2 (en) Apparatus and method for recognizing voice and text
WO2020122677A1 (fr) Procédé d'exécution de fonction de dispositif électronique et dispositif électronique l'utilisant
WO2020045835A1 (fr) Dispositif électronique et son procédé de commande
WO2018182201A1 (fr) Procédé et dispositif de fourniture de réponse à une entrée vocale d'utilisateur
WO2021071110A1 (fr) Appareil électronique et procédé de commande d'appareil électronique
WO2020153785A1 (fr) Dispositif électronique et procédé pour fournir un objet graphique correspondant à des informations d'émotion en utilisant celui-ci
CN107430856B (zh) 信息处理系统和信息处理方法
WO2018124633A1 (fr) Dispositif électronique et procédé de délivrance d'un message par celui-ci
WO2020080635A1 (fr) Dispositif électronique permettant d'effectuer une reconnaissance vocale à l'aide de microphones sélectionnés d'après un état de fonctionnement, et procédé de fonctionnement associé
WO2016206646A1 (fr) Procédé et système pour pousser un dispositif de machine à générer une action
WO2021118229A1 (fr) Procédé de fourniture d'informations et dispositif électronique prenant en charge ce procédé
WO2021071271A1 (fr) Appareil électronique et procédé de commande associé
JP6798258B2 (ja) 生成プログラム、生成装置、制御プログラム、制御方法、ロボット装置及び通話システム
WO2020231151A1 (fr) Dispositif électronique et son procédé de commande
WO2021172641A1 (fr) Dispositif pour générer des informations de commande sur la base d'un état d'énoncé d'un utilisateur, et procédé de commande associé
EP3850623A1 (fr) Dispositif électronique et son procédé de commande
JP7468360B2 (ja) 情報処理装置および情報処理方法
WO2020130734A1 (fr) Dispositif électronique permettant la fourniture d'une réaction en fonction d'un état d'utilisateur et procédé de fonctionnement correspondant
WO2020116766A1 (fr) Procédé pour générer un modèle de prédiction d'utilisateur pour identifier un utilisateur par des données d'apprentissage, dispositif électronique auquel est appliqué ledit modèle, et procédé pour appliquer ledit modèle
WO2024029875A1 (fr) Dispositif électronique, serveur intelligent et procédé de reconnaissance vocale adaptative d'orateur
WO2022177063A1 (fr) Dispositif électronique et son procédé de commande

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 20921873

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

122 Ep: pct application non-entry in european phase

Ref document number: 20921873

Country of ref document: EP

Kind code of ref document: A1