WO2022102945A1 - Electronic device and control method therefor - Google Patents

Electronic device and control method therefor

Info

Publication number
WO2022102945A1
WO2022102945A1 PCT/KR2021/012891 KR2021012891W
Authority
WO
WIPO (PCT)
Prior art keywords
sound source
microphone
source component
audio signal
objects
Prior art date
Application number
PCT/KR2021/012891
Other languages
English (en)
Korean (ko)
Inventor
박민규
김호연
이형선
Original Assignee
Samsung Electronics Co., Ltd. (삼성전자(주))
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Samsung Electronics Co., Ltd.
Publication of WO2022102945A1

Classifications

    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • H04R3/005Circuits for transducers, loudspeakers or microphones for combining the signals of two or more microphones
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S3/00Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received
    • G01S3/80Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received using ultrasonic, sonic or infrasonic waves
    • G01S3/802Systems for determining direction or deviation from predetermined direction
    • G01S3/808Systems for determining direction or deviation from predetermined direction using transducers spaced apart and measuring phase or time difference between signals therefrom, i.e. path-difference systems
    • GPHYSICS
    • G01MEASURING; TESTING
    • G01SRADIO DIRECTION-FINDING; RADIO NAVIGATION; DETERMINING DISTANCE OR VELOCITY BY USE OF RADIO WAVES; LOCATING OR PRESENCE-DETECTING BY USE OF THE REFLECTION OR RERADIATION OF RADIO WAVES; ANALOGOUS ARRANGEMENTS USING OTHER WAVES
    • G01S3/00Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received
    • G01S3/80Direction-finders for determining the direction from which infrasonic, sonic, ultrasonic, or electromagnetic waves, or particle emission, not having a directional significance, are being received using ultrasonic, sonic or infrasonic waves
    • G01S3/802Systems for determining direction or deviation from predetermined direction
    • G01S3/808Systems for determining direction or deviation from predetermined direction using transducers spaced apart and measuring phase or time difference between signals therefrom, i.e. path-difference systems
    • G01S3/8083Systems for determining direction or deviation from predetermined direction using transducers spaced apart and measuring phase or time difference between signals therefrom, i.e. path-difference systems determining direction of source
    • GPHYSICS
    • G10MUSICAL INSTRUMENTS; ACOUSTICS
    • G10LSPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272Voice signal separating
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R3/00Circuits for transducers, loudspeakers or microphones
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00Signal processing covered by H04R, not provided for in its groups
    • H04R2430/01Aspects of volume control, not necessarily automatic, in sound systems
    • HELECTRICITY
    • H04ELECTRIC COMMUNICATION TECHNIQUE
    • H04RLOUDSPEAKERS, MICROPHONES, GRAMOPHONE PICK-UPS OR LIKE ACOUSTIC ELECTROMECHANICAL TRANSDUCERS; DEAF-AID SETS; PUBLIC ADDRESS SYSTEMS
    • H04R2430/00Signal processing covered by H04R, not provided for in its groups
    • H04R2430/20Processing of the output signals of the acoustic transducers of an array for obtaining a desired directivity characteristic
    • H04R2430/25Array processing for suppression of unwanted side-lobes in directivity characteristics, e.g. a blocking matrix

Definitions

  • the present invention relates to an electronic device and a method for controlling the same, and more particularly, to an electronic device for displaying an image including an object and identifying an object corresponding to a sound source component, and a method for controlling the same.
  • the sound source tracking function has been in the spotlight.
  • the sound source tracking function is a technology that tracks the location of the sound source through analysis of the received signal.
  • the sound source tracking function is being applied to various fields such as voice recognition, and since it can improve convenience in such fields, its range of application is gradually expanding.
  • however, the conventional sound source tracking function is performed for a single sound source, or only for a single representative sound source among a plurality of sound sources; therefore, it can neither perform integrated tracking of a plurality of sound sources nor adapt the tracking to the movement of a sound source. This limitation makes the sound source tracking function inconvenient and lowers its usability.
  • An object of the present invention is to provide an electronic device capable of improving the efficiency and usability of the sound source tracking function by enabling integrated tracking of a plurality of sound sources and adaptive tracking corresponding to the movement of sound sources, and a control method thereof.
  • The above object can be achieved by an electronic device including: a display which displays an image including a plurality of objects based on a video signal included in content; and a processor which obtains a plurality of sound source components according to frequency characteristics from a plurality of audio signals included in the content acquired with a plurality of microphones, identifies, during reproduction of the content, the microphone that obtained the audio signal corresponding to a sound source component from among the plurality of microphones based on the correlation in frequency characteristics between the audio signal and the sound source component, and performs an operation related to the content with respect to the object corresponding to the position of the identified microphone from among the plurality of objects in the displayed image, based on the arrangement of the plurality of microphones.
  • the processor identifies, among the plurality of sound source components, any one sound source component having a large frequency component in the same frequency band as the frequency component of the audio signal as a sound source component corresponding to the audio signal.
  • the processor identifies a position in the image corresponding to the identified microphone, and identifies any one object close to the identified position among a plurality of objects displayed on the image as an object corresponding to the position of the microphone .
  • the processor displays an object corresponding to the position of the microphone to be distinguishable from other objects.
  • the processor displays the object corresponding to the sound source component to be distinguishable from other objects.
  • the processor displays a user interface indicating a mutual positional relationship between the identified microphone and an object corresponding to the position of the microphone.
  • the processor displays an object corresponding to a sound source component selected according to a user input among the plurality of sound source components to be distinguishable from other objects.
  • the processor adjusts the size (volume) of the sound source component corresponding to an object selected according to a user input from among the plurality of objects.
  • the processor updates a correspondence between the microphone and the sound source component based on the frame of the audio signal.
  • The above object can also be achieved by a control method of an electronic device, the method including: displaying an image including a plurality of objects based on a video signal included in the content; obtaining a plurality of sound source components according to frequency characteristics from a plurality of audio signals included in the content acquired with a plurality of microphones; identifying, during reproduction of the content, the microphone that obtained the audio signal corresponding to a sound source component from among the plurality of microphones based on the correlation in frequency characteristics between the audio signal and the sound source component; and performing an operation related to the content with respect to the object corresponding to the position of the identified microphone from among the plurality of objects in the displayed image, based on the arrangement of the plurality of microphones.
  • the method further includes identifying, from among the plurality of sound source components, the sound source component having a large frequency component in the same frequency band as the frequency component of the audio signal, as the sound source component corresponding to the audio signal.
  • the performing of the operation related to the content may include: identifying a location in the image corresponding to the identified microphone; and identifying any one object close to the identified position as an object corresponding to the position of the microphone from among the plurality of objects displayed on the image.
  • the performing of the operation related to the content further includes displaying the object corresponding to the position of the microphone so as to be distinguishable from other objects.
  • the performing of the operation related to the content further includes displaying the object corresponding to the sound source component so as to be distinguishable from other objects.
  • the performing of the content-related operation may further include displaying a user interface indicating a mutual positional relationship between the identified microphone and an object corresponding to the position of the microphone.
  • the displaying so as to be distinguishable from other objects further includes displaying an object corresponding to a sound source component selected according to a user input from among the plurality of sound source components so as to be distinguishable from other objects.
  • the performing of the operation related to the content further includes adjusting the size (volume) of the sound source component corresponding to the object selected according to a user input from among the plurality of objects.
  • the performing of the operation related to the content further includes updating the correspondence between the microphone and the sound source component based on the frame of the audio signal.
  • The above object of the present invention can also be achieved by a recording medium on which a computer-readable program is recorded, the recording medium storing a computer program including code for performing a control method of an electronic device, the method including: displaying an image including a plurality of objects based on a video signal included in the content; obtaining a plurality of sound source components according to frequency characteristics from a plurality of audio signals included in the content acquired with a plurality of microphones; identifying, during reproduction of the content, the microphone that obtained the audio signal corresponding to a sound source component from among the plurality of microphones based on the correlation in frequency characteristics between the audio signal and the sound source component; and performing an operation related to the content with respect to the object corresponding to the position of the identified microphone from among the plurality of objects in the displayed image, based on the arrangement of the plurality of microphones.
  • According to the present invention, there are provided an electronic device capable of improving the efficiency and usability of the sound source tracking function by enabling integrated tracking of a plurality of sound sources and adaptive tracking in response to the movement of sound sources, and a method for controlling the same.
  • FIG. 1 illustrates an electronic device according to an embodiment of the present invention.
  • FIG. 2 shows an example of the configuration of the electronic device of FIG. 1 .
  • FIG. 3 shows a specific example of the configuration of the electronic device of FIG. 1 .
  • FIG. 4 shows an example of a control method for the electronic device of FIG. 1 .
  • FIG. 5 shows a specific example of identifying the correspondence between the sound source component and the microphone based on the audio signal of the sound source component and the microphone in relation to operation S43 of FIG. 4 .
  • FIG. 6 shows a specific example of identifying a relationship between a sound source component and an object based on a position of a microphone corresponding to the sound source component in relation to operation S44 of FIG. 4 .
  • FIG. 7 illustrates a specific example of distinguishing and displaying an object corresponding to a sound source component as an example of an operation related to content with respect to an object corresponding to a position of a microphone in relation to operation S44 of FIG. 4 .
  • FIG. 8 is another example of an operation related to content with respect to an object corresponding to the position of the microphone, in relation to operation S44 of FIG. 4 , and shows a specific example indicating a mutual positional relationship between the microphone and the object.
  • FIG. 9 illustrates a specific example of performing an operation related to content in response to an event for a sound source component in relation to operation S44 of FIG. 4 .
  • FIG. 10 illustrates a specific example of performing an operation related to content in response to an event on an object in relation to operation S44 of FIG. 4 .
  • FIG. 11 shows a specific example of updating the correspondence between the sound source component and the microphone in relation to operation S43 of FIG. 4 .
  • FIG. 12 shows a specific example of updating an operation related to content according to the updated correspondence between the sound source component and the microphone of FIG. 11 .
  • FIG. 1 illustrates an electronic device according to an embodiment of the present invention.
  • the electronic device 10 is implemented not only as an image display device such as a TV, smart phone, tablet, portable media player, wearable device, video wall, or electronic picture frame, but also as various other types of devices, such as an image processing device without a display 13 (for example, a set-top box), a household appliance such as a refrigerator or washing machine, and an information processing device such as a computer body. In addition, the electronic device 10 may be implemented as an AI speaker, an AI robot, or the like equipped with an artificial intelligence (AI) function. The type of the electronic device 10 is not limited thereto. Hereinafter, for convenience of description, it is assumed that the electronic device 10 is implemented as a TV.
  • the electronic device 10 displays the image 5 based on the video signal.
  • the image 5 is output through the display 13 .
  • the video signal includes video signals of various contents.
  • the content includes, but is not limited to, various types of multimedia content such as news, drama, and movie.
  • the image 5 includes a plurality of objects 1 , 2 , 3 . Objects include, but are not limited to, people, animals, things, and the like.
  • the electronic device 10 outputs audio based on the audio signal. Audio is output through the speaker 15 .
  • the speaker 15 includes a built-in speaker provided in the electronic device 10 or an external speaker provided outside. However, for convenience of description, it is assumed that the speaker 15 is a built-in speaker.
  • the audio signal includes an audio signal of the content corresponding to the video signal of the content. Accordingly, the audio based on the audio signal is outputted corresponding to the image 5 based on the video signal.
  • the audio signal includes audio signals of various types of content.
  • the audio signal may include a plurality of sound source components.
  • the plurality of sound source components may correspond to the plurality of objects 1, 2, 3 in the image 5 .
  • the electronic device 10 extracts a plurality of sound source components from the audio signal, and identifies the relationship between the extracted sound source components and the plurality of objects 1 , 2 , and 3 .
  • for example, the first sound source component, the second sound source component, and the third sound source component extracted from the audio signal can be identified as corresponding to the first object 1, the second object 2, and the third object 3 in the image 5, respectively.
  • the electronic device 10 identifies the relationship between the sound source components and the plurality of microphones (60 in FIG. 6) that have received the audio signal. Referring to FIG. 6 for convenience of explanation, the electronic device 10 extracts the first to third sound source components from the audio signals obtained by the plurality of microphones 60, and can identify that the extracted first to third sound source components correspond to the first microphone (61 of FIG. 6) to the third microphone (63 of FIG. 6) of the plurality of microphones 60, respectively.
  • the correspondence between the sound source component and the microphone may be identified based on whether there is a correlation between the sound source component and the audio signal obtained with the microphone. For example, when the correlation between the first sound source component and the audio signal acquired through the first microphone 61 is high, the first sound source component may be identified as corresponding to the first microphone 61 . Relevance identification will be described in more detail with reference to FIG. 5 . In this way, the correspondence between the plurality of sound source components and the plurality of microphones can be identified.
  • the electronic device 10 identifies the relationship between the sound source component and the object by identifying the object corresponding to the position of the microphone. For convenience of explanation, assuming that the first sound source component corresponds to the first microphone 61 and that the first object 1 corresponds to the position of the first microphone 61, the first sound source component can be identified as corresponding to the first object 1. The process of identifying the relationship between the sound source component and the object by identifying the object corresponding to the position of the microphone will be described in more detail with reference to FIGS. 3 and 6.
  • the electronic device 10 performs a content-related operation with respect to an object identified as corresponding to the position of the microphone. For example, when the correspondence between the first sound source component and the first object 1 is identified, the electronic device 10 can display the user interface 4 indicating that correspondence.
  • the operation related to the content with respect to the object is not limited to the above, and may be implemented in various ways. This will be described in more detail with reference to FIGS. 7 and 8.
  • in this way, the electronic device 10 displays an image 5 including a plurality of objects 1, 2, and 3, identifies the object corresponding to each sound source component based on the audio signals obtained by the plurality of microphones 60, and can perform various content-related operations on the identified object.
  • since each sound source component can be identified based on the positions of the plurality of microphones 60 from which the audio signals are obtained, not only can integrated sound source tracking be performed for the sound source components, but the relationships between sound source components and objects can also be identified in an integrated manner. Therefore, compared with individual sound source tracking for a single sound source, the efficiency and usability of the sound source tracking function can be improved.
  • FIG. 2 shows an example of the configuration of the electronic device of FIG. 1 .
  • the configuration of the electronic device 10 will be described in detail with reference to FIG. 2 .
  • the electronic device 10 may be implemented as various types of devices, and thus the present embodiment does not limit the configuration of the electronic device 10 .
  • however, the electronic device 10 may not be implemented as a display device such as a TV.
  • the electronic device 10 may not include components for displaying an image, such as the display 13 .
  • for example, when the electronic device 10 is implemented as a set-top box, it outputs an image signal to an external TV through the interface unit 11.
  • the electronic device 10 includes an interface unit 11 .
  • the interface unit 11 transmits/receives data by connecting to an external device or the like. However, since the present invention is not limited thereto, the interface unit 11 connects to various devices connected through a network.
  • the interface unit 11 includes a wired interface unit.
  • the wired interface unit includes a connector or port to which an antenna capable of receiving a broadcast signal according to a broadcasting standard such as terrestrial/satellite broadcasting is connected, or a cable capable of receiving a broadcast signal according to the cable broadcasting standard is connected.
  • the electronic device 10 may have a built-in antenna capable of receiving a broadcast signal.
  • the wired interface unit includes a connector or port according to video and/or audio transmission standards, such as an HDMI port, DisplayPort, DVI port, Thunderbolt, composite video, component video, Super Video, or SCART.
  • the wired interface unit includes a connector or port according to a universal data transmission standard such as a USB port.
  • the wired interface unit includes a connector or a port to which an optical cable can be connected according to an optical transmission standard.
  • the wired interface unit includes an internal audio receiver.
  • the wired interface unit is connected to an external audio device having an audio receiver and includes a connector or a port capable of receiving or inputting an audio signal from the audio device.
  • the wired interface unit is connected to an audio device such as a headset, earphone, or external speaker, and includes a connector or port capable of transmitting or outputting an audio signal to the audio device.
  • the wired interface unit includes a connector or port according to a network transmission standard such as Ethernet.
  • the wired interface unit is implemented as a LAN card connected to a router or a gateway by wire.
  • the wired interface unit is connected to an external device such as a set-top box or optical media player, or to an external display device, speaker, server, etc., in a 1:1 or 1:N (N is a natural number) manner through the connector or port, so as to receive a video/audio signal from the corresponding external device or transmit a video/audio signal to the corresponding external device.
  • the wired interface unit may include a connector or a port for separately transmitting video/audio signals.
  • the wired interface unit may be embedded in the electronic device 10 , or implemented in the form of a dongle or a module to be detachably attached to the connector of the electronic device 10 .
  • the interface unit 11 includes a wireless interface unit.
  • the wireless interface unit is implemented in various ways corresponding to the implementation form of the electronic device 10 .
  • the wireless interface unit uses wireless communication such as RF (Radio Frequency), Zigbee, Bluetooth, Wi-Fi, UWB (Ultra-Wideband) and NFC (Near Field Communication) as a communication method.
  • the wireless interface unit is implemented as a wireless communication module for performing wireless communication with the AP according to the Wi-Fi method or a wireless communication module for performing one-to-one direct wireless communication such as Bluetooth.
  • the wireless interface unit transmits and receives data packets by wirelessly communicating with an external device on the network.
  • the wireless interface unit includes an IR transmitter and/or an IR receiver capable of transmitting and/or receiving an IR (Infrared) signal according to an infrared communication standard.
  • the wireless interface unit receives or inputs a remote controller signal from the remote controller or other external device through the IR transmitter and/or the IR receiver, or transmits or outputs a remote controller signal to the remote controller or other external device.
  • the electronic device 10 transmits/receives a remote controller signal to and from a remote controller or other external device through a wireless interface unit of another method such as Wi-Fi or Bluetooth.
  • the remote controller includes a smart phone and the like, and a remote controller application is installed on the smart phone or the like.
  • a smartphone or the like performs a function of a remote controller, for example, a function of controlling the electronic device 10 through an installed application.
  • These remote controller applications are installed in various external devices such as AI speakers and AI robots.
  • when the video/audio signal received through the interface unit 11 is a broadcast signal, the electronic device 10 further includes a tuner for tuning the received broadcast signal for each channel.
  • the electronic device 10 includes a communication unit 12 .
  • the communication unit 12 is connected to an external device and the like to transmit video/audio signals.
  • the communication unit 12 includes at least one of a wired interface unit and a wireless interface according to a design method, and performs at least one function of the wired interface unit and the wireless interface.
  • the electronic device 10 includes a display 13 .
  • the display 13 includes a display panel capable of displaying an image on the screen.
  • the display panel may have various screen sizes. For example, it may be provided in various forms having different screen sizes, such as a vertical type and a horizontal type.
  • the display panel is provided with a light-receiving structure such as a liquid crystal type or a self-luminous structure such as an OLED type.
  • the display 13 may further include additional components according to the structure of the display panel.
  • when the display panel is a liquid crystal type, the display 13 includes a liquid crystal display panel, a backlight unit for supplying light, and a panel driving substrate for driving the liquid crystal of the display panel.
  • the display 13 is omitted when the electronic device 10 is implemented as a set-top box or the like.
  • the electronic device 10 includes a user input unit 14 .
  • the user input unit 14 includes various types of input interface circuits that a user can operate in order to provide a user input.
  • the user input unit 14 may be configured in various forms depending on the type of the electronic device 10; examples include a mechanical or electronic button unit of the electronic device 10, a touch pad, and a touch screen installed on the display 13.
  • the electronic device 10 includes a speaker 15 .
  • the speaker 15 may be implemented as a speaker that outputs audio based on an audio signal.
  • the speaker includes an internal speaker or an external speaker provided in an external device. When audio is output through the external speaker, the audio signal may be transmitted to the external device through the interface unit 11 .
  • although the communication unit 12, the display 13, the user input unit 14, the speaker 15, and the like have been described as components separate from the interface unit 11, they may be configured to be included in the interface unit 11 depending on the design method.
  • the electronic device 10 includes a storage unit 16 .
  • the storage unit 16 stores digitized data.
  • the storage unit 16 includes non-volatile storage capable of preserving data regardless of whether power is provided.
  • the storage includes a flash memory, a hard-disc drive (HDD), a solid-state drive (SSD), a read only memory (ROM), and the like.
  • the storage unit 16 also includes volatile memory into which data to be processed by the processor 6 is loaded and which cannot retain data when power is not provided.
  • the memory includes a buffer, a random access memory, and the like.
  • the electronic device 10 includes a processor 6 .
  • the processor 6 includes one or more hardware processors implemented with a CPU, a chipset, a buffer, a circuit, etc. mounted on a printed circuit board, and may be implemented as a SOC (System on Chip) depending on a design method.
  • the processor 6 includes modules corresponding to various processes such as a demultiplexer, a decoder, a scaler, an audio digital signal processor (DSP), and an amplifier.
  • some or all of these modules are implemented as SOC.
  • a module related to image processing such as a demultiplexer, a decoder, and a scaler may be implemented as an image processing SOC
  • an audio DSP may be implemented as a chipset separate from the SOC.
  • the configuration of the electronic device 10 is not limited to that shown in FIG. 2; depending on the design method, some of the above-described components may be excluded, or components other than those described above may be included.
  • the electronic device 10 may include a camera.
  • the camera photographs the front of the electronic device 10 . Presence, movement, etc. of the user may be identified in the image captured by the camera.
  • the camera is implemented as a CMOS (Complementary Metal Oxide Semiconductor) or CCD (Charge Coupled Device) type camera.
  • the camera is not limited to the internal camera, and includes a camera provided in an external device. In this case, an image signal obtained by the camera from an external device may be received or input through the interface unit 11 .
  • the electronic device 10 may include an audio receiver.
  • the audio receiver may be provided in the main body of the electronic device 10 , but is not limited thereto, and thus may be provided outside.
  • a voice recognition function may be performed with respect to a voice command received through the audio receiver.
  • the voice recognition function includes performing voice recognition processing on a voice command to obtain a recognition result, and an operation corresponding to the obtained recognition result.
  • Speech recognition processing includes a speech-to-text (STT) processing process of converting a voice command into text data, and a command identification and execution process of identifying the command indicated by the text data and performing the operation indicated by the identified command.
  • All of the voice recognition processing may be executed in the electronic device 10, but in consideration of the system load and required storage capacity, at least a part of the process may be carried out by at least one server communicatively connected to the electronic device 10 through a network.
  • for example, the at least one server may perform the STT processing process, while the electronic device 10 performs the command identification and execution process.
  • the at least one server may perform both the STT processing process and the command identification and execution process, and the electronic device 10 may only receive the result from the at least one server.
  • the reception of the voice command may be performed by the audio receiver, or the voice command may be received through a remote controller separated from the main body.
  • the remote controller includes a smartphone, as described above.
  • when a remote controller is used, a voice signal corresponding to a voice command is received from the remote controller, and voice recognition processing is performed on the received voice signal.
  • the processor 6 of the electronic device 10 builds an AI system by applying an AI technology using a rule-based or AI algorithm to at least some of data analysis, processing, and result information generation for the above-described operations.
  • the AI system is a computer system that implements human-level intelligence, in which the machine learns and judges on its own and whose recognition rate improves the more it is used.
  • AI technology is composed of elemental technologies that simulate functions such as cognition and judgment of the human brain using at least one of machine learning, a neural network, or a deep learning algorithm.
  • the element technologies may include at least one of: linguistic understanding technology that recognizes human language/text; visual understanding technology that recognizes objects as human vision does; reasoning/prediction technology that judges information to logically infer and predict; and knowledge expression technology that processes human experience information into knowledge data.
  • Linguistic understanding is a technology for recognizing and applying/processing human language/text, and includes natural language processing, machine translation, dialogue system, question and answer, and speech recognition/synthesis.
  • Visual understanding is a technology for recognizing and processing objects like human vision, and includes object recognition, object tracking, image search, human recognition, scene understanding, spatial understanding, image improvement, and the like.
  • Inferential prediction is a technology for logically reasoning and predicting by judging information, and includes knowledge/probability-based reasoning, optimization prediction, preference-based planning, and recommendation.
  • Knowledge expression is a technology that automatically processes human experience information into knowledge data, and includes knowledge construction (data generation/classification) and knowledge management (data utilization).
  • the processor 6 performs the functions of the learning unit and the recognition unit together.
  • the learning unit performs a function of generating a learned neural network.
  • the recognition unit performs a function of recognizing, inferring, predicting, estimating, and judging data using the learned neural network.
  • the learning unit creates or updates the neural network.
  • the learning unit acquires learning data to generate a neural network.
  • the learning unit acquires the learning data from the storage unit 16 or from the outside.
  • the learning data may be data used for learning of the neural network, and the neural network may be trained by using the data obtained by performing the above-described operation as learning data.
  • the learning unit performs preprocessing on the acquired training data before training the neural network with it, or selects the data to be used for learning from among a plurality of training data. For example, the learning unit processes the training data into a preset format, filters it, or adds/removes noise to form data suitable for learning. The learning unit then uses the preprocessed training data to generate a neural network set to perform the above-described operation.
  • the learned neural network is composed of a plurality of neural networks or layers. The nodes of the plurality of neural networks have weights, and the plurality of neural networks are connected to each other so that an output value of one neural network is used as an input value of another neural network.
  • Examples of the neural network include models such as Convolutional Neural Network (CNN), Deep Neural Network (DNN), Recurrent Neural Network (RNN), Restricted Boltzmann Machine (RBM), Deep Belief Network (DBN), Bidirectional Recurrent Deep Neural Network (BRDNN), and Deep Q-Networks.
  • the recognition unit acquires target data to perform the above-described operation.
  • the target data is obtained from the storage unit 16 or is obtained from the outside.
  • the target data may be data to be recognized by the neural network.
  • the recognition unit performs preprocessing on the acquired target data before applying it to the learned neural network, or selects the data to be used for recognition from among a plurality of target data. For example, the recognition unit processes the target data into a preset format, filters it, or adds/removes noise to form data suitable for recognition.
  • the recognition unit obtains an output value output from the neural network by applying the preprocessed target data to the neural network.
  • the recognition unit obtains a probability value or a reliability value together with the output value.
  • FIG. 3 shows a specific example of the configuration of the electronic device of FIG. 1 .
  • each component described below may correspond to a role-based division of the processing that the processor 6 performs by executing a program stored in the storage unit 16.
  • the electronic device 10 includes a frequency analyzer 31 .
  • the frequency analyzer 31 performs frequency analysis on the audio signal of the content.
  • the audio signal is an audio signal corresponding to the video signal of the content, and may be obtained through a plurality of microphones 60 including the first microphone 61 to the third microphone 63 .
  • Frequency analysis includes analysis of frequency characteristics of an audio signal.
  • the frequency characteristic includes a pattern, a waveform, a period, an intensity, and the like for a frequency component of an audio signal.
  • the frequency analyzer 31 provides the frequency analysis result for the audio signal to the sound source component analyzer 32.
  • the frequency analyzer 31 performs frequency analysis on the audio signals obtained for each of the plurality of microphones 60 .
  • the frequency analysis unit 31 performs frequency analysis on the audio signal received through the first microphone 61 , the audio signal received through the second microphone 62 , and the audio signal received through the third microphone 63 .
  • the frequency analysis includes analysis of patterns, waveforms, periods, strengths, etc. of the frequency components of the audio signals acquired for each of the plurality of microphones 60 .
  • the frequency analysis result may be one in which the gain, intensity, etc. of the audio signal for each of the plurality of microphones 60 are appropriately adjusted, and may be sampled if necessary.
  • the frequency analysis results for the audio signals for each of the plurality of microphones 60 are provided to the correlation analysis unit 33 .
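  • As an illustrative sketch only (not part of the patent text), the per-microphone frequency analysis described above could be implemented with a short-time Fourier transform; the function name, sampling rate, and window length below are assumptions:

    import numpy as np
    from scipy.signal import stft

    def analyze_frequency(mic_signals, fs=16000, nperseg=1024):
        """Per-microphone frequency analysis: returns the complex STFT
        spectrum of each microphone signal, from which patterns, waveforms,
        periods, and intensities per frequency band can be read."""
        spectra = []
        for x in mic_signals:              # mic_signals: equal-length 1-D arrays
            _, _, Zxx = stft(x, fs=fs, nperseg=nperseg)
            spectra.append(Zxx)
        return np.stack(spectra)           # shape: (num_mics, bins, frames)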
  • the electronic device 10 includes a sound source component analyzer 32 .
  • the sound source component analyzer 32 receives a frequency analysis result for the audio signal from the frequency analyzer 31 .
  • the sound source component analyzer 32 separates a plurality of sound source components from the audio signal based on the frequency analysis result of the audio signal.
  • the sound source component analyzer 32 may separate the plurality of sound source components according to whether or not they correspond to a specific frequency component, for example, in consideration of the fact that frequency characteristics may be different for each of the plurality of sound source components.
  • Blind Source Separation (BSS) algorithms such as Independent Component Analysis (ICA) and Geometric Source Separation (GSS) may be used to separate the sound source components, but the present invention is not limited thereto; a sketch of one such approach follows.
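  • A minimal sketch of such separation, assuming FastICA from scikit-learn as the BSS algorithm (the patent does not prescribe a specific implementation):

    import numpy as np
    from sklearn.decomposition import FastICA

    def separate_sources(mic_signals, n_sources=3):
        """Blind source separation of the mixed microphone signals.
        mic_signals: array of shape (num_mics, num_samples)."""
        X = np.asarray(mic_signals).T               # (samples, mics)
        ica = FastICA(n_components=n_sources, random_state=0)
        S = ica.fit_transform(X)                    # (samples, sources)
        return S.T                                  # one row per sound source component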
  • the sound source component analyzer 32 provides information on a plurality of separated sound source components to the association analyzer 33 .
  • the electronic device 10 includes a correlation analysis unit 33 .
  • the correlation analysis unit 33 receives the frequency analysis results for the audio signals of each of the plurality of microphones 60 from the frequency analysis unit 31, and receives the information on the plurality of sound source components from the sound source component analysis unit 32.
  • the correlation analyzer 33 analyzes the correlation between the sound source component and the audio signal of the microphone by using the frequency analysis result of the audio signal for each of the plurality of microphones 60 and information on the plurality of sound source components.
  • this is based on the principle that, when the first sound source component is the most dominant in, or has the major influence on, the audio signal of the first microphone 61, the relevance and similarity in frequency characteristics between the two will be the highest.
  • Such correlation analysis may be performed based on Equation [1] below, which, from the surrounding description, takes the standard cross-correlation form:

    R(t, n) = Σ_f X_t(f) · S_n(f)*   … [1]

  • Equation [1] gives the correlation coefficient (R) between the audio signal X_t(f) of the (t)-th microphone and the (n)-th sound source component S_n(f), where (*) denotes the complex conjugate when the audio signal is a complex number.
  • the calculation of the correlation coefficient R may be performed for each frequency band.
  • the correlation analysis unit 33 calculates the correlation coefficient (R) by using Equation [1], and identifies the sound source component having the highest correlation coefficient (R) with respect to the audio signal of each microphone.
  • the sound source component having the highest correlation coefficient R with respect to the audio signal of the first microphone 61 is the first sound source component.
  • the correlation coefficient (R) between the audio signal of the first microphone 61 and the first sound source component being the highest means that the first sound source component is the most dominant sound source component in the audio signal of the first microphone 61, and therefore that the correlation between the first sound source component and the audio signal of the first microphone 61 is the highest.
  • the correlation analyzer 33 can thus identify the relationship between the first sound source component and the audio signal of the first microphone 61 based on the correlation coefficient (R) between them calculated through Equation [1].
  • in calculating the correlation coefficient (R), the association analysis unit 33 may normalize Equation [1] using Equation [2] below, reconstructed here as the standard normalization term:

    E(t, n) = sqrt( Σ_f |X_t(f)|² · Σ_f |S_n(f)|² )   … [2]

  • dividing Equation [1] by Equation [2] yields the normalized correlation coefficient N.
  • here, a specific sound source component may be one whose frequency magnitude, intensity, etc. are greater than those of the other sound source components.
  • the association analysis unit 33 identifies the relationship between a sound source component and the audio signal of a microphone through the association analysis, and identifies the relationship between the sound source component and the microphone accordingly. For example, the association analyzer 33 may identify the relationship between the first sound source component and the first microphone 61 from the relationship between the first sound source component and the audio signal of the first microphone 61; a sketch follows.
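  • A hedged sketch of the association analysis of Equations [1] and [2] as reconstructed above; the array shapes are illustrative assumptions:

    import numpy as np

    def correlation_matrix(mic_spectra, src_spectra):
        """Normalized correlation between each microphone spectrum X_t(f)
        and each sound source component S_n(f), summed over frequency bins
        (Equation [1] divided by Equation [2])."""
        X = np.asarray(mic_spectra)                   # (num_mics, bins)
        S = np.asarray(src_spectra)                   # (num_srcs, bins)
        R = X @ np.conj(S).T                          # Eq. [1]
        E = np.sqrt(np.outer(np.sum(np.abs(X)**2, axis=1),
                             np.sum(np.abs(S)**2, axis=1)))   # Eq. [2]
        return np.abs(R) / E                          # normalized coefficient N

    # The sound source component paired with each microphone is the one
    # with the highest normalized coefficient:
    # src_for_mic = correlation_matrix(X, S).argmax(axis=1)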
  • the correlation analysis unit 33 provides information about the relationship between the sound source component and the microphone to the location tracking unit 34 .
  • the electronic device 10 includes a location tracking unit 34 .
  • the location tracking unit 34 receives information about the relationship between the sound source component and the microphone from the correlation analysis unit 33 .
  • the position tracking unit 34 identifies an object corresponding to the sound source component by using information on the relationship between the sound source component and the microphone and information on the position of the microphone.
  • the information about the position of the microphone may be received together with or separately from the audio signal.
  • an image 5 obtained by photographing a plurality of real objects may be displayed on the display 13 .
  • a plurality of objects 1 , 2 , and 3 corresponding to the plurality of real objects 71 , 72 , 73 are displayed.
  • Audio of the first to third sound source components uttered by the plurality of real objects 71 , 72 , and 73 are received as audio signals through the plurality of microphones 60 .
  • the location tracking unit 34 identifies the location of the first microphone 61 among the plurality of microphones 60 based on the arrangement environment information of the plurality of microphones 60 .
  • the arrangement environment information of the plurality of microphones 60 may be included in the information regarding the positions of the microphones described above.
  • based on the arrangement environment information of the plurality of microphones 60, the position tracking unit 34 can identify that the plurality of microphones 60 are arranged in a circle at equal angular intervals, and that the first microphone 61 is located on the leftmost side among the plurality of microphones 60.
  • the position tracking unit 34 identifies the positions of the plurality of real objects 71, 72, and 73 that uttered the first to third sound source components, and can identify the first real object 71 as the one closest to the position of the first microphone 61.
  • Location identification includes identification with respect to at least one of a distance or a direction.
  • the location tracking unit 34 uses the following Equation [3] to identify the actual position of the first object 71; since the text identifies it as the GCC-PHAT value, it is reconstructed here in that standard form:

    R_PHAT(t) = Σ_f [ X1(f) · X2(f)* / |X1(f) · X2(f)*| ] · e^(j2πft)   … [3]

  • in Equation [3], X1(f) is the frequency-domain signal of the audio signal acquired with the first microphone 61, and X2(f) is the frequency-domain signal of the audio signal acquired with the second microphone 62. Within a given frequency band, the value of t that maximizes Equation [3] is the time difference between the audio signals arriving at the first microphone 61 and the second microphone 62.
  • Equation [3] corresponds to the GCC-PHAT value of the generalized cross-correlation (GCC) function that is conventionally used for tracking the location of the actual first object 71. Accordingly, the value of t at which Equation [3] becomes maximal is calculated while varying the value of t for each predetermined frequency band. The value of t may differ between frequency bands, owing to external factors such as noise and other measurement errors in each band. Accordingly, the value of t at which the GCC-PHAT value becomes the maximum is found while observing how the GCC-PHAT value changes as t changes in each frequency band.
  • the actual position of the first object 71 may be identified based on the time difference given by the calculated value of t. It is assumed that the audio signal of the real first object 71 is received first by the first microphone 61, which is closer to the real first object 71, and is received by the second microphone 62 later by the time t. The actual position of the first object 71 may then be found by calculating the angle (θ) between the first and second microphones 61 and 62 and the actual first object 71. A sketch of the delay estimation follows.
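  • A sketch of the GCC-PHAT delay estimation of Equation [3] as it is conventionally implemented (the broadband form; the per-band search described above is omitted for brevity):

    import numpy as np

    def gcc_phat_delay(x1, x2, fs=16000):
        """Estimate the time difference of arrival t between two microphone
        signals using phase-transform-weighted cross-correlation."""
        n = len(x1) + len(x2)
        X1, X2 = np.fft.rfft(x1, n=n), np.fft.rfft(x2, n=n)
        cross = X1 * np.conj(X2)
        cross /= np.abs(cross) + 1e-12          # PHAT weighting
        cc = np.fft.irfft(cross, n=n)
        max_shift = n // 2
        cc = np.concatenate((cc[-max_shift:], cc[:max_shift + 1]))
        shift = np.argmax(np.abs(cc)) - max_shift
        return shift / fs                        # delay t in seconds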
  • the difference (ΔS) between the distance from the real first object 71 to the first microphone 61 and the distance from the real first object 71 to the second microphone 62 can be expressed by the following Equation [4], reconstructed from the stated definitions:

    ΔS = v · t   … [4]

  • where (v) is the propagation speed of the audio signal and (d), used in Equation [5] below, is the spacing between the first microphone 61 and the second microphone 62.
  • as for the arrangement of the plurality of microphones 60, it is assumed here that the first microphone 61 and the second microphone 62 are arranged on a circle at equal angular intervals, but the present invention is not limited thereto and also includes arrangements in a straight line. Accordingly, the angle (θ) between the first and second microphones 61 and 62 and the actual first object 71 may be calculated through the following Equation [5], reconstructed in the standard far-field form:

    θ = arcsin( ΔS / d )   … [5]
  • the actual position of the first object 71 may be estimated based on the angle ⁇ calculated by Equation [5]. In the same way, the positions of the actual second object 72 and the actual third object 73 may be estimated.
  • the position tracking unit 34 may identify the positions of the plurality of real objects 71 , 72 , 73 by utilizing equations [3] to [5]. The above position tracking principle is applicable in 3D space. If the number of microphones is increased, the position of each object in 3D space can be estimated.
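  • Combining Equations [4] and [5] as reconstructed above, a small helper; the speed of sound and microphone spacing are illustrative values:

    import numpy as np

    SPEED_OF_SOUND = 343.0   # (v): propagation speed of the audio signal, m/s

    def arrival_angle(delay_t, mic_spacing_d):
        """Path difference dS = v * t (Eq. [4]), then the far-field
        arrival angle theta = arcsin(dS / d) (Eq. [5])."""
        delta_s = SPEED_OF_SOUND * delay_t
        ratio = np.clip(delta_s / mic_spacing_d, -1.0, 1.0)
        return np.degrees(np.arcsin(ratio))

    # e.g. theta = arrival_angle(gcc_phat_delay(x1, x2), mic_spacing_d=0.1)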
  • the location tracking unit 34 may identify the first object 1 corresponding to the first real object 71 from among the plurality of objects 1, 2, 3 in the image 5 . To this end, the position tracking unit 34 identifies the positions of the virtual microphones 50 corresponding to the plurality of microphones 60 in the image 5 . Information about the location of the virtual microphone 50 may be included in the information about the location of the microphone. Depending on the design method, the location of the virtual microphone 50 may be set to the lower center of the screen on which the image 5 is displayed, but is not limited thereto.
  • the location tracking unit 34 may identify the first object 1 closest to the position of the first virtual microphone 51, which corresponds to the first microphone 61 among the virtual microphones 50, as corresponding to the first real object 71.
  • the location tracking unit 34 may identify the first object 1 corresponding to the first real object 71 as the first object 1 corresponding to the first sound source component.
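  • A sketch of this nearest-object matching, with hypothetical pixel coordinates for the virtual microphones 50 and the displayed objects:

    import numpy as np

    def match_objects_to_mics(virtual_mic_xy, object_xy):
        """For each virtual microphone position in the image, pick the
        index of the closest displayed object."""
        mics = np.asarray(virtual_mic_xy)      # (num_mics, 2)
        objs = np.asarray(object_xy)           # (num_objects, 2)
        d = np.linalg.norm(mics[:, None, :] - objs[None, :, :], axis=-1)
        return d.argmin(axis=1)                # object index per microphone

    # e.g. with microphones mapped to the lower center of the screen:
    # obj_for_mic = match_objects_to_mics([[200, 1050], [960, 1050]],
    #                                     [[250, 400], [900, 380]])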
  • the relationship between a sound source component and an object may be updated. For example, when the microphone whose audio signal is most highly correlated with the first sound source component changes as the real first object 71 moves, the correspondence between the first sound source component and the first object 1 can be updated. More specifically, the association analysis unit 33 performs the association analysis for each frame of the audio signal, and through that analysis can update the relationship of the first sound source component from the first microphone 61 to the third microphone 63. The correlation analyzer 33 may provide the updated information on the relationship between the first sound source component and the third microphone 63 to the location tracking unit 34.
  • whereas the position tracking unit 34 previously identified the actual position of the first object 71 corresponding to the first sound source component based on the position of the first microphone 61, it may now identify the updated position of the actual first object 71 based on the position of the third microphone 63.
  • the location tracking unit 34 utilizes the following Equation [6] to identify the updated location of the first real object 71. Using Equation [6], the location tracking unit 34 may identify the updated location of the real object for the new frame from the location of the real object estimated for the previous frame of the audio signal, where (ε) in Equation [6] accounts for noise and other external factors.
  • the location tracking unit 34 may identify the location of the real object by using the sound source tracking model.
  • the sound source tracking model may be prepared for each sound source component, and may be prepared based on Equation [6]. For example, the location tracking unit 34 generates a first sound source tracking model corresponding to the first sound source component, and updates the actual first object 71 corresponding to the first sound source component based on the first sound source tracking model. location can be tracked.
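  • Since the exact form of Equation [6] is not reproduced here, the following per-source tracking model is only a guess at its spirit: a recursive blend of the previous-frame estimate with the new observation, with a small gate standing in for the noise term (ε); the smoothing factor and gate are invented for illustration:

    class SourceTracker:
        """Per-sound-source tracking model: blends the previous-frame
        angle estimate with the per-frame observation and ignores
        jitter smaller than a noise gate."""

        def __init__(self, alpha=0.8, noise_gate_deg=2.0):
            self.alpha = alpha                 # weight of the previous frame
            self.noise_gate = noise_gate_deg   # allowance for noise (epsilon)
            self.theta = None                  # current angle estimate

        def update(self, observed_theta):
            if self.theta is None:
                self.theta = observed_theta
            elif abs(observed_theta - self.theta) > self.noise_gate:
                self.theta = (self.alpha * self.theta
                              + (1.0 - self.alpha) * observed_theta)
            return self.theta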
  • based on the updated relationship between the first sound source component and the third microphone 63, the position tracking unit 34 identifies the updated position of the real first object 71, and can identify that the real first object 71, now closest to the third microphone 63, corresponds to the first sound source component.
  • the location tracking unit 34 can identify the first object 1 closest to the location of the third virtual microphone 53, which corresponds to the third microphone 63 among the virtual microphones 50, as corresponding to the first real object 71. In this way, the location tracking unit 34 can keep identifying the first object 1 as corresponding to the first sound source component, based on the relationship between the first sound source component and the first real object 71, even when the first real object 71 moves.
  • the electronic device 10 includes an operation performing unit 35 .
  • the operation performing unit 35 performs an operation related to the content with respect to the identified object. For example, when the audio of the first sound source component is output through the speaker 15, the operation performing unit 35 can display the user interface 4 indicating the correspondence between the first sound source component and the first object 1. However, since the present invention is not limited thereto, the operation performing unit 35 may perform various operations based on the relationship between the first sound source component and the first object 1.
  • in this way, the electronic device 10 displays an image 5 including a plurality of objects 1, 2, and 3, identifies the object corresponding to each sound source component based on the audio signals obtained by the plurality of microphones 60, and can perform various content-related operations on the identified object.
  • since the electronic device 10 can identify the object corresponding to each sound source component based on the positions of the plurality of microphones 60 from which the audio signals are obtained, it can not only identify the relationships between sound source components and objects in an integrated manner, but also adaptively identify those relationships according to the movement of the objects.
  • FIG. 4 shows an example of a control method for the electronic device of FIG. 1.
  • The processor 6 displays the image 5, which includes a plurality of objects, based on the video signal included in the content (S41).
  • The image 5 may include a plurality of objects 1, 2, and 3.
  • The processor 6 obtains a plurality of sound source components, according to their frequency characteristics, from the plurality of audio signals that are included in the content and were acquired through the plurality of microphones 60 (S42; a sketch of this step follows this list).
  • The processor 6 identifies, among the plurality of microphones 60, the microphone that acquired the audio signal corresponding to a sound source component, based on the sound source component and the microphone's audio signal during content reproduction (S43). For example, through correlation analysis, the processor 6 may identify the relationship between the first sound source component and the audio signal of the first microphone 61 as the pair having the highest correlation coefficient.
  • Based on the arrangement of the plurality of microphones 60, the processor 6 identifies, among the plurality of objects 1, 2, and 3 in the image 5, the object corresponding to the position of the microphone matched to the identified sound source component, and performs a content-related operation on that object (S44). For example, the processor 6 identifies the first object 1 corresponding to the position of the first microphone 61 in the image 5 based on the arrangement environment of the plurality of microphones 60.
  • Because the electronic device 10 can identify the object corresponding to each sound source component based on the positions of the plurality of microphones 60 that acquired the audio signals, it can not only identify the relationship between sound source components and objects in an integrated way, but also adaptively re-identify that relationship as the objects move.
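  • The patent does not specify how the components are separated beyond "according to frequency characteristics", so the following sketch of step S42 substitutes a simple FFT band-masking split; the band edges, the sample rate, and the function name are illustrative assumptions.

```python
import numpy as np

def extract_components(mixture, rate=8000,
                       bands=((0, 500), (500, 1500), (1500, 4000))):
    """Split a mixed signal into per-band 'sound source components'."""
    spectrum = np.fft.rfft(mixture)
    freqs = np.fft.rfftfreq(mixture.size, d=1 / rate)
    components = []
    for lo, hi in bands:
        # Keep only the bins inside this band, zero everything else.
        masked = np.where((freqs >= lo) & (freqs < hi), spectrum, 0.0)
        components.append(np.fft.irfft(masked, n=mixture.size))
    return components

# Example: a 300 Hz + 1 kHz mixture splits into a low and a mid component,
# while the high band stays (nearly) silent.
t = np.arange(8000) / 8000
mixture = np.sin(2 * np.pi * 300 * t) + np.sin(2 * np.pi * 1000 * t)
low, mid, high = extract_components(mixture)
print(np.abs(low).max(), np.abs(mid).max(), np.abs(high).max())
```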
  • FIG. 5 shows a specific example, related to operation S43 of FIG. 4, of identifying the correspondence between a sound source component and a microphone based on the sound source component and the microphone's audio signal.
  • The processor 6 extracts a plurality of sound source components from the audio signals received through the plurality of microphones 60 and analyzes the correlation between each sound source component and the audio signals of the plurality of microphones 60.
  • The frequency components of the first to third sound source components extracted from the audio signals are as shown in FIG. 5.
  • The processor 6 identifies, among the plurality of sound source components, the first sound source component, whose frequency components are similar to those of the audio signal acquired through the first microphone 61.
  • The processor 6 may identify the first sound source component corresponding to the audio signal of the first microphone 61 based not only on the similarity between frequency components but also on their magnitudes. For example, the processor 6 may identify the first sound source component as the one having large frequency components in the same frequency bands as the audio signal of the first microphone 61.
  • The processor 6 may thus identify the first sound source component as corresponding to the first microphone 61. In the same way, the processor 6 may identify that the second sound source component corresponds to the second microphone 62 and that the third sound source component corresponds to the third microphone 63.
  • Because the processor 6 can identify the correspondence between a sound source component and a microphone based on the correlation between the frequency characteristics of the sound source component and of the audio signal received through the microphone, the conditions are prepared for identifying the relationship between the sound source component and the object, which is described below with reference to FIG. 6.
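  • Below is a minimal, self-contained sketch of this correlation analysis. The three sound source components are synthesized sine tones and each microphone signal is one tone plus noise; the signals, frequencies, and the use of the plain correlation coefficient are all assumptions made for illustration.

```python
import numpy as np

# Three synthetic "sound source components" at distinct frequencies.
rate = 8000
t = np.arange(rate) / rate
components = [np.sin(2 * np.pi * f * t) for f in (220.0, 440.0, 880.0)]

# Each microphone predominantly picks up one source, plus noise.
rng = np.random.default_rng(0)
mic_signals = [c + 0.3 * rng.standard_normal(t.size) for c in components]

# Match each component to the microphone with the highest correlation
# coefficient, as the correlation analysis above describes.
for i, comp in enumerate(components, start=1):
    scores = [abs(np.corrcoef(comp, mic)[0, 1]) for mic in mic_signals]
    best = int(np.argmax(scores)) + 1
    print(f"sound source component {i} -> microphone {best}")
```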
  • FIG. 6 shows a specific example, related to operation S44 of FIG. 4, of identifying the relationship between a sound source component and an object based on the position of the microphone corresponding to that sound source component.
  • The processor 6 displays the image 5 including the plurality of objects 1, 2, and 3 on the display 13.
  • An image of an actual beach is displayed on the display 13, and there are a plurality of real objects 71, 72, and 73 on the actual beach: the real first object 71 is on the left side of the beach, the real second object 72 is at the upper center, and the real third object 73 is on the right side.
  • The plurality of microphones 60 includes a microphone array in which the first microphone 61, the second microphone 62, and the third microphone 63 are disposed at equal angles, and the array is located below the center of the actual beach.
  • The first microphone 61, the second microphone 62, and the third microphone 63 are arranged to be close to the real first object 71, the real second object 72, and the real third object 73, respectively.
  • The processor 6 may receive the arrangement environment information of the plurality of microphones 60 together with, or separately from, the audio signal.
  • The processor 6 identifies the relationship between a sound source component and a microphone through correlation analysis between the sound source component and the microphone's audio signal. For example, the processor 6 can identify that the first sound source component corresponds to the first microphone 61, the second sound source component corresponds to the second microphone 62, and the third sound source component corresponds to the third microphone 63.
  • The processor 6 then identifies the object in the image 5 corresponding to each sound source component based on the arrangement environment information of the plurality of microphones 60 and the component-to-microphone relationships identified through the correlation analysis. More specifically, the processor 6 identifies the position of the first microphone 61 among the plurality of microphones 60 based on the arrangement environment information. For example, the processor 6 may identify that the first microphone 61 is located at the leftmost position among the plurality of microphones 60.
  • The processor 6 identifies the positions of the plurality of real objects 71, 72, and 73 that uttered the first to third sound source components, and can identify the real first object 71 as the one located closest to the position of the first microphone 61.
  • The processor 6 may thus identify that the first sound source component is uttered by the real first object 71.
  • Next, the processor 6 identifies the first object 1 corresponding to the real first object 71 from among the plurality of objects 1, 2, and 3 in the image 5. To this end, the processor 6 identifies the positions of the virtual microphones 50 corresponding to the plurality of microphones 60 in the image 5. Depending on the design, the positions of the virtual microphones 50 may be set to the lower center of the screen on which the image 5 is displayed, so as to correspond to the positions of the plurality of microphones 60, but the positions are not limited thereto.
  • The processor 6 can identify that the first object 1, which is closest to the position of the first virtual microphone 51 corresponding to the first microphone 61 among the virtual microphones 50, corresponds to the real first object 71.
  • The processor 6 may then identify the first object 1 as corresponding to the first sound source component, based on the relationship between the first sound source component and the real first object 71.
  • Likewise, the processor 6 identifies that the third sound source component corresponds to the third microphone 63 and that the real third object 73 closest to the position of the third microphone 63 corresponds to the third object 3 in the image 5; as a result, it can identify that the third object 3 corresponds to the third sound source component.
  • Because the processor 6 can identify the object corresponding to each sound source component based on the positions of the plurality of microphones 60 that acquired the audio signals, it is possible to identify the relationship between sound source components and objects in an integrated way.
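  • The following sketch mirrors this mapping in normalized screen coordinates (x to the right, y downward): each virtual microphone is matched to the object nearest to it. The coordinates follow the beach layout described above but are otherwise invented, and the microphones are spread along the bottom edge for clarity.

```python
import numpy as np

# Assumed positions of the virtual microphones 51-53 along the bottom
# of the screen, and of the objects 1-3 in the image 5.
virtual_mics = {
    "virtual mic 51": (0.15, 0.95),
    "virtual mic 52": (0.50, 0.95),
    "virtual mic 53": (0.85, 0.95),
}
objects = {
    "object 1": (0.15, 0.50),  # left side
    "object 2": (0.50, 0.45),  # upper center
    "object 3": (0.85, 0.50),  # right side
}

def nearest_object(mic_pos):
    """Return the object whose position is closest to the microphone."""
    return min(objects, key=lambda name: np.hypot(
        objects[name][0] - mic_pos[0], objects[name][1] - mic_pos[1]))

# Each virtual microphone resolves to the object on its side of the
# screen, which is how a sound source component is tied to an object.
for mic, pos in virtual_mics.items():
    print(mic, "->", nearest_object(pos))
```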
  • FIG. 7 illustrates, in relation to operation S44 of FIG. 4, a specific example of displaying the object corresponding to a sound source component so that it is distinguished from other objects, as one example of a content-related operation performed on the object corresponding to the position of a microphone.
  • As described above, the first object 1 in the image 5 corresponds to the first sound source component.
  • The processor 6 may display the first object 1 so that it is distinguished from the other objects in the image 5, namely the second object 2 and the third object 3.
  • The second object 2 and the third object 3 do not correspond to the first sound source component, but may correspond to the second sound source component and the third sound source component, respectively.
  • The processor 6 may apply the effect 81 for emphasizing the first object 1 in order to distinguish the first object 1 from the other objects. Similarly, the processor 6 may display the second object 2 corresponding to the second sound source component, or the third object 3 corresponding to the third sound source component, so that it is distinguished from the other objects in the image 5.
  • The processor 6 may apply a different effect to each of the plurality of objects 1, 2, and 3, thereby enabling the plurality of objects 1, 2, and 3 to be distinguished from one another.
  • For example, the processor 6 may make the objects distinguishable from one another by varying the color, size, gradation, blurring, and so on applied to each of them.
  • The processor 6 may also display the first object 1 so that it is distinguished on the basis of the first microphone 61 corresponding to the first sound source component. For example, the processor 6 identifies the first object 1 as corresponding to the first microphone 61 among the plurality of objects 1, 2, and 3 in the image 5, and displays it so that it is distinguished from the second object 2 and the third object 3, which do not correspond to the first microphone 61.
  • In this way, the processor 6 may display the object corresponding to a sound source component so that it is distinguished from other objects, based on the relationship between the sound source component and the object. Accordingly, visual information can be provided about whether integrated and adaptive identification of the relationship between sound source components and objects is being performed.
  • FIG. 8 shows, in relation to operation S44 of FIG. 4, another example of a content-related operation on the object corresponding to the position of a microphone: a specific example of indicating the mutual positional relationship between the microphone and the object.
  • As described above, the processor 6 identifies the relationship between the plurality of microphones 60 and the plurality of real objects 71, 72, and 73 by using information about the positions of the plurality of microphones 60, and can likewise identify the relationship between the plurality of microphones 60 and the plurality of objects 1, 2, and 3 in the image 5.
  • In addition, the processor 6 may identify the mutual positional relationship between the plurality of microphones 60 and the plurality of real objects 71, 72, and 73. For example, the processor 6 may identify that the position of the real first microphone 61 is closest to the position of the real first object 71.
  • Based on the mutual positional relationship between the plurality of microphones 60 and the plurality of real objects 71, 72, and 73, the processor 6 can identify the mutual positional relationship between the plurality of microphones 60 and the plurality of objects 1, 2, and 3. For example, the processor 6 may identify that the position of the virtual first microphone 51 in the image 5, which corresponds to the real first microphone 61, is closest to the position of the first object 1 corresponding to the real first object 71.
  • The processor 6 may display the user interface 4 indicating the mutual positional relationship between the virtual first microphone 51 and the first object 1 in the image 5.
  • For example, the processor 6 may indicate, through an arrow or the like, that the position of the first object 1 is closest to the position of the virtual first microphone 51 in the image 5.
  • In this way, the processor 6 may display the user interface 4 indicating the mutual positional relationship between a microphone and an object as a content-related operation on the object. Accordingly, visual information can be provided about whether integrated and adaptive identification of the relationship between microphones and objects is being performed.
  • FIG. 9 illustrates, in relation to operation S44 of FIG. 4, a specific example of performing a content-related operation in response to an event for a sound source component.
  • The processor 6 identifies whether there is an event for a sound source component. For example, a user interface for selecting at least one of the plurality of sound source components may be displayed, and when a sound source component is selected according to a user input, the selection may be identified as an event for that sound source component.
  • However, the event for a sound source component is not limited to the selection of the sound source component according to a user input.
  • The processor 6 performs an operation based on the sound source component in response to the event for the sound source component. For example, audio may be output based on the sound source component. For convenience of explanation, assuming that the first sound source component is selected according to a user input, audio based on the first sound source component may be output through the speaker 15.
  • The operation based on the sound source component includes an operation on the object corresponding to the sound source component.
  • For example, the processor 6 applies the effect 81 of emphasizing the first object 1 so that the first object 1 corresponding to the first sound source component is distinguished from the other objects.
  • Alternatively, the user interface 4 indicating the mutual positional relationship between the first object 1 corresponding to the first sound source component and the first microphone 61 may be displayed.
  • In this way, the processor 6 may perform various operations on the object corresponding to a sound source component in response to an event for that sound source component. Accordingly, more diverse visual information can be provided about whether integrated and adaptive identification of the relationship between sound source components and objects is being performed. A minimal sketch of this event handling follows.
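  • In the sketch below, `play` and `highlight` are print stubs standing in for audio output through the speaker 15 and the emphasis effect 81; the component data and the dictionary-based lookup are likewise invented for illustration.

```python
import numpy as np

def play(audio):
    # Stand-in for routing the selected component to the speaker 15.
    print(f"playing {audio.size} samples")

def highlight(obj):
    # Stand-in for applying the emphasis effect 81 to the object.
    print(f"highlighting {obj}")

def on_source_selected(name, components, objects_by_component):
    """Handle an event for a sound source component, as in FIG. 9."""
    play(components[name])                 # output audio of the selection
    highlight(objects_by_component[name])  # and emphasize its object

# Hypothetical data: one silent component mapped to the first object.
components = {"source 1": np.zeros(48000)}
objects_by_component = {"source 1": "object 1"}
on_source_selected("source 1", components, objects_by_component)
```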
  • FIG. 10 illustrates, in relation to operation S44 of FIG. 4, a specific example of performing a content-related operation in response to an event for an object.
  • The processor 6 identifies whether there is an event for an object. For example, a user interface for selecting at least one of the plurality of objects 1, 2, and 3 in the image 5 may be displayed, and when an object is selected according to a user input, the selection can be identified as an event for that object.
  • However, the event for an object is not limited to the selection of the object according to a user input.
  • The processor 6 performs an operation on the sound source component according to the identified event. For example, audio based on the sound source component corresponding to the selected object is output. For convenience of explanation, assuming that the first object 1 is selected according to a user input, the first sound source component corresponding to the first object 1 is identified, and audio based on the first sound source component can be output through the speaker 15. Since the first object 1 is located on the left side of the screen, the audio output direction may be set to correspond to the left side of the screen.
  • In addition, the processor 6 may identify the first sound source component corresponding to the selected first object 1 and adjust the magnitude of the first sound source component. For example, the processor 6 may make the magnitude of the first sound source component greater than the magnitudes of the sound source components corresponding to the other objects (a sketch of this gain and panning behaviour follows at the end of this example).
  • The processor 6 may also display the effect 81 to distinguish the first object 1 from other objects, or may display the user interface 4 indicating the mutual positional relationship between the first object 1 and the first microphone 61.
  • In this way, the processor 6 may perform various operations in response to an event for an object. Accordingly, more diverse visual information can be provided about whether integrated and adaptive identification of the relationship between sound source components and objects is being performed.
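  • As referenced above, the sketch below boosts the selected component and pans the stereo output toward the object's horizontal position. The boost factor and the equal-power panning law are assumptions; the patent only states that the selected component is made larger and that the output direction matches the object's position on the screen.

```python
import numpy as np

def render_selected(selected, others, object_x, boost=2.0):
    """Mix components into a stereo buffer, emphasizing the selection.

    object_x is the object's horizontal position in [0, 1],
    where 0 is the left edge of the screen.
    """
    lg = np.cos(object_x * np.pi / 2)  # equal-power panning gains:
    rg = np.sin(object_x * np.pi / 2)  # object_x = 0 pans fully left
    bed = sum(others)                  # remaining components, centered
    left = boost * lg * selected + 0.707 * bed
    right = boost * rg * selected + 0.707 * bed
    return np.stack([left, right])     # stereo output for the speaker 15

# Hypothetical components; the first object 1 sits on the left (x = 0.15),
# so its boosted component is rendered mostly in the left channel.
t = np.arange(8000) / 8000
comp1 = np.sin(2 * np.pi * 220 * t)
comp2 = np.sin(2 * np.pi * 440 * t)
stereo = render_selected(comp1, [comp2], object_x=0.15)
print(stereo.shape)  # (2, 8000)
```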
  • FIG. 11 shows, in relation to operation S43 of FIG. 4, a specific example of updating the correspondence between a sound source component and a microphone.
  • As described above, the processor 6 identifies the correspondence between the sound source component and the microphone based on the audio signal.
  • Since the audio signal is composed of a plurality of frames, the processor 6 can identify the correspondence between the sound source component and the microphone for each frame.
  • Identifying the correspondence for each frame includes not only identifying it for every single frame, but also identifying it once per predetermined number of frames; that is, it includes identifying the correspondence for at least one frame, periodically or aperiodically.
  • For example, the processor 6 may identify the first sound source component as corresponding to the first microphone 61 based on a first frame among the plurality of frames of the audio signal. After a predetermined time has elapsed, the processor 6 may identify the first sound source component as corresponding to the second microphone 62 based on a second frame of the audio signal. In this case, as described above with reference to FIG. 3, the correspondence between the first microphone 61 and the first sound source component identified based on the first frame can be updated to the correspondence between the second microphone 62 and the first sound source component. Similarly, the first sound source component can be identified as corresponding to the third microphone 63 based on a third frame, in which case the correspondence is updated to the correspondence between the third microphone 63 and the first sound source component. A frame-wise sketch of this re-identification follows.
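  • The sketch below recomputes the component-to-microphone correspondence for each frame, so a source migrating across the microphones is re-matched frame by frame. The frame length, the synthetic gain envelope that moves the source, and the correlation measure are all assumptions.

```python
import numpy as np

def correspondence_per_frame(component, mic_signals, frame_len=1024):
    """Return, for each frame, the index of the best-matching microphone."""
    matches = []
    for k in range(len(component) // frame_len):
        sl = slice(k * frame_len, (k + 1) * frame_len)
        scores = [abs(np.corrcoef(component[sl], m[sl])[0, 1])
                  for m in mic_signals]
        matches.append(int(np.argmax(scores)))
    return matches

# Simulate a source that moves from microphone 1 to microphone 3:
# its gain at each microphone changes from frame to frame.
rng = np.random.default_rng(1)
t = np.arange(3 * 1024)
src = np.sin(2 * np.pi * 0.03 * t)
gains = np.repeat([[1.0, 0.2, 0.0],   # frame 0: loudest at mic 1
                   [0.2, 1.0, 0.2],   # frame 1: loudest at mic 2
                   [0.0, 0.2, 1.0]],  # frame 2: loudest at mic 3
                  1024, axis=0).T
mics = [g * src + 0.05 * rng.standard_normal(t.size) for g in gains]
print(correspondence_per_frame(src, mics))  # expected: [0, 1, 2]
```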
  • FIG. 12 shows a specific example of updating a content-related operation according to the updated correspondence between the sound source component and the microphone of FIG. 11.
  • As described above, the processor 6 may update the correspondence between the sound source component and the microphone for each frame of the audio signal.
  • The processor 6 identifies that the microphone corresponding to the first sound source component changes according to the movement of the first object 1. That is, it identifies that the first sound source component corresponds sequentially to the first microphone 61, the second microphone 62, and the third microphone 63, so that the correspondence between the first sound source component and the first microphone 61 is eventually updated to the correspondence between the first sound source component and the third microphone 63.
  • Accordingly, the processor 6 may update the operation based on the relationship between the first sound source component and the first object 1.
  • For example, as the first sound source component comes to correspond sequentially to the first microphone 61, the second microphone 62, and the third microphone 63, the processor 6 can update the user interface 4 sequentially so that it corresponds to the position of the first microphone 61, the position of the second microphone 62, and then the position of the third microphone 63.
  • In this way, the processor 6 may update the content-related operation on the object in response to the updated correspondence between the sound source component and the microphone. Accordingly, it is possible not only to adaptively identify the relationship between sound source components and objects, but also to provide the user with information about that adaptive identification.
  • Various embodiments disclosed in this document may be implemented as software including one or more instructions stored in a storage medium readable by a machine such as the electronic device 10.
  • The processor 6 of the electronic device 10 calls at least one of the one or more instructions stored in the storage medium and executes it. This enables a device such as the electronic device 10 to be operated to perform at least one function according to the at least one called instruction.
  • The one or more instructions may include code generated by a compiler or code executable by an interpreter.
  • The machine-readable storage medium may be provided in the form of a non-transitory storage medium.
  • The term 'non-transitory storage medium' only means that the storage medium is a tangible device and does not contain a signal (e.g., an electromagnetic wave); the term does not distinguish between a case in which data is stored semi-permanently in the storage medium and a case in which data is stored temporarily.
  • For example, the 'non-transitory storage medium' may include a buffer in which data is temporarily stored.
  • The computer program product includes instructions of software executed by a processor, as mentioned above.
  • Computer program products may be traded between sellers and buyers as commodities.
  • The computer program product may be distributed in the form of a machine-readable storage medium (e.g., a CD-ROM), or distributed online (e.g., by download or upload) through an application store (e.g., Play Store™) or directly between two user devices (e.g., smartphones).
  • In the case of online distribution, at least a portion of the computer program product (e.g., a downloadable app) may be temporarily stored in, or temporarily created in, a machine-readable storage medium such as the memory of a manufacturer's server, a server of an application store, or a relay server.

Abstract

The present disclosure relates to an electronic device which: displays an image including a plurality of objects based on a video signal included in content; obtains a plurality of sound source components, according to frequency characteristics, from a plurality of audio signals included in the content, the plurality of audio signals being obtained by means of a plurality of microphones; identifies, among the plurality of microphones, a microphone that obtains an audio signal corresponding to a sound source component, based on the sound source component and the audio signal of the microphone during playback of the content; and then performs, based on the arrangement of the plurality of microphones, a content-related operation on whichever of the plurality of objects in the image corresponds to the position of the microphone.
PCT/KR2021/012891 2020-11-13 2021-09-17 Electronic device and control method therefor WO2022102945A1 (fr)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
KR1020200151797A KR20220065370A (ko) 2020-11-13 2020-11-13 Electronic device and control method therefor
KR10-2020-0151797 2020-11-13

Publications (1)

Publication Number Publication Date
WO2022102945A1 true WO2022102945A1 (fr) 2022-05-19

Family

ID=81602475

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/KR2021/012891 WO2022102945A1 (fr) 2020-11-13 2021-09-17 Electronic device and control method therefor

Country Status (2)

Country Link
KR (1) KR20220065370A (fr)
WO (1) WO2022102945A1 (fr)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024063463A1 * 2022-09-23 2024-03-28 삼성전자주식회사 Electronic device for adjusting an audio signal associated with an object displayed via a display, and method therefor

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
KR20120018686A * 2010-08-23 2012-03-05 주식회사 팬택 Terminal providing various user interfaces using ambient sound information and control method thereof
KR20150068112A * 2013-12-11 2015-06-19 삼성전자주식회사 Method and electronic device for tracking audio
US20160004405A1 * 2014-07-03 2016-01-07 Qualcomm Incorporated Single-channel or multi-channel audio control interface
KR20160026605A * 2014-09-01 2016-03-09 삼성전자주식회사 Method and apparatus for playing audio file
KR101888391B1 * 2014-09-01 2018-08-14 삼성전자 주식회사 Method for managing voice signal and electronic device providing the same

Also Published As

Publication number Publication date
KR20220065370A (ko) 2022-05-20


Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application (ref document number: 21892115; country of ref document: EP; kind code of ref document: A1)
NENP Non-entry into the national phase (ref country code: DE)
122 EP: PCT application non-entry in European phase (ref document number: 21892115; country of ref document: EP; kind code of ref document: A1)