CN115767407A - Sound generating method and device for executing the same

Sound generating method and device for executing the same

Info

Publication number
CN115767407A
Authority
CN
China
Prior art keywords: sound, sounds, processor, soundtrack, actual
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202211375358.3A
Other languages
Chinese (zh)
Inventor
裴永植
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Whoborn Inc
Original Assignee
Whoborn Inc
Priority claimed from KR1020190140346A external-priority patent/KR102379734B1/en
Priority claimed from KR1020190140347A external-priority patent/KR102322120B1/en
Application filed by Whoborn Inc filed Critical Whoborn Inc
Priority claimed from CN201980073791.1A external-priority patent/CN113039815B/en
Priority claimed from PCT/KR2019/015145 external-priority patent/WO2020096406A1/en
Publication of CN115767407A publication Critical patent/CN115767407A/en
Pending legal-status Critical Current

Abstract

A sound generating method and an apparatus for performing the same are disclosed. The sound generating method of one embodiment includes: a step of obtaining an actual sound occurring in an actual space and a playback sound occurring in a virtual space; and a step of combining the actual sound and the playback sound to generate a combined sound occurring in a mixed reality in which the actual space and the virtual space are mixed.

Description

Sound generating method and device for executing the same
Technical Field
The following embodiments relate to a sound generating method and an apparatus for performing the same.
Background
Recently, it has become possible to provide users with various 3D audio-dedicated contents. For example, independently recorded 3D sound can be output to provide 3D audio-dedicated content to a user.
The 3D audio-dedicated content may be various contents generated, as 3D sound, using independently recorded 3D sound as described above. For example, directivity and automated computation have recently been applied to general 2D sound (or stereo) to generate diverse 3D audio-dedicated contents. The 3D audio-dedicated content may be sound content in which a 3D acoustic transform and output technology is applied to general 2D sound.
Compared with original 3D sound content, such 3D audio-dedicated content has the advantages of high immersion and reproducible realism.
Disclosure of Invention
Embodiments may provide a technique of combining an actual sound occurring in an actual space with a virtual sound occurring in a virtual space to generate a combined sound occurring in a mixed reality in which the actual space and the virtual space are mixed.
In addition, the embodiments may provide a technique of converting each of a plurality of 2D object sounds of a 2D soundtrack into a 3D object sound and generating a 3D soundtrack reflecting the 3D object sound.
The sound generation method of an embodiment may include: a step of obtaining an actual sound generated in an actual space and a playback sound generated in a virtual space; and a step of combining the actual sound and the playback sound to generate a combined sound generated in a mixed reality in which the actual space and the virtual space are mixed.
The step of generating may comprise: a step of selecting at least one actual object sound among a plurality of actual object sounds included in the actual sound; a step of selecting at least one virtual object sound among a plurality of virtual object sounds included in the playback sound; a step of combining the at least one real object sound and the at least one virtual object sound to generate the combined sound.
The plurality of real object sounds may be sounds generated from a plurality of real objects located in the real space.
The plurality of virtual object sounds may be sounds generated from a plurality of virtual objects located in the virtual space.
The step of selecting at least one actual object sound may include: a step of identifying the plurality of actual object sounds based on characteristics of the object sounds; a step of selectively extracting the at least one actual object sound among the plurality of actual object sounds based on an actual sound selection condition.
The step of identifying may include: a step of removing a noise sound from the actual sound based on a noise filtering technique; a step of identifying the plurality of actual object sounds from the noise-removed actual sounds based on at least one of a frequency and a volume of the object sounds.
The step of generating may comprise: adjusting a volume of the at least one actual object sound based on a position of an actual object corresponding to the at least one actual object sound; and a step of combining the at least one real object sound whose volume has been adjusted with the at least one virtual object sound to generate the combined sound.
The step of adjusting may include: a step of determining a position of the actual object in the actual space based on a sound obtaining time of the at least one actual object sound; and a step of adjusting a volume of the at least one actual object sound based on a separation distance between a position of the user and the position of the actual object.
The apparatus of one embodiment may include a memory storing instructions and a processor configured to execute the instructions, and the processor may obtain an actual sound occurring in an actual space and a playback sound occurring in a virtual space, and combine the actual sound and the playback sound to generate a combined sound occurring in a mixed reality in which the actual space and the virtual space are mixed.
The processor may select at least one real object sound among a plurality of real object sounds included in the real sound, select at least one virtual object sound among a plurality of virtual object sounds included in the playback sound, and combine the at least one real object sound with the at least one virtual object sound to generate the combined sound.
The plurality of real object sounds may be sounds generated from a plurality of real objects located in the real space.
The plurality of virtual object sounds may be sounds generated from a plurality of virtual objects located in the virtual space.
The processor may identify the plurality of actual object sounds based on characteristics of the object sounds, and selectively extract the at least one actual object sound among the plurality of actual object sounds based on an actual sound selection condition.
The processor may remove a noise sound from the actual sound based on a noise filtering technique, and identify the plurality of actual object sounds from the noise-removed actual sound based on at least one of a frequency and a volume of the object sounds.
The processor may adjust a volume of the at least one real object sound based on a position of a real object corresponding to the at least one real object sound, and generate the combined sound by combining the volume-adjusted at least one real object sound with the at least one virtual object sound.
The processor may determine a position of the at least one real object in the real space based on a sound obtaining time of the at least one real object sound, and adjust a volume of the at least one real object sound based on a spaced distance between the position of the user and the position of the real object.
A sound generation method of another embodiment includes: extracting a plurality of 2D object sounds included in the 2D soundtrack; a step of converting the plurality of 2D object sounds into a plurality of 3D object sounds by applying a plurality of binaural effects to the plurality of 2D object sounds, respectively; generating a 3D soundtrack based on the plurality of 3D object sounds.
The plurality of 2D object sounds may be sounds separated by one of a frequency and an object in the 2D soundtrack.
The step of extracting may include: a step of extracting the plurality of 2D object sounds by separating the 2D sound tracks by frequency band using an equalizer effect (equalizer effect).
The step of extracting may include: a step of separating the 2D soundtrack by object using sound detection (sound detection) to extract the plurality of 2D object sounds.
The step of transforming may include: a step of generating a first 3D object sound by applying a first binaural effect to a first 2D object sound among the plurality of 2D object sounds; and a step of generating a second 3D object sound by applying a second binaural effect to a second 2D object sound among the plurality of 2D object sounds.
The first binaural effect and the second binaural effect may be different from each other or the same binaural effect as each other.
The generating of the first 3D object sound may include: a step of determining a first 3D localization for the first 2D object sound; and a step of applying the first 3D localization and the first binaural effect to the first 2D object sound to generate the first 3D object sound.
The generating of the second 3D object sound may include: a step of determining a second 3D localization for the second 2D object sound, different from the first 3D localization; and a step of applying the second 3D localization and the second binaural effect to the second 2D object sound to generate the second 3D object sound.
The generating may include generating the 3D soundtrack by combining the plurality of 3D object sounds.
A sound generation apparatus according to another embodiment may include a memory including instructions, and a processor configured to execute the instructions, and the processor may extract a plurality of 2D object sounds included in a 2D soundtrack, apply a plurality of binaural effects to the plurality of 2D object sounds, respectively, to convert the plurality of 2D object sounds into a plurality of 3D object sounds, and generate a 3D soundtrack based on the plurality of 3D object sounds.
The plurality of 2D object sounds may be sounds separated by one of frequency and object in the 2D soundtrack.
The processor may extract the plurality of 2D object sounds by separating the 2D soundtrack by frequency band using an equalizer effect (equalizer effect).
The processor may extract the plurality of 2D object sounds by separating the 2D soundtrack by object using sound detection (sound detection).
The processor may apply a first binaural effect to a first 2D object sound among the plurality of 2D object sounds to generate a first 3D object sound, and apply a second binaural effect to a second 2D object sound among the plurality of 2D object sounds to generate a second 3D object sound.
The first binaural effect and the second binaural effect may be different from each other or the same binaural effect as each other.
The processor may determine a first 3D localization for the first 2D object sound, apply the first 3D localization and the first binaural effect to the first 2D object sound to generate the first 3D object sound.
The processor may determine a second 3D localization for the second 2D object sound, different from the first 3D localization, and apply the second 3D localization and the second binaural effect to the second 2D object sound to generate the second 3D object sound.
The processor may generate the 3D soundtrack by combining the plurality of 3D object sounds.
Drawings
FIG. 1 shows a schematic block diagram of a sound generation system of an embodiment.
Fig. 2 shows a schematic block diagram of the sound generation device shown in fig. 1.
Fig. 3 shows one example for explaining the sound providing apparatus shown in fig. 1.
Fig. 4 shows an example for explaining the first providing apparatus shown in fig. 3.
Fig. 5 shows an example for explaining the second providing apparatus shown in fig. 3.
Fig. 6 shows one example for explaining the sound output apparatus shown in fig. 1.
Fig. 7 shows another example for explaining the sound output apparatus shown in fig. 1.
Fig. 8 shows one example for explaining a sound output device as an in-ear headphone.
Fig. 9 shows another example for explaining a sound output device as an in-ear headphone.
FIG. 10 shows one example of a combined sound used to illustrate one embodiment.
Fig. 11 is a sequence diagram for explaining the operation of the sound generating apparatus shown in fig. 1.
Fig. 12 shows a sound generation system of another embodiment.
Fig. 13 shows an example for explaining the operation of the sound generation device shown in fig. 12.
Fig. 14 shows a sequence diagram for explaining the action of the processor shown in fig. 13.
Best mode for carrying out the invention
The embodiments are described in detail below with reference to the accompanying drawings. However, various modifications may be made to the embodiments, and the scope of the claims of the patent application is not limited or restricted by such embodiments. It should be understood that all changes, equivalents, and alternatives to the embodiments are intended to be embraced within the scope of the claims.
The terminology used in the embodiments is for the purpose of description only and is not to be construed in a limiting sense. Singular expressions include plural expressions unless the context clearly indicates otherwise. In the present specification, the terms "comprising" or "having" are intended to specify the presence of the features, numerals, steps, actions, components, parts, or combinations thereof described in the specification, and should not be construed as excluding the presence or addition of one or more other features, numerals, steps, actions, components, parts, or combinations thereof.
The terms first, second, etc. may be used to describe various elements, but the elements are not limited by the terms. The term is used only for the purpose of distinguishing one constituent element from another constituent element, and for example, a first constituent element may be named a second constituent element, and similarly, a second constituent element may also be named a first constituent element, without departing from the scope of the concept of the embodiment.
Unless otherwise defined, all terms used herein, including technical or scientific terms, have the same meaning as commonly understood by one of ordinary skill in the art to which the embodiments belong. Terms such as those defined in commonly used dictionaries should be interpreted as having a meaning consistent with their meaning in the context of the relevant art, and will not be interpreted in an idealized or overly formal sense unless expressly so defined herein.
In the description with reference to the drawings, the same constituent elements are given the same reference numerals regardless of the drawing, and redundant description thereof is omitted. In describing the embodiments, when it is judged that a specific description of a related known technology may unnecessarily obscure the gist of the embodiments, the detailed description is omitted.
A module (module) in this specification may mean hardware capable of executing functions and actions of each name described in the specification, may mean computer program code capable of executing a specific function and action, or may mean an electronic recording medium, such as a processor or a microprocessor, on which computer program code capable of executing a specific function and action is loaded.
In other words, the module may mean a functional and/or structural combination of hardware for executing the technical idea of the present invention and/or software for driving the hardware.
The embodiments are described in detail below with reference to the accompanying drawings. However, the scope of the patent application is not limited or restricted by these examples. Like reference symbols in the various drawings indicate like elements.
Fig. 1 shows a schematic block diagram of a sound generation system of an embodiment.
The sound generation system 10 includes a sound providing apparatus 100, a sound generating apparatus 300, and a sound output device 500.
The sound providing apparatus 100 can generate (or record) a playback sound (play sound) to be provided to a user (or listener) and then provide the playback sound to the sound generating apparatus 300. The playback sound can be various, such as a 3D sound source and 3D Virtual Reality (VR) sound content.
The played sound may be sound occurring in a virtual space. The virtual space may be a 3D virtual space (or 3D virtual reality) that is embodied to provide 3D sound reflecting a sense of space and a sense of presence.
The sound providing apparatus 100 may provide the 2D soundtrack to the sound generating apparatus 300 after generating (or recording) the 2D soundtrack.
The 2D soundtrack may be sound that a listener can hear in stereo or in a single mono track. For example, the 2D soundtrack can be various, such as a 2D sound source, 2D voice, and 2D Virtual Reality (VR) sound.
The 2D soundtrack may include multiple object sounds. The plurality of object sounds may be, as 2D sounds, object sounds generated from a plurality of objects, respectively.
The sound generating apparatus 300 may combine the actual sound generated in the actual space with the virtual sound generated in the virtual space to generate a combined sound (or mixed sound) generated in a mixed reality in which the actual space and the virtual space are mixed.
Therefore, the sound generation device 300 can provide a highly immersive sound that makes the user feel placed in a mixed space in which the actual space and the virtual space are mixed.
The sound generating apparatus 300 can provide a customized (or personalized) 3D sound for a user by selectively combining an actual sound and a virtual sound to provide various stereo sounds.
The sound generating apparatus 300 does not completely remove the actual sound, but mixes the actual sound with the playback sound, thereby allowing the user to recognize situations occurring in the actual space and ensuring the safety of the user.
The sound generation device 300 can convert each of the plurality of 2D object sounds of the 2D soundtrack into a 3D object sound, and generate a 3D soundtrack that reflects the 3D object sound.
Therefore, the sound generation device 300 can provide various forms of highly immersive 3D sound (or 3D content) by reflecting the 3D directivity of the 3D effect to each 2D object sound.
The sound generation apparatus 300 generates a 3D soundtrack using only a 2D soundtrack, so that a 3D soundtrack can be easily generated.
The sound generation device 300 can generate a 3D soundtrack that can be used for tinnitus treatment, tinnitus diagnosis, and the like through directional reproduction, which is a feature of 3D sound (or 3D audio). For example, the sound generation apparatus 300 may reflect band-based position reproduction in a 3D soundtrack, generating a 3D soundtrack that may be used for substantive tinnitus treatment and tinnitus diagnosis.
The sound output device 500 can obtain an actual sound (real sound) generated in the actual space. The actual space may be the space where the user who wants to listen to the combined sound is located.
For example, the sound output device 500 may track the head of the user (or perform head tracking), and sense (or sense, or obtain) the head direction of the user (or the gaze point of the user, the line of sight of the user).
The sound output device 500 may obtain 3D actual sound based on the head direction of the user through microphones of the sound output device 500 positioned on both sides of the user. The two side directions may be the left-ear and right-ear directions, corresponding to the ears of the user.
The sound output device 500 may transmit the actual sound, the sound obtaining information about the actual sound, and/or the head direction information of the user to the sound generating device 300.
The sound output device 500 may receive the combined sound transmitted from the sound generation device 300 and output (provide) to the user.
Therefore, the user can listen to the combined sound generated in the mixed reality in which the real space and the virtual space are mixed, through the sound output device 500.
The sound providing apparatus 100, the sound generating apparatus 300, and the sound output apparatus 500 are configured separately from each other, but are not limited thereto. For example, the sound providing apparatus 100 may be included in the sound generating apparatus 300, and the sound generating apparatus 300 may be included in the sound output apparatus 500.
Fig. 2 shows a schematic block diagram of the sound generating apparatus shown in fig. 1.
The sound generation apparatus 300 may include a communication module 310, a memory 330, and a processor 350.
The communication module 310 may receive the 2D soundtrack or play sound transmitted from the sound providing apparatus 100 and transmit to the processor 350.
The communication module 310 may receive the actual sound transmitted from the sound output device 500, sound obtaining time information regarding the actual sound, and/or information regarding the head direction of the user, and transmit them to the processor 350.
The communication module 310 may receive the combined sound transmitted from the processor 350 and transmit it to the sound output device 500.
The memory 330 may store instructions (or programs) that can be executed by the processor 350. For example, the instructions may include instructions for performing the actions of processor 350 and/or the actions of the components of processor 350.
Processor 350 may process data stored in memory 330. The processor 350 may execute computer-readable code (e.g., software) stored in the memory 330 and instructions triggered by the processor 350.
Processor 350 may be a data processing apparatus embodied in hardware, having circuitry of a physical structure for performing desired actions. For example, the desired actions may include code or instructions (instructions) included in a program.
For example, a data processing apparatus embodied in hardware may include a microprocessor (micro processor), a central processing unit (central processing unit), a processor core (processor core), a multi-core processor (multi-core processor), a multiprocessor (multi processor), an Application-Specific Integrated Circuit (ASIC), and a Field Programmable Gate Array (FPGA).
The processor 350 may control the overall operation of the sound generating apparatus 300. For example, the processor 350 may control the operations of the respective components (310 and 330) of the sound generating apparatus 300.
The processor 350 may obtain an actual sound transmitted from the sound output device 500, sound obtaining time information about the actual sound, and/or information about the head direction of the user.
The actual sound may include a plurality of actual object sounds. The plurality of real object sounds may be sounds generated from a plurality of real objects located in the real space, respectively. The sound generated from the actual object may be an object sound corresponding to the actual object. The actual objects may be various, such as characters, animals, utensils, etc. located in the actual space. The object sounds corresponding to the actual objects may be various, such as voices of characters, sounds of animals, footsteps, sounds of vehicles, etc., which are located in the actual space.
The sound obtaining time information on the actual sound may include sound obtaining times of respective ones of the plurality of actual object sounds. The sound obtaining times of the actual object sound obtained in the right ear direction and the actual object sound obtained in the left ear direction may be different from each other.
The processor 350 may obtain the playback sound transmitted from the sound providing apparatus 100.
The playback sound may include a plurality of virtual object sounds. The plurality of virtual object sounds may be sounds generated from a plurality of virtual objects disposed in the virtual space, respectively. The sound generated from a virtual object may be an object sound (object sound) that is pre-recorded and/or pre-generated as an object sound corresponding to the virtual object. When the playback sound is a 3D sound source, the virtual objects may be various objects constituting the sound source, such as a drum, a guitar, a bass, and a singer. When the playback sound is 3D VR sound content, the virtual objects may be various objects corresponding to and constituting the 3D VR sound content, such as persons, animals, and physical objects included in the 3D virtual reality. When the playback sound is a 3D sound source, the object sounds corresponding to the virtual objects may be various sounds constituting the sound source, such as a pre-recorded drum sound, guitar sound, bass sound, and singing voice. When the playback sound is 3D VR sound content, the object sounds corresponding to the virtual objects may be various sounds constituting the 3D VR sound content, such as a character's voice, an animal's call and footsteps, and a vehicle's horn.
The processor 350 may selectively combine the actual sound and the played sound to generate a combined sound.
First, the processor 350 may select at least one real object sound among a plurality of real object sounds included in the real sound.
For example, the processor 350 may identify a plurality of actual object sounds included in the actual sound based on the stored characteristics of the object sounds. The characteristic of the object sound may be a frequency characteristic and a volume characteristic of the object sound.
The processor 350 may remove the noise sound from the actual sound based on a noise filtering technique. For example, the processor 350 may analyze noise occurring in the real space, and remove a sound corresponding to a general noise sound. The noise sound may be a sound corresponding to a general noise. The noise sound may be a sound much higher than a sound corresponding to a normal audible frequency.
The processor 350 may identify a plurality of actual object sounds among the actual sounds from which the noise sounds are removed, based on the frequencies and/or volumes of the object sounds stored in advance. For example, the processor 350 may detect a sound corresponding to the frequency and/or volume of a previously stored object sound among the actual sounds from which the noise sound is removed, and recognize the detected sound as a plurality of actual object sounds.
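The identification described above can be illustrated with a short sketch. The following Python code is a minimal, hypothetical example: the object-sound profiles, band limits, and volume thresholds in OBJECT_PROFILES are assumptions for illustration only, not values disclosed by the embodiments.

```python
import numpy as np

# Hypothetical profiles of known object sounds: (frequency band in Hz, minimum RMS volume).
OBJECT_PROFILES = {
    "voice":    ((85.0, 255.0), 0.01),
    "footstep": ((20.0, 150.0), 0.02),
    "vehicle":  ((30.0, 500.0), 0.05),
}

def identify_object_sounds(samples: np.ndarray, rate: int) -> list[str]:
    """Return the names of stored object-sound profiles matched in a noise-filtered frame."""
    # Crude noise filtering: drop spectral components below the median magnitude.
    spectrum = np.fft.rfft(samples)
    magnitudes = np.abs(spectrum)
    spectrum[magnitudes < np.median(magnitudes)] = 0.0
    cleaned = np.fft.irfft(spectrum, n=len(samples))

    freqs = np.fft.rfftfreq(len(cleaned), d=1.0 / rate)
    mags = np.abs(np.fft.rfft(cleaned))
    dominant = freqs[np.argmax(mags)]                # dominant frequency of the frame
    volume = float(np.sqrt(np.mean(cleaned ** 2)))   # RMS volume of the frame

    matches = []
    for name, ((lo, hi), min_vol) in OBJECT_PROFILES.items():
        if lo <= dominant <= hi and volume >= min_vol:
            matches.append(name)
    return matches
```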
The processor 350 may selectively extract, based on an actual sound selection condition, at least one actual object sound occurring from a dangerous object that poses a danger to the user and/or an object of interest that the user focuses on, among the plurality of actual object sounds. The actual sound selection condition may be set to select object sounds corresponding to the dangerous object and the object of interest among the plurality of actual object sounds. The dangerous object and the object of interest may be preset by the user.
Thereafter, the processor 350 may select at least one virtual object sound among a plurality of virtual object sounds included in the play sound.
For example, the processor 350 may select all of the plurality of virtual object sounds or select a portion thereof based on the user's motion. The user's actions may be varied, such as the number of rotations of the user's head, the speed of rotation of the head, etc.
When the number of rotations of the head of the user is greater than or equal to the threshold number of rotations and/or the rotational speed of the head of the user is greater than or equal to the threshold rotational speed, the processor 350 may select all of the plurality of virtual object sounds.
When the number of rotations of the head of the user is less than the threshold number of rotations and/or the rotation speed of the head of the user is less than the threshold rotation speed, the processor 350 may select a portion of the plurality of virtual object sounds.
As described above, a method of selecting all or a part of the plurality of virtual object sounds has been described, but the present invention is not limited thereto. The processor 350 may also apply the manner of selecting all and the manner of selecting a part of the plurality of virtual object sounds in the opposite way. For example, the processor 350 may select all of the plurality of virtual object sounds when the number of head rotations is less than the threshold number of rotations and/or the head rotation speed is less than the threshold rotation speed. The processor 350 may select a part of the plurality of virtual object sounds when the number of head rotations is greater than or equal to the threshold number of rotations and/or the head rotation speed is greater than or equal to the threshold rotation speed.
When selecting a portion among the plurality of virtual object sounds, the processor 350 may select a virtual object sound corresponding to a virtual object located in the direction of the head of the user among the plurality of virtual object sounds based on the direction of the head of the user.
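A minimal sketch of this selection rule follows. The threshold values, the data shape of a virtual object sound, and the azimuth test used for "located in the direction of the head of the user" are hypothetical assumptions, not values from the embodiments.

```python
from dataclasses import dataclass

ROTATION_COUNT_THRESHOLD = 3      # hypothetical threshold number of head rotations
ROTATION_SPEED_THRESHOLD = 90.0   # hypothetical threshold rotation speed, deg/s

@dataclass
class VirtualObjectSound:
    name: str
    direction_deg: float  # azimuth of the virtual object in the virtual space
    samples: list

def select_virtual_sounds(sounds, rotations, speed_deg_s, head_dir_deg, fov_deg=60.0):
    """Select all virtual object sounds on strong head motion, else only those
    whose virtual objects lie near the user's head direction."""
    if rotations >= ROTATION_COUNT_THRESHOLD or speed_deg_s >= ROTATION_SPEED_THRESHOLD:
        return list(sounds)
    # Otherwise keep only sounds whose source lies within half the assumed field of view.
    half = fov_deg / 2.0
    return [s for s in sounds
            if abs((s.direction_deg - head_dir_deg + 180.0) % 360.0 - 180.0) <= half]
```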
Finally, the processor 350 may combine the at least one real object sound with the at least one virtual object sound to generate a combined sound.
For example, the processor 350 may adjust the volume of the at least one actual object sound based on the position of the actual object corresponding to the at least one actual object sound.
The processor 350 may determine a position of the real object corresponding to the at least one real object sound in the real space based on the sound obtaining time with respect to the at least one real object sound.
The processor 350 may adjust the volume of the at least one actual object sound based on the separation distance between the position of the user and the position of the actual object.
For example, the processor 350 may adjust the volume of the at least one actual object sound based on a threshold volume corresponding to the separation distance and the volume of the at least one actual object sound. The threshold volume corresponding to the separation distance may be preset. The threshold volume may be a volume range set according to the separation distance between the user and the object, and may be a volume range that is not dangerous to the user.
When the volume of the at least one actual object sound is higher than the threshold volume, the processor 350 may turn down the volume of the at least one actual object sound so as to be within the threshold volume range.
When the volume of the at least one actual object sound is lower than the threshold volume, the processor 350 may turn up the volume of the at least one actual object sound so as to be within the threshold volume range.
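The threshold-volume adjustment may be sketched as follows. The mapping from separation distance to an allowed volume range is a hypothetical assumption, since the embodiments only state that the range is preset.

```python
def threshold_volume_range(distance_m: float) -> tuple[float, float]:
    """Hypothetical mapping from user-object separation to an allowed RMS volume range."""
    # Closer objects are allowed to be louder; the constants are illustrative only.
    upper = min(1.0, 0.2 + 1.0 / max(distance_m, 0.5))
    lower = upper * 0.25
    return lower, upper

def adjust_object_volume(samples, volume: float, distance_m: float):
    """Scale an object sound so its volume falls inside the threshold range."""
    lower, upper = threshold_volume_range(distance_m)
    if volume > upper:
        gain = upper / volume   # turn down a sound louder than the threshold
    elif volume < lower:
        gain = lower / volume   # turn up a sound quieter than the threshold
    else:
        gain = 1.0
    return [s * gain for s in samples]
```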
The processor 350 may combine the volume-adjusted at least one real object sound with the at least one virtual object sound to generate a combined sound.
The processor 350 may extract a plurality of 2D object sounds included in the 2D soundtrack after the 2D soundtrack is obtained through the communication module 310. The plurality of 2D object sounds may be sounds separated from the 2D soundtrack by one of frequency and object.
As an example, the processor 350 may extract the plurality of 2D object sounds included in the 2D soundtrack by separating the 2D soundtrack by frequency band using an equalizer effect (EQ: equalizer effect).
As another example, the processor 350 may extract a plurality of 2D object sounds included in the 2D soundtrack by separating the 2D soundtrack by object using sound detection (sound detection).
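A crude stand-in for the equalizer-based frequency-band separation above can be sketched with FFT masks. The band names and edges are hypothetical assumptions, and a real equalizer effect or sound-detection-based separation by object would be more involved.

```python
import numpy as np

# Hypothetical frequency bands used to split a 2D soundtrack into object sounds.
BANDS_HZ = {"bass": (20, 250), "mids": (250, 2000), "treble": (2000, 16000)}

def split_by_band(track: np.ndarray, rate: int) -> dict[str, np.ndarray]:
    """Separate a mono 2D track into per-band object sounds with FFT masks,
    a crude stand-in for the equalizer-effect separation described above."""
    spectrum = np.fft.rfft(track)
    freqs = np.fft.rfftfreq(len(track), d=1.0 / rate)
    parts = {}
    for name, (lo, hi) in BANDS_HZ.items():
        mask = (freqs >= lo) & (freqs < hi)   # keep only this band's components
        parts[name] = np.fft.irfft(spectrum * mask, n=len(track))
    return parts
```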
The processor 350 may apply a plurality of binaural effects (binaural effects) to the plurality of 2D object sounds, respectively, to transform the plurality of 2D object sounds into a plurality of 3D object sounds. Each of the plurality of 3D object sounds may be a 3D binaural sound obtained by converting a 2D object sound into a 3D object sound.
The processor 350 may apply a first binaural effect to a first 2D object sound among the plurality of 2D object sounds to generate a first 3D object sound.
For example, the processor 350 may determine a first 3D localization for the first 2D object sound. The processor 350 may apply the first 3D localization and the first binaural effect to the first 2D object sound to generate the first 3D object sound. The first 3D object sound may be a 3D sound obtained by converting the first 2D object sound into 3D sound.
The processor 350 may apply a second binaural effect to a second 2D object sound among the plurality of 2D object sounds to generate a second 3D object sound.
For example, the processor 350 may determine a second 3D localization for the second 2D object sound, different from the first 3D localization. The processor 350 may apply the second 3D localization and the second binaural effect to the second 2D object sound to generate the second 3D object sound. The second 3D object sound may be a 3D sound obtained by converting the second 2D object sound into 3D sound.
The first binaural effect and the second binaural effect may be different from each other or the same binaural effect as each other.
The processor 350 may generate a 3D soundtrack based on the plurality of 3D object sounds. The 3D soundtrack may be a soundtrack that transforms 2D sound of the 2D soundtrack into 3D sound.
For example, the processor 350 may combine (mix) the plurality of 3D object sounds to generate a 3D soundtrack in which the plurality of 3D object sounds are combined.
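The per-object transform and the final combination can be illustrated with a toy sketch that uses interaural time and level differences as the binaural effect. This is an assumed simplification, not the binaural transform actually used by the embodiments; the head constants are illustrative.

```python
import numpy as np

SPEED_OF_SOUND = 343.0  # m/s
HEAD_RADIUS = 0.09      # m, approximate half-distance between the ears (assumed)

def binauralize(mono: np.ndarray, rate: int, azimuth_deg: float) -> np.ndarray:
    """Place a mono object sound at a 3D azimuth using interaural time and
    level differences -- a toy stand-in for a binaural effect."""
    az = np.radians(azimuth_deg)
    itd = HEAD_RADIUS * np.sin(az) / SPEED_OF_SOUND        # interaural time difference
    delay = int(round(abs(itd) * rate))                    # delay in samples
    near, far = 1.0, 0.6 + 0.4 * np.cos(az)                # crude level difference
    delayed = np.concatenate([np.zeros(delay), mono])[: len(mono)]
    # Negative azimuth: source on the left, so the left channel is the near ear.
    left, right = (mono * near, delayed * far) if az < 0 else (delayed * far, mono * near)
    return np.stack([left, right], axis=1)                 # (samples, 2) stereo

def build_3d_soundtrack(object_sounds, rate):
    """Combine binauralized (mono, azimuth) object sounds into one 3D soundtrack."""
    rendered = [binauralize(s, rate, az) for s, az in object_sounds]
    out = np.zeros((max(len(r) for r in rendered), 2))
    for r in rendered:
        out[: len(r)] += r
    return out
```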
For convenience of explanation, the sound generation device 300 is assumed to be embodied in the sound output device 500, and the 2D soundtrack is assumed to be a 2D sound source.
Fig. 3 shows an example for explaining the sound providing apparatus shown in fig. 1, fig. 4 shows an example for explaining the first providing apparatus shown in fig. 3, and fig. 5 shows an example for explaining the second providing apparatus shown in fig. 3.
The sound providing apparatus 100 may be an electronic apparatus, such as an MP3 player, that generates a playback sound of realistic 3D audio content and provides the playback sound to the sound generating apparatus 300 so that it can be provided to a user. The playback sound may be various, such as a 3D sound source, 3D voice, and 3D Virtual Reality (VR) sound.
The electronic device may be various devices such as a Personal Computer (PC), a data server, or a portable electronic device. The portable electronic device may be embodied as a laptop (laptop) computer, a mobile phone, a smart phone (smart phone), a tablet (tablet) PC, a Mobile Internet Device (MID), a Personal Digital Assistant (PDA), an Enterprise Digital Assistant (EDA), a digital still camera (digital still camera), a digital video camera (digital video camera), a Portable Multimedia Player (PMP), a PND (personal navigation device or portable navigation device), a hand-held game console (hand-held game console), an electronic book (e-book), or a smart device (smart device). At this time, the smart device may be embodied as a smart watch (smart watch) or a smart band (smart band).
The sound providing apparatus 100 includes a first providing apparatus 110 and a second providing apparatus 130.
The first providing device 110 may provide pre-recorded object sounds or general 2D audio-only sound to the second providing device 130. For example, the first providing device 110 may provide the object sounds or the general 2D audio-only sound to the second providing device 130 in a wired manner and/or a wireless manner. The general 2D audio-only sound may be general mono, stereo, or multi-channel audio. The wired manner may be various wired manners such as USB (universal serial bus), DisplayPort, and HDMI (high definition multimedia interface). The wireless manner may be various wireless manners such as Wi-Fi (wireless fidelity) and Bluetooth.
The second providing apparatus 130 may generate the playback sound by reflecting the object sound in a 3D virtual space or converting a general 2D audio only sound into a 3D sound.
For example, the second providing apparatus 130 may generate the playback sound using a binaural recording (binaural recording) technique and/or a binaural effect (binaural effect) technique. The binaural recording technique may be a technique of recording 3D sound using a 3D microphone. The 3D microphone may be various, for example, a 360-degree microphone, a microphone composed of a plurality of microphones, and the like. The binaural effect technique may be a technique of generating 3D sound through stereo speakers based on spatial recognition of sound direction arising from the difference in sound arrival caused by the positions of the two human ears.
The second providing device 130 may configure a virtual object in the 3D virtual space using sound information from a MEMS compass and a MEMS accelerometer, so that the object sound corresponding to the virtual object may be reflected in the 3D virtual space. The second providing apparatus 130 may generate a playback sound occurring in the 3D virtual space in which the object sound is reflected.
The second providing apparatus 130 may convert general 2D audio-only sound into 3D audio-only sound and generate a playback sound converted into the 3D audio-only sound. For example, the playback sound converted into the 3D audio-only sound may be 3D content in which non-3D sound (or non-3D audio, a non-3D sound source) or multichannel sound (or multichannel audio) such as 5.1 channel is transformed (or converted) into 3D sound (or 3D audio). The 3D content may be various, such as 3D 5.1 channel, 3D 10.1 channel, and so on.
The second providing apparatus 130 may provide the play sound in various ways.
For example, the second providing device 130 may provide the playback sound in a 1:N manner to the sound output devices 500 used by a plurality of users, respectively. The 1:N manner may be a broadcast manner in which one playback sound is provided to a plurality of users.
The second providing device 130 may selectively provide a plurality of playback sounds in an N:N manner to the sound output devices 500 used by a plurality of users, respectively. The N:N manner may be a customized manner in which a plurality of playback sounds are selectively provided to a plurality of users.
The second providing device 130 may provide a plurality of playback sounds in an N:1 manner, all to the sound output device 500 used by a single user. The N:1 manner may be a service-intensive multiple access manner that allows a plurality of playback sounds to be provided to a single user.
The second providing device 130 can provide the playing sound to the sound output device 500 in the wired manner and/or the wireless manner.
Fig. 6 shows one example for explaining the sound output apparatus shown in fig. 1, and fig. 7 shows another example for explaining the sound output apparatus shown in fig. 1.
The sound output device 500 may be a device used by a user for listening to a played sound or a combined sound.
The sound output device 500 may be embodied in a wearable (wearable) type, an in-ear (in-ear) type, an on-ear (on-ear) type, and a brain wave translation (brain trans) type.
For example, the sound output device 500 may be an MP3 player embodied in a wearable type, in-ear type, on-ear type, or brain wave translation type. The MP3 player may include wireless communication and a processor, with a built-in battery for independent operation. The wearable type may be a type combined with an article conveniently worn by a user. The wearable type can be a hair band, a shoulder-attached device, a device attached to outerwear such as a pullover and/or jacket and/or space suit, goggles, glasses, and the like. The in-ear type may be an earphone. The on-ear type may be a headset, a helmet, and the like. The brain wave translation type may be a brain wave transmission device.
The sound output device 500 may be embodied in a sensory device using an HMD, smart glasses, a see-through (see-thru) display device, a multi-modal (e.g., five-sense) device, or a bone conduction audio device.
When the sound output device 500 is of the in-ear type or the on-ear type, the sound output device 500 can directly output the playback sound or the combined sound to the ears of the user, so that the user can listen directly.
When the sound output device 500 is wearable or brain wave translation, the sound output device 500 can sense the position of the ear of the user, and indirectly output the playing sound or the combined sound to the ear of the user, so that the user can listen indirectly.
The sound output device 500 can precisely track the head of the user by using a MEMS compass, a gyroscope, a MEMS accelerometer, and the like, and can three-dimensionally obtain the actual sound generated in the actual space.
In addition, the sound output device 500 may provide various additional functions such as an energy harvesting (energy harvesting) function and a black box (black box) function. For example, the sound output device 500 may have an energy harvesting function that converts heat from the part in contact with the user into electric energy, converts peripheral radio frequency (RF) energy into electric energy, and converts kinetic energy of the listener's movement into electric energy, so that it can be driven without an additional energy supply source.
When the sound output device 500 provides the black box function, the black box may have a physical storage location inside and/or outside the device. The black box can store data in various ways, such as storage using an internal memory and/or an external memory and blockchain storage. The external storage location may be, for example, a cloud connection. The black box may use a security key such as a PKI key for the access rights required for security.
The black box may be a camera built-in black box and/or an audio black box and/or a physical-sensor-based black box. The audio black box can store peripheral sound in real time and/or store audio data transmitted and received on the device, and can interpret sound according to its storage time position. The audio black box may be a sound-based black box that interprets location through audio, including 3D audio storage and storage of position information of each object, which are easy to analyze after a danger or the like.
The black box may be a black box having various functions. The black box may have a real-time storage function. For example, the black box may include real-time telephony, real-time streaming media, real-time ambient recording functions, including functions that can be played when necessary. In addition, the black box may further include a function of storing and keeping real-time information.
As one example, the black box may sense an event from peripheral sound and store it in real time at specific time intervals. For example, the black box may sense a conversation, an important meeting, the occurrence of an accident, etc., and store (or record) data from several minutes before and after the sensed occurrence time. At this time, the black box does not store continuously but performs event-based storage.
As another example, the black box may store position information of an object. For example, the black box may sense and interpret an object in a specific space or a vocal object, an animal, and/or a person as an object, and store position information of the object as 3D information. At this time, the black box may be stored in a manner of reflecting several minutes before or after the start of the sensing occurrence time, a specific time, and the like.
As another example, the black box may store audio data and information data such as contents of a call transmitted and received by the driver, a sound source and a streaming audio during playback, or the like in real time, or store the data reflecting a specific time or the like.
As another example, the black box may be used as an interface for spatial storage, pointer control, and the like, by performing recognition by voice, such as voice recognition, and performing object recognition and interpretation based on 3D position through space and object recognition. When a plurality of people converse in a space, it is difficult to recognize who is speaking, and thus the black box can perform voice-based 3D spatial recognition, identify the speaker giving an instruction, and perform control recognition. At this time, the black box can be used to store in a three-dimensional storage space in real time and to store by object.
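The event-based storage described in the examples above (keeping audio from several minutes before and after a sensed event rather than recording continuously) can be sketched with a rolling buffer. The frame counts and window lengths are hypothetical assumptions.

```python
from collections import deque

class EventAudioBlackBox:
    """Rolling pre-event buffer plus post-event capture: an illustrative sketch
    of the event-based storage described above, not the patent's implementation."""

    def __init__(self, frames_pre: int, frames_post: int):
        self.frames_post = frames_post
        self.buffer = deque(maxlen=frames_pre)  # rolling window before any event
        self.pending = []                       # [target_length, collected_frames]

    def push(self, frame):
        """Feed one audio frame; return any completed event recordings."""
        for event in self.pending:
            event[1].append(frame)
        done = [frames for target, frames in self.pending if len(frames) >= target]
        self.pending = [e for e in self.pending if len(e[1]) < e[0]]
        self.buffer.append(frame)
        return done

    def on_event(self):
        """An event was sensed: keep the pre-roll and start collecting post-roll."""
        pre = list(self.buffer)
        self.pending.append([len(pre) + self.frames_post, pre])
```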
For convenience of explanation, it is assumed that the sound output device 500 is an in-ear headphone.
Fig. 8 shows an example for explaining the sound output device as an in-ear headphone, and fig. 9 shows another example for explaining the sound output device as an in-ear headphone.
The sound output device 500 may be a plurality of earphones 510, 530 worn by the user.
The first earphone 510, which is worn on the left ear of the user, may include a first microphone 511, a first speaker 513, and a first processor 515.
The second earphone 530, which is worn on the right ear of the user, may include a second microphone 531, a second speaker 533, and a second processor 535.
The first earphone 510 and the second earphone 530 may include the sound generation apparatus 300.
The first processor 515 and the second processor 535 may share data with each other.
The first processor 515 and the second processor 535 may filter noise in the actual space through the first microphone 511 and the second microphone 531 to obtain the actual sound. For example, the first processor 515 and the second processor 535 may analyze noise information around the user and obtain an actual sound from which noise is removed by a noise reduction function. At this time, the sound obtaining time of the actual sound obtained by the first microphone 511 and the sound obtaining time of the actual sound obtained by the second microphone 531 may be different.
The first and second processors 515 and 535 may recognize an actual sound as a 3D actual sound corresponding to an actual space based on a sound obtaining time difference of the actual sound obtained through the first and second microphones 511 and 531.
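A minimal sketch of recognizing direction from the sound obtaining time difference between the two microphones follows. The assumed microphone spacing and the far-field model are illustrative simplifications, not parameters from the embodiments.

```python
import math

SPEED_OF_SOUND = 343.0  # m/s
EAR_SPACING = 0.18      # m, assumed distance between the two earphone microphones

def azimuth_from_tdoa(t_left: float, t_right: float) -> float:
    """Estimate source azimuth (degrees, negative = left) from the difference
    between the sound-obtaining times at the left and right microphones."""
    tdoa = t_right - t_left                      # positive: sound reached the left ear first
    max_tdoa = EAR_SPACING / SPEED_OF_SOUND      # physical bound on the delay
    x = max(-1.0, min(1.0, tdoa / max_tdoa))     # clamp against measurement noise
    return -math.degrees(math.asin(x))
```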
The first processor 515 and the second processor 535 can track the head of the user by using a MEMS compass to obtain the head direction of the user. In this case, the first processor 515 and the second processor 535 may perform head tracking (head tracking) using a gyroscope, a MEMS accelerometer, or the like, in addition to the MEMS compass.
The first processor 515 and the second processor 535 may transmit the actual sound, the sound obtaining time of the actual sound obtained through the first microphone 511, the sound obtaining time of the actual sound obtained through the second microphone 531, and information related to the head direction of the user to the sound generation apparatus 300.
The first processor 515 and the second processor 535 may output the play sound or the combined sound through the first speaker 513 and the second speaker 533.
When outputting the playback sound, the first processor 515 and the second processor 535 may obtain the playback sound from the MP3 player serving as the sound providing apparatus 100 and then output the playback sound to which a 3D sound effect is applied through 3D audio conversion. The playback sound to which the 3D sound effect is applied may be sound to which a 3D audio effect such as a binaural effect is applied. The sound to which the 3D audio effect is applied may be sound in which a multi-channel 3D audio effect is reflected according to the number of speakers. Sounds reflecting the multi-channel 3D audio effect may be various, for example, 5.1-channel, 7.1-channel, and 10.1-channel sounds.
As shown in fig. 8, the microphones 511 and 531 and the speakers 513 and 533 are illustrated as two each, but are not limited thereto. For example, a plurality of microphones may be provided to obtain the actual sound, and a plurality of speakers may be provided to output the playback sound or the combined sound.
Fig. 10 shows an example of combined sound for explaining one embodiment.
The combined sound may be a 3D sound that occurs in a mixed space in which a virtual space and a real space are mixed. Virtual spaces can be diverse, such as street patterns, forests, travel places, memory spaces, universe spaces, and so forth. The actual space may be a space where the listener is currently located, such as a coffee shop or a restaurant. The mixing space may be varied, such as a coffee shop located on a street, a restaurant located in a forest, and so on.
Specifically, when the user is located in a coffee shop and listens to virtual reality sound occurring in outer space, the mixed space may be a space in which the coffee shop and outer space are mixed.
When the user is located in a restaurant and listens to virtual reality sound occurring in Hawaii, the mixed space may be a space in which the restaurant and Hawaii are mixed.
The virtual object sound generated in the virtual space may be a 3D sound reflecting the distance (sound size), position (sound direction), and movement (changes in sound size and direction) of a virtual object located in the virtual space.
The actual object sound generated in the actual space may be a 3D sound reflecting the distance (sound size), position (sound direction), and movement (changes in sound size and direction) of the actual object in the actual space.
Fig. 11 is a sequence diagram for explaining the operation of the sound generating apparatus shown in fig. 1.
The processor 350 may obtain the actual sound 1110 through the first and second microphones 511 and 531 of the sound output device 500.
The processor 350 may filter the plurality of actual object sounds 1120 from the actual sound based on a filter (filter). The filter may be an audio filter of various types, such as a real-time filter (real time filter). For example, the processor 350 may remove the noise sound 1130 from the actual sound based on a noise filtering technique.
The processor 350 may detect the sound 1140 corresponding to the frequency and volume of the object sound among the actual sound from which the noise is removed, based on at least one of the frequency and volume of the object sound.
The processor 350 may recognize the detected sounds as a plurality of actual object sounds 1140.
The processor 350 may obtain the playback sound 1160 transmitted from the sound providing apparatus 100.
The processor 350 may combine at least one of the plurality of real object sounds of the real sound with at least one of the plurality of virtual object sounds of the play sound to generate a combined sound 1170.
The processor 350 may provide the combined sound to the user 1180 through the first speaker 513 and the second speaker 533 of the sound output device 500.
Next, a sound generation system according to another embodiment will be described with reference to fig. 12 to 14.
Fig. 12 shows a sound generation system of another embodiment.
The technical matters described with reference to fig. 1 to 11 can be similarly applied to the respective configurations of fig. 12 to 14.
The sound generation system 20 includes a sound providing apparatus 100 and a sound generating apparatus 300.
The sound providing apparatus 100 may provide the 2D soundtrack, which is a general 2D audio-only sound, to the sound generating apparatus 300. The general 2D audio-only sound may be a 2D sound recorded without applying a 3D sound effect.
For example, the sound providing apparatus 100 may generate a 2D soundtrack composed of a plurality of object sounds.
The sound providing apparatus 100 can record sounds respectively played by a variety of musical instruments to generate a 2D soundtrack.
The sound providing apparatus 100 can combine object sounds of respective musical instruments recorded in advance (or generated in advance) to generate a 2D soundtrack.
The sound providing apparatus 100 may transmit the 2D sound track to the sound generating apparatus 300 in a wired communication method and/or a wireless communication method. The wired communication method may be a communication method using various wired communication methods such as USB (universal serial bus), displayport, and HDMI (high definition multimedia interface). The wireless communication method may be a communication method using various wireless communication methods such as Wi-Fi (wireless fidelity) and bluetooth.
The sound generation device 300 may place a plurality of 2D object sounds in the 3D virtual space, respectively, and convert the 2D soundtrack into a 3D soundtrack reflecting the 3D virtual space, using a binaural effect technique.
The binaural effect technique may be a technique of generating 3D sound through stereo speakers based on spatial recognition of sound direction arising from the difference in sound arrival caused by the positions of the two human ears.
The 3D soundtrack may be 3D audio-only sound that reflects the spatiality, liveness, and directionality of sound. The 3D audio-only sound may be 3D sound obtained by converting non-3D sound (or non-3D audio, a non-3D sound source) or multichannel 2D sound (or multichannel 2D audio), such as 5.1-channel sound, into 3D sound. The 3D audio-only sound may be 3D sound of various channel layouts such as 3D 2 channel, 3D 5.1 channel, and 3D 10.1 channel.
The sound generation apparatus 300 may provide the 3D soundtrack to the electronic apparatus in various ways.
The electronic device may be various devices such as a sound output device, a Personal Computer (PC), a data server, or a portable electronic device. The portable electronic device may be embodied as a laptop (laptop) computer, a mobile phone, a smart phone (smart phone), a tablet (tablet) PC, a Mobile Internet Device (MID), a Personal Digital Assistant (PDA), an Enterprise Digital Assistant (EDA), a digital still camera (digital still camera), a digital video camera (digital video camera), a Portable Multimedia Player (PMP), a PND (personal navigation device or portable navigation device), a hand-held game console (hand-held game console), an electronic book (e-book), or a smart device (smart device). At this time, the smart device may be embodied as a smart watch (smart watch) or a smart band (smart band).
For example, the sound generation apparatus 300 may provide the 3D soundtrack in a 1:N manner to the sound output devices used by a plurality of listeners, respectively. The 1:N manner may be a broadcast-type manner in which one 3D soundtrack is provided to a plurality of listeners.
The sound generation apparatus 300 may selectively provide a plurality of 3D soundtracks in an N:N manner to the sound output devices respectively used by a plurality of listeners. The N:N manner may be a custom-type manner in which a plurality of 3D soundtracks are selectively available to a plurality of listeners.
The sound generation apparatus 300 may provide a plurality of 3D soundtracks in an N:1 manner, all to the sound output device used by a single listener. The N:1 manner may be a service-intensive multiple access manner that allows a plurality of 3D soundtracks to be provided to a single user.
The sound generation device 300 may provide the 3D sound track to the sound output device by the wired communication method and/or the wireless communication method described above.
The sound output device may be embodied as a wearable type, an in-ear type, an on-ear type, or a brain-wave translation type.

The wearable type may be an article conveniently worn by (or attached to) a listener, such as a hair band, a shoulder-mounted device, a device attached to a pullover, jacket, or space suit, goggles, or glasses. The in-ear type may be an earphone. The on-ear type may be a headset, a helmet, or the like. The brain-wave translation type may be a brain-wave transmission device.

The sound output device may also be embodied as a sensory device using an HMD, smart glasses, a see-through display device, a multi-modal (e.g., five-sense) display device, or a bone conduction audio device.
Fig. 13 is a diagram showing an example for explaining the operation of the sound generation device shown in Fig. 12.
The sound generation apparatus 300 may include a communication module 310, a memory 330, and a processor 350. The basic technical matters regarding the respective components 310, 330, and 350 are substantially the same as those described with reference to Fig. 3.
The processor 350 may obtain a 2D soundtrack. The 2D soundtrack may be a 2-channel stereo sound or a 1-channel mono sound.
The processor 350 may separate the 2D soundtrack according to object and frequency, and extract a plurality of 2D object sounds included in the 2D soundtrack.
For example, the processor 350 may detect a plurality of 2D object sounds included in the 2D soundtrack using sound detection. The processor 350 may separate the 2D soundtrack by the objects corresponding to the detected 2D object sounds and extract each of the plurality of 2D object sounds as an individual object sound. The plurality of 2D object sounds may be various instrument sounds such as a violin sound, a drum sound, a guitar sound, a bass sound, an electronic organ sound, and a trumpet sound.
The processor 350 may manage (or store) the plurality of 2D object sounds as soundtracks (or tracks) by indexing each with the object name corresponding to it. For example, the processor 350 may index the name "violin" to the violin sound and manage it as a first 2D soundtrack, index "drum" to the drum sound and manage it as a second 2D soundtrack, index "guitar" to the guitar sound and manage it as a third 2D soundtrack, index "bass" to the bass sound and manage it as a fourth 2D soundtrack, index "electronic organ" to the electronic organ sound and manage it as a fifth 2D soundtrack, and index "trumpet" to the trumpet sound and manage it as a sixth 2D soundtrack.
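Assuming some sound-detection / source-separation model (the description does not name one), the per-object extraction and name indexing can be sketched as follows. SoundDetector and extract_2d_object_sounds are hypothetical names used only for illustration.

    from typing import Dict, List, Protocol
    import numpy as np

    class SoundDetector(Protocol):
        """Hypothetical stand-in for a sound-detection / source-separation model."""
        def detect_objects(self, mix: np.ndarray) -> List[str]: ...
        def separate(self, mix: np.ndarray, name: str) -> np.ndarray: ...

    def extract_2d_object_sounds(mix, detector):
        """Detect the object sounds in a 2D soundtrack, separate the mix
        per object, and manage each result as a 2D track indexed by its
        object name (e.g. "violin" -> the first 2D soundtrack)."""
        tracks = {}
        for name in detector.detect_objects(mix):  # e.g. violin, drum, guitar...
            tracks[name] = detector.separate(mix, name)
        return tracks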
The processor 350 may variously determine the 3D localization of each of the first to sixth 2D soundtracks.
The processor 350 may apply different 3D localizations and binaural effects to the first to sixth 2D soundtracks to transform them into first to sixth 3D soundtracks. In this case, the processor 350 may first separate the first to sixth 2D soundtracks into a plurality of channels and then apply (or render) a binaural effect to the separated channels.
The processor 350 may integrate the first to sixth 3D soundtracks to generate a 3D soundtrack. The 3D soundtrack may be a multi-channel 3D sound obtained by transforming a stereo or mono 2D soundtrack and applying a binaural effect.
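Continuing the illustrative sketches, the per-track localization, binaural rendering, and integration steps might look like the following. The placement table is an assumed example (the description leaves the concrete localizations open), and apply_binaural_effect is the hypothetical helper sketched earlier.

    import numpy as np

    # Assumed 3D placements (azimuth in degrees) for the six tracks
    PLACEMENTS = {"violin": -60.0, "drum": 0.0, "guitar": 30.0,
                  "bass": -30.0, "electronic organ": 60.0, "trumpet": 90.0}

    def integrate_3d_soundtrack(tracks_2d, sample_rate):
        """Apply a distinct 3D localization and binaural effect to each 2D
        object track, then sum the resulting 3D object sounds into one
        stereo 3D soundtrack."""
        length = max(len(t) for t in tracks_2d.values())
        mix = np.zeros((length, 2))
        for name, signal in tracks_2d.items():
            rendered = apply_binaural_effect(signal, sample_rate,
                                             PLACEMENTS.get(name, 0.0))
            mix[: len(rendered)] += rendered
        peak = np.max(np.abs(mix))
        return mix / peak if peak > 1.0 else mix  # avoid clipping after the sum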
Fig. 14 is a sequence diagram for explaining the operation of the processor shown in Fig. 13.
The processor 350 may obtain the 2D soundtrack 610 transmitted from the sound providing apparatus 100.
The processor 350 may extract a plurality of 2D object sounds 630 included in the 2D soundtrack by separating the 2D soundtrack by frequency and/or by object using an equalizer effect and/or a sound detection technique.
The processor 350 may convert the plurality of 2D object sounds into a plurality of 3D object sounds 650 by applying a plurality of binaural effects to the plurality of 2D object sounds, respectively.
The processor 350 may integrate the plurality of 3D object sounds to generate a 3D soundtrack 670 in which the 2D soundtrack has been transformed into 3D sound.
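Composing the hypothetical helpers sketched above gives a compact illustration of the Fig. 14 flow: extract the 2D object sounds, binaurally transform each, and integrate them into a 3D soundtrack.

    def generate_3d_soundtrack(soundtrack_2d, sample_rate, detector):
        """End-to-end sketch of the Fig. 14 sequence, composing the
        hypothetical extract_2d_object_sounds and integrate_3d_soundtrack
        helpers shown earlier."""
        tracks_2d = extract_2d_object_sounds(soundtrack_2d, detector)
        return integrate_3d_soundtrack(tracks_2d, sample_rate)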
The method of the embodiments may be embodied in the form of program instructions executable by various computer devices and recorded on a computer-readable medium. The computer-readable medium may include program instructions, data files, data structures, and the like, alone or in combination. The program instructions recorded on the medium may be specially designed and constructed for the embodiments, or may be well known and available to those skilled in computer software. Examples of the computer-readable recording medium include magnetic media such as hard disks, floppy disks, and magnetic tape; optical media such as CD-ROM (compact disc read-only memory) and DVD (digital versatile disc); magneto-optical media such as floptical disks; and hardware devices specially configured to store and execute program instructions, such as read-only memory (ROM), random access memory (RAM), and flash memory. Examples of the program instructions include not only machine code produced by a compiler but also high-level language code that can be executed by a computer using an interpreter or the like. The hardware devices may be configured to operate as one or more software modules in order to perform the operations of the embodiments, and vice versa.
The software may comprise a computer program, code, instructions, or a combination of one or more of these, and may configure the processing device to operate as desired or instruct the processing device independently or collectively. The software and/or data may be embodied permanently or temporarily in any type of machine, component, physical device, virtual equipment, computer storage medium or device, or transmitted signal wave, so as to be interpreted by the processing device or to provide instructions or data to the processing device. The software may also be distributed over network-coupled computer systems so that it is stored and executed in a distributed fashion. The software and data may be stored in one or more computer-readable recording media.
As described above, although the embodiments have been described with reference to the drawings, various technical modifications and variations can be made by those skilled in the art. For example, appropriate results may be achieved even if the described techniques are performed in an order different from the described method, and/or components of the described systems, structures, devices, circuits, and the like are combined in a form different from the described method or are replaced or supplemented by other components or their equivalents.
Accordingly, other embodiments, other examples, and equivalents of the claims are also within the scope of the following claims.

Claims (18)

1. A sound generation method, comprising:
a step of extracting a plurality of 2D object sounds included in a 2D soundtrack;
a step of converting the plurality of 2D object sounds into a plurality of 3D object sounds by applying a plurality of binaural effects to the plurality of 2D object sounds, respectively; and
a step of generating a 3D soundtrack based on the plurality of 3D object sounds.
2. The sound generation method according to claim 1,
the plurality of 2D object sounds are sounds separated in the 2D soundtrack by one of frequency and object.
3. The sound generation method according to claim 1,
the step of extracting comprises: a step of separating the 2D soundtrack by frequency band using an equalizer effect and extracting the plurality of 2D object sounds.
4. The sound generation method according to claim 1,
the step of extracting comprises: a step of separating the 2D soundtrack by object using sound detection and extracting the plurality of 2D object sounds.
5. The sound generation method according to claim 1,
the step of converting comprises: a step of generating a first 3D object sound by applying a first binaural effect to a first 2D object sound among the plurality of 2D object sounds; and a step of generating a second 3D object sound by applying a second binaural effect to a second 2D object sound among the plurality of 2D object sounds.
6. The sound generation method according to claim 5,
the first binaural effect and the second binaural effect are binaural effects that are different from or the same as each other.
7. The sound generation method according to claim 5,
the step of generating the first 3D object sound comprises: a step of determining a first 3D localization for the first 2D object sound; and a step of generating the first 3D object sound by applying the first 3D localization and the first binaural effect to the first 2D object sound.
8. The sound generation method according to claim 7,
the step of generating the second 3D object sound comprises: a step of determining, for the second 2D object sound, a second 3D localization different from the first 3D localization; and a step of generating the second 3D object sound by applying the second 3D localization and the second binaural effect to the second 2D object sound.
9. The sound generation method according to claim 1,
the generating step includes generating the 3D soundtrack by integrating the plurality of 3D object sounds.
10. A sound generation apparatus comprising:
a memory comprising instructions;
a processor for executing the instructions,
the processor extracts a plurality of 2D object sounds included in a 2D soundtrack, converts the plurality of 2D object sounds into a plurality of 3D object sounds by applying a plurality of binaural effects to the plurality of 2D object sounds, respectively, and generates a 3D soundtrack based on the plurality of 3D object sounds.
11. The apparatus of claim 10, wherein,
the plurality of 2D object sounds are sounds separated by one of frequency and object in the 2D soundtrack.
12. The apparatus of claim 10, wherein,
the processor extracts the plurality of 2D object sounds by separating the 2D soundtrack by frequency band using an equalizer effect.
13. The apparatus of claim 10, wherein,
the processor separates the 2D soundtrack by object using sound detection to extract the plurality of 2D object sounds.
14. The apparatus of claim 10, wherein,
the processor generates a first 3D object sound by applying a first binaural effect to a first 2D object sound among the plurality of 2D object sounds, and generates a second 3D object sound by applying a second binaural effect to a second 2D object sound among the plurality of 2D object sounds.
15. The apparatus of claim 14, wherein,
the first binaural effect and the second binaural effect are binaural effects that are different from or the same as each other.
16. The apparatus of claim 14, wherein,
the processor determines a first 3D localization for the first 2D object sound, and generates the first 3D object sound by applying the first 3D localization and the first binaural effect to the first 2D object sound.
17. The apparatus of claim 16, wherein,
the processor determines, for the second 2D object sound, a second 3D localization different from the first 3D localization, and generates the second 3D object sound by applying the second 3D localization and the second binaural effect to the second 2D object sound.
18. The apparatus of claim 10, wherein,
the processor integrates the plurality of 3D object sounds to generate the 3D soundtrack.
CN202211375358.3A 2018-11-09 2019-11-08 Sound generating method and device for executing the same Pending CN115767407A (en)

Applications Claiming Priority (16)

Application Number Priority Date Filing Date Title
KR10-2018-0136955 2018-11-09
KR20180136955 2018-11-09
KR10-2019-0052338 2019-05-03
KR20190052338 2019-05-03
KR10-2019-0092596 2019-07-30
KR20190092596 2019-07-30
KR20190128553 2019-10-16
KR10-2019-0128553 2019-10-16
KR20190129756 2019-10-18
KR10-2019-0129756 2019-10-18
KR1020190140346A KR102379734B1 (en) 2018-11-09 2019-11-05 Method of producing a sound and apparatus for performing the same
KR10-2019-0140347 2019-11-05
KR10-2019-0140346 2019-11-05
KR1020190140347A KR102322120B1 (en) 2018-11-09 2019-11-05 Method of producing a sound and apparatus for performing the same
CN201980073791.1A CN113039815B (en) 2018-11-09 2019-11-08 Sound generating method and device for executing the same
PCT/KR2019/015145 WO2020096406A1 (en) 2018-11-09 2019-11-08 Method for generating sound, and devices for performing same

Related Parent Applications (1)

Application Number Title Priority Date Filing Date
CN201980073791.1A Division CN113039815B (en) 2018-11-09 2019-11-08 Sound generating method and device for executing the same

Publications (1)

Publication Number Publication Date
CN115767407A true CN115767407A (en) 2023-03-07

Family

ID=85356650

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202211375358.3A Pending CN115767407A (en) 2018-11-09 2019-11-08 Sound generating method and device for executing the same

Country Status (1)

Country Link
CN (1) CN115767407A (en)

Similar Documents

Publication Publication Date Title
US20220159403A1 (en) System and method for assisting selective hearing
CN109644314B (en) Method of rendering sound program, audio playback system, and article of manufacture
Blauert Communication acoustics
EP2737727B1 (en) Method and apparatus for processing audio signals
US9131305B2 (en) Configurable three-dimensional sound system
US20200186912A1 (en) Audio headset device
CN108141696A (en) The system and method adjusted for space audio
US20230050329A1 (en) System for and method of generating an audio image
CN110992970B (en) Audio synthesis method and related device
EP2920979B1 (en) Acquisition of spatialised sound data
US11221820B2 (en) System and method for processing audio between multiple audio spaces
KR100954385B1 (en) Apparatus and method for processing three dimensional audio signal using individualized hrtf, and high realistic multimedia playing system using it
US20230164509A1 (en) System and method for headphone equalization and room adjustment for binaural playback in augmented reality
US11611840B2 (en) Three-dimensional audio systems
Sodnik et al. Spatial auditory human-computer interfaces
KR20190109019A (en) Method and apparatus for reproducing audio signal according to movenemt of user in virtual space
CN113039815B (en) Sound generating method and device for executing the same
CN110915240A (en) Method for providing interactive music composition to user
CN114501297B (en) Audio processing method and electronic equipment
KR102379734B1 (en) Method of producing a sound and apparatus for performing the same
CN115767407A (en) Sound generating method and device for executing the same
KR102322120B1 (en) Method of producing a sound and apparatus for performing the same
CN116913328B (en) Audio processing method, electronic device and storage medium
WO2022178852A1 (en) Listening assisting method and apparatus
KR20150005438A (en) Method and apparatus for processing audio signal

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination