WO2017173776A1 - Method and system for audio editing in three-dimensional environment

Info

Publication number: WO2017173776A1
Application number: PCT/CN2016/098055
Authority: WO
Inventors: 向裴; 安德森阿丽西亚·玛丽
Original assignee: 向裴
Priority date: 2016-04-05
Filing date: 2016-09-05
Publication date: 2017-10-12

Abstract

A method and system for audio editing in a three-dimensional (3D) environment. The system (100) for audio editing in a 3D environment (150) comprises: an environment input unit (101) configured to process loaded 3D data; an audio input unit (102) configured to process loaded audio material; a rendering unit (103) configured to construct a 3D environment (150) from the processed 3D data; an environment operating unit (104) configured to position sound generation sources of the audio material at objects (170-1, 170-2, 170-3, …, 170-n) in the 3D environment (150); and a digital audio workstation (DAW) unit (105) configured to edit sounds generated by the objects (170-1, 170-2, 170-3, …, 170-n) in the 3D environment (150). A user can designate the objects (170-1, 170-2, 170-3, …, 170-n) in the 3D environment (150) as sound generation sources, thereby creating an immersive audio track for use in virtualized or 3D environments.

Description

Audio editing method and system in 3D environment

Technical field

The present invention relates generally to sound scenes, and more particularly to audio editing methods and systems for use in a three dimensional environment.

Background art

Traditional audio mixing technology allows users to operate on audio tracks with high precision. Digital audio workstations (DAWs) are now widely used to monitor audio information received from multiple channels. These DAW systems enable users to manipulate variables such as quality, duration, volume balance, and more. Although useful, traditional DAW systems do not provide an intuitive mixing option for the spatial manipulation of sound. Various multi-channel sound formats attempt to enable spatial operations. These formats let the user specify which speaker should play a particular sound at a particular time. However, these formats do not compensate for user movement in a three-dimensional (3D) environment.

Summary of the invention

The present invention is directed to solving the above drawbacks. Because it is a system for specifying the exact location of a sound generation source, the present invention is capable of creating an ideal sound scene within a 3D environment. That is, the present invention enables a sound engineer to specify the sources of various sounds within the environment while moving through the environment, taking into account the operator's displacement and head rotation. In this way, the user can intuitively operate on the sound within the 3D environment.

In addition to ascertaining the location of the sound source, the present invention can also be used as a DAW capable of processing audio tracks of various objects from a 3D environment. That is, the present invention allows a user to specify an object such as a character, an animal, a vehicle, a river, or the like as a sound generation source. The user can then perform a mixing operation on any of the sounds associated with these objects of the 3D environment.

According to a first aspect of the present invention, there is provided an audio editing method for use in a three-dimensional (3D) environment, comprising: processing loaded 3D data; processing loaded audio material; constructing a 3D environment using the processed 3D data; positioning a sound generation source of the audio material at an object in the 3D environment; and editing the sound produced by the object in the 3D environment.

In the audio editing method according to the first aspect of the present invention, a virtual console is constructed in the constructed 3D environment such that the user controls the objects and sounds in the 3D environment by operating the virtual console.

In the audio editing method according to the first aspect of the present invention, while the user moves in the 3D environment, the object in the 3D environment is designated as the sound generation source.

In the audio editing method according to the first aspect of the present invention, the editing of the sound generated by the object in the 3D environment further includes: presenting the sound generated by the object in the 3D environment in the form of a soundtrack; and mixing and formatting the soundtrack to create a new audio file.

In the audio editing method according to the first aspect of the present invention, when the object moves in the 3D environment, changes in the position and propagation of the sound due to the movement of the object are modeled and reflected in the soundtrack.

In the audio editing method according to the first aspect of the present invention, the object that generates the sound is indicated by a visual mark that displays information about the current track so that the user can track the motion of the object in the 3D environment.

In the audio editing method according to the first aspect of the present invention, the sound propagation condition is modeled to construct a multi-user environment in which ambient sound is projected to each user as a function of the user's position in the 3D environment.

In the audio editing method according to the first aspect of the present invention, the new audio file conforms to an industry standard format.

In the audio editing method according to the first aspect of the present invention, the new audio file is saved in a database or uploaded to a remote computer or data center.

According to a second aspect of the present invention, there is provided an audio editing system for use in a three-dimensional (3D) environment, comprising: an environment input unit for processing loaded 3D data; an audio input unit for processing loaded audio material; a rendering unit for constructing a 3D environment using the processed 3D data; an environment operating unit for positioning a sound generation source of the audio material at an object in the 3D environment; and a digital audio workstation unit for editing the sound produced by the object in the 3D environment.

In the audio editing system according to the second aspect of the present invention, the rendering unit constructs a virtual console in the constructed 3D environment, such that a user controls the operation of the environment operating unit and the digital audio workstation unit by operating the virtual console.

In the audio editing system according to the second aspect of the present invention, the environment operating unit is further configured to enable a user to move in the 3D environment, and to designate an object in the 3D environment as a sound generation source while the user moves in the 3D environment.

In the audio editing system according to the second aspect of the present invention, the digital audio workstation unit is further configured to present sounds generated by objects in the 3D environment in the form of audio tracks, and to mix and format the audio tracks to create a new audio file.

In the audio editing system according to the second aspect of the present invention, when the object moves in the 3D environment, the digital audio workstation unit models the changes in sound generation position and propagation due to the object's movement and reflects them in the audio track.

In the audio editing system according to the second aspect of the present invention, the object that generates the sound is indicated by a visual mark that displays information about the current track so that the user can track the motion of the object in the 3D environment.

In the audio editing system according to the second aspect of the present invention, the environment operating unit further models sound propagation conditions to construct a multi-user environment, in which ambient sound is projected to each user as a function of that user's position in the 3D environment.

In the audio editing system according to the second aspect of the present invention, the new audio file conforms to the industry standard format.

In the audio editing system according to the second aspect of the present invention, the digital audio workstation unit further saves the new audio file in a database or uploads it to a remote computer or data center.

In accordance with the method and system of the present invention, a user can operate on a sound scene in a virtualized 3D environment. More specifically, the user can designate objects in the 3D environment as sound generation sources, and operate on the sounds generated by these objects. In accordance with the present invention, a user is able to create an immersive audio track for use in a virtualized or 3D environment.

Brief description of the drawings

The invention will now be described in connection with the embodiments with reference to the accompanying drawings. In the drawings:

FIG. 1 is a schematic diagram illustrating an audio editing system for use in a three-dimensional environment, in accordance with an embodiment of the present invention.

FIG. 2 is a flow chart illustrating a method for audio editing in a three-dimensional environment, in accordance with an embodiment of the present invention.

Detailed description

Specific embodiments of the present invention will be explained in detail below with reference to the accompanying drawings.

FIG. 1 is a schematic diagram illustrating an audio editing system for use in a three-dimensional (3D) environment, in accordance with an embodiment of the present invention.

As shown in FIG. 1, an audio editing system 100 for use in a 3D environment according to an embodiment of the present invention includes an environment input unit 101, an audio input unit 102, a rendering unit 103, an environment operating unit 104, and a digital audio workstation (DAW) unit 105.

As shown in FIG. 1, the environment input unit 101 receives loaded three-dimensional (3D) data and processes the loaded 3D data. The processed 3D data is transmitted to the rendering unit 103. The 3D data described herein may be virtual reality (VR) data or other 3D movie/game space data.

The audio input unit 102 then receives the loaded audio material and processes the loaded audio material for use in the 3D environment to be generated.

The original audio material may include sound sources output by other editors, audio streams from a network, or audio from a field acquisition device. For example, for a battle scene in a movie, the input audio material may cover helicopters, airplanes, bullets, warriors, artillery, ambient sounds, and other sound sources.

Rendering unit 103 constructs 3D environment 150 using the processed 3D data. In the schematic shown in FIG. 1, the 3D environment 150 is specifically a 3D VR environment. Those skilled in the art will appreciate that the present invention is not limited to implementation in a 3D VR environment. In the 3D environment 150 shown in FIG. 1, preferably, the rendering unit 103 also constructs a virtual console 160 such that the user controls the operations of the environment operating unit 104 and the DAW unit 105 described below by operating the virtual console 160. Further, in the 3D environment 150, there are also a plurality of objects 170-1, 170-2, 170-3, ..., 170-n (where n is a natural number).

In a preferred embodiment of the invention, when the rendering unit 103 uses the data processed by the environment input unit 101 to construct a 3D environment, the environment is delivered to one or more VR headsets. When the user is immersed in the 3D VR environment, the user can interact with the virtual console 160 in the 3D environment. This virtual console is used as a user interface. Commands input into the virtual user interface are passed to the environment operating unit 104 and the DAW unit 105.

The environment operating unit 104 shown in FIG. 1 can position a plurality of sound generation sources of the audio material at the respective objects 170-1, 170-2, 170-3, ..., 170-n in the 3D environment 150, and the sound produced by the objects 170-1, 170-2, 170-3, ..., 170-n in the 3D environment is presented in the form of tracks. In a preferred embodiment, the audio tracks can be presented on the virtual console 160.

In the audio editing system 100, the environment operating unit 104 enables a user to move (navigate) in the 3D environment 150. The DAW unit 105 can then cooperate with the environment operating unit 104 so that, while the user is moving in the 3D environment 150, the user can designate objects 170-1, 170-2, 170-3, ..., 170-n in the 3D environment as sound generation sources. The sound generated by the objects 170-1, 170-2, 170-3, ..., 170-n in the 3D environment is presented in the form of tracks, preferably on the virtual console 160. In other words, the user can assign sounds to any part (object) of the 3D virtual environment, such as objects, people, animals, open spaces, landscapes, and the like.

Moreover, in a preferred embodiment of the present invention, when one or more of the objects 170-1, 170-2, 170-3, ..., 170-n move in the 3D environment, the DAW unit 105 models the resulting changes in the position and propagation of the sound and reflects them in the soundtrack. That is, the system of the present invention models changes in the location and propagation of sound within the 3D environment such that, when the position of an object changes relative to the user, the user's perception of the sound scene within the environment changes accordingly. Each track attached to the 3D environment is assigned a specific tag that represents attributes such as exact location, time of occurrence, associated object, and the like. The 3D environment with its attached tracks is edited in the DAW unit 105; the editing includes but is not limited to audio association, arrangement, mixing, encoding, and the like.
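
As a rough illustration of the modeling described above, the following Python sketch computes a distance-based gain and a propagation delay for a sound source at successive positions relative to a listener, with each position carried by a track tag. The inverse-distance gain law, the speed of sound of 343 m/s, and the names TrackTag and propagation are assumptions made for illustration only; the description does not prescribe a particular propagation model or tag layout.

```python
import math
from dataclasses import dataclass

SPEED_OF_SOUND_M_S = 343.0  # assumed value for air at about 20 degrees C


@dataclass
class TrackTag:
    """Hypothetical tag attached to a track: position, time, associated object."""
    position: tuple
    time_s: float
    object_id: str


def propagation(source, listener, ref_distance=1.0):
    """Return (gain, delay_s) for a point source heard at the listener position.

    Gain follows an assumed inverse-distance law, clamped to 1.0 inside
    ref_distance; delay is the distance divided by the speed of sound.
    """
    distance = math.dist(source, listener)
    gain = min(1.0, ref_distance / distance) if distance > 0 else 1.0
    return gain, distance / SPEED_OF_SOUND_M_S


# An object (for example a helicopter) moving away from a listener at the origin.
listener = (0.0, 0.0, 0.0)
tags = [TrackTag((2.0 * (i + 1), 0.0, 0.0), float(i), "170-1") for i in range(3)]
for tag in tags:
    gain, delay = propagation(tag.position, listener)
    print(f"t={tag.time_s:.0f}s gain={gain:.3f} delay={delay * 1000:.2f} ms")
```

As the object recedes, the gain falls and the delay grows, which is the kind of change that would be reflected in the soundtrack.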

In a preferred embodiment of the invention, the object that produces the sound is indicated by a visual marker (not shown in FIG. 1) that displays information about the current track so that the user can track the motion of the object in the 3D environment 150.

Moreover, the environment operating unit 104 can further model the sound propagation conditions to construct a multi-user environment in which ambient sound is projected to each user as a function of that user's position in the 3D environment 150. Thus, the system of the present invention creates an ideal audio profile for multiple users within a single VR environment.
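
The multi-user case can be sketched as computing, for each user, a separate set of gains that depends only on that user's position. In the minimal Python sketch below, the inverse-distance law and the function name per_user_mix are assumptions made for illustration; the description does not specify how the projection is computed.

```python
import math


def per_user_mix(sources, users, ref_distance=1.0):
    """Map each user id to {source id: gain} using an assumed inverse-distance law."""
    mix = {}
    for user_id, user_pos in users.items():
        gains = {}
        for source_id, source_pos in sources.items():
            # Clamp the distance so that gain never exceeds 1.0 near the source.
            distance = max(math.dist(source_pos, user_pos), ref_distance)
            gains[source_id] = ref_distance / distance
        mix[user_id] = gains
    return mix


# Two ambient sources and two users standing at different places in one scene.
sources = {"river": (0.0, 0.0, 0.0), "vehicle": (10.0, 0.0, 0.0)}
users = {"user_a": (1.0, 0.0, 0.0), "user_b": (8.0, 0.0, 0.0)}
for user_id, gains in per_user_mix(sources, users).items():
    print(user_id, {name: round(gain, 3) for name, gain in gains.items()})
```

Each user receives a different balance of the same sources, so the user near the river hears mostly the river while the user near the vehicle hears mostly the vehicle.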

In addition to the single sources described so far, a sound object may also be an entire sound field environment acting as a sound source. This sound source has no specific directionality, but is represented by an audio signal such as Ambisonics or by a multi-channel audio signal such as 5.1 or 7.1. This type of sound signal is not the primary target of the editor, but it may appear in the 3D mix as another kind of sound source. Because of its nature, the editor represents it with a graphic different from that of a point source. In general, this sound field source has directionality, but does not have its own spatial coordinates.

In other words, some objects in the 3D environment can be called point sources, that is, each has its own sense of direction. In addition, a sound field, such as FOA (first-order Ambisonics), HOA (higher-order Ambisonics), or 5.1 or 7.1 channels, represents the entire field. It can also be used as an object in the 3D environment, but it represents a background layer without its own fixed spatial position. The "object" described in the present invention also includes such a sound source.
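
The distinction between the two kinds of object can be captured in a small data structure. The Python sketch below is one possible representation under stated assumptions: the class name SoundObject, its fields, and the validation rule are hypothetical and are not taken from the description.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class SoundObject:
    """Hypothetical record for an 'object' in the sense used above."""
    name: str
    kind: str       # "point" for a point source, "field" for a sound field
    channels: int   # e.g. 1 for a mono point source, 4 for FOA, 6 for 5.1
    position: Optional[Tuple[float, float, float]] = None

    def __post_init__(self):
        if self.kind not in ("point", "field"):
            raise ValueError("kind must be 'point' or 'field'")
        if self.kind == "point" and self.position is None:
            raise ValueError("a point source needs a spatial position")
        if self.kind == "field" and self.position is not None:
            raise ValueError("a sound field is a background layer without a position")


helicopter = SoundObject("helicopter", "point", channels=1, position=(5.0, 20.0, -3.0))
ambience = SoundObject("battlefield ambience (FOA)", "field", channels=4)
print(helicopter.kind, helicopter.position)
print(ambience.kind, ambience.position)
```

Keeping both kinds under one type lets an editor list them together while still drawing them with different graphics.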

The DAW unit 105 shown in FIG. 1 can mix and format the tracks to create a new audio file. The audio file may contain processed audio information (audio tracks, etc.) generated by the DAW unit 105. Preferably, the new audio file may conform to an industry standard format, such as the mainstream audio file formats known to those skilled in the art. In addition, the DAW unit 105 can further save new audio files in a database or upload them to a remote computer or data center. Thus, it is possible for a user to incorporate audio files stored in a database, remote computer, data center, etc. into a sound scene being built within the VR environment. That is, the user can load the saved file and use the DAW unit 105 to operate on the file.

With both kinds of object described above (point sources and sound fields) under control, the combined output audio file may take one of the following formats:

a. Channel based: 5.1, 7.1, 11.1, 22.2, Auro 3D, etc.

b. Object based: Dolby Atmos (channel + object)

c. Scene based: HOA. HOA can also carry several track objects, such as commentary or narration; each such track is mono, compressed separately, and transmitted together with the scene-based HOA bitstream.

For example, the output audio file can be an Ambisonics track (4 tracks at first order, (n+1)² tracks at order n), mainly used for VR; a traditional channel format such as 5.1, 7.1, 11.1, or 22.2; or a format such as MPEG-H or Dolby Atmos that combines a soundtrack with separate sound sources.
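
The track count for an Ambisonics output follows directly from the order. The short Python sketch below evaluates the (n+1)² relation stated above; the function name ambisonics_channels is chosen here for illustration.

```python
def ambisonics_channels(order: int) -> int:
    """Number of tracks in a full-sphere Ambisonics signal of the given order."""
    if order < 0:
        raise ValueError("order must be non-negative")
    return (order + 1) ** 2


for order in range(1, 4):
    print(f"order {order}: {ambisonics_channels(order)} tracks")
```

First order gives the 4 tracks mentioned above, and each further order adds 2n+1 tracks.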

In addition, new audio files need to contain additional information, such as metadata or side information, especially in ATMOS and object-based audio formats. This metadata is typically added to each frame of the audio data encoding and is synchronized in time with the audio signal itself.
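
One way to picture per-frame metadata is as a record per audio frame whose start time is derived from the frame index, which keeps it aligned with the audio signal. In the Python sketch below, the frame size of 1024 samples, the 48 kHz sample rate, and the FrameMetadata fields are assumptions made for illustration; actual formats define their own frame lengths and metadata syntax.

```python
from dataclasses import dataclass


@dataclass
class FrameMetadata:
    """Hypothetical side information carried with one encoded audio frame."""
    frame_index: int
    start_time_s: float
    object_id: str
    position: tuple


def frame_metadata(positions, object_id, frame_size=1024, sample_rate=48000):
    """Build one record per frame; positions[i] is the object position during frame i."""
    return [
        FrameMetadata(i, i * frame_size / sample_rate, object_id, pos)
        for i, pos in enumerate(positions)
    ]


# Three consecutive frames of an object drifting along the x axis.
records = frame_metadata([(0.0, 0.0, 0.0), (0.1, 0.0, 0.0), (0.2, 0.0, 0.0)], "170-1")
for record in records:
    print(record.frame_index, f"{record.start_time_s * 1000:.2f} ms", record.position)
```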

FIG. 2 is a flow chart illustrating an audio editing method for use in a three-dimensional (3D) environment, in accordance with an embodiment of the present invention.

As shown in FIG. 2, the audio editing method S200 for use in a 3D environment according to an embodiment of the present invention begins at step S201, in which the loaded 3D data is processed. In step S203, the loaded audio material is processed; step S203 may be performed before, after, or at the same time as step S201. Audio material is an abstraction of the audio signal; real-time audio streams, signals, and the like can also appear here.

At step S205, the processed 3D data is used to construct a 3D environment. In accordance with a preferred embodiment of the present invention, a virtual console can be built in the constructed 3D environment such that the user controls the objects and sounds in the virtual reality environment by operating the virtual console.

In a preferred embodiment of the invention, these virtualized 3D environments are transmitted to one or more VR headsets when the processed 3D data is used to construct the 3D environment. When the user is immersed in the environment, the user can interact with the virtual console in a 3D environment.

In step S207, the sound generation source of the audio material is positioned at an object in the 3D environment. According to a preferred embodiment of the present invention, an object in the 3D environment can be designated as a sound generation source while the user is moving in the 3D environment.

At step S209, the sound generated by the object in the 3D environment is edited. Preferably, the sound produced by the object in the 3D environment is presented in the form of a soundtrack. According to a preferred embodiment of the present invention, when an object moves in a 3D environment, changes in sound generation position and propagation due to object movement are modeled and reflected in the soundtrack.

In accordance with a preferred embodiment of the present invention, the object that produces the sound is indicated by a visual indicia that displays information about the current track so that the user can track the motion of the object in the 3D environment.

In a preferred embodiment of the invention, the sound propagation conditions can be modeled to construct a multi-user environment in which ambient sound is projected to each user as a function of that user's location in the 3D environment.

In the operation of step S209, the tracks can be mixed and formatted to create a new audio file. Preferably, the new audio file can conform to an industry standard format. New audio files can be saved in a database or uploaded to a remote computer or data center. In an application scenario such as a live broadcast, the newly created audio file may appear as a real-time audio stream or an audio signal, not necessarily a specific file written to a certain medium.

Thereafter, the method S200 can end.
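
The order of steps S201 to S209 can be sketched as a small pipeline. The Python sketch below is a toy illustration only: scenes and audio clips are plain dictionaries and lists, mixing is a sample-wise sum, and every function name is hypothetical rather than part of the described system.

```python
def process_3d_data(raw_scene):                            # step S201
    """Keep only the list of object names from the loaded scene."""
    return {"objects": list(raw_scene["objects"])}


def process_audio(raw_clips):                              # step S203
    """Copy each loaded clip into a list of samples."""
    return {name: list(samples) for name, samples in raw_clips.items()}


def build_environment(scene):                              # step S205
    """Create an environment in which no object has a track yet."""
    return {"objects": {name: {"track": None} for name in scene["objects"]}}


def position_source(env, object_name, clip_name, clips):   # step S207
    """Designate an object as the sound generation source for a clip."""
    env["objects"][object_name]["track"] = clips[clip_name]


def mix_tracks(env):                                       # step S209
    """Sum all attached tracks sample by sample into one output track."""
    tracks = [o["track"] for o in env["objects"].values() if o["track"] is not None]
    length = max((len(t) for t in tracks), default=0)
    return [sum(t[i] for t in tracks if i < len(t)) for i in range(length)]


clips = process_audio({"rotor": [1.0, 2.0, 3.0], "water": [0.5, 0.5]})
env = build_environment(process_3d_data({"objects": ["helicopter", "river", "hill"]}))
position_source(env, "helicopter", "rotor", clips)
position_source(env, "river", "water", clips)
mixed = mix_tracks(env)
print(mixed)
```

Steps S201 and S203 are independent of each other here, which matches the statement that they may run in either order or at the same time.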

The term "unit" of the present invention may also be used herein to refer to an assembly grouped based on functionality. It is an object of the present invention to provide a digital audio workstation that enables a sound engineer to manipulate the position, propagation, and intensity of sound within a virtual environment. To this end, the invention may be software for processing a pre-built virtual reality environment. That is, the present invention reads various VR formats and enables a user to become immersed in a VR environment through a connected VR headset.

Therefore, according to the present invention, there is also provided a computer readable recording medium. The instructions are stored on the computer readable recording medium. The instructions, when executed by one or more processors for audio editing in a three-dimensional (3D) environment, cause the one or more processors to:

process the loaded 3D data;

process the loaded audio material;

construct a 3D environment using the processed 3D data;

position the sound generation source of the audio material at an object in the 3D environment; and

edit the sound produced by the object in the 3D environment.

Furthermore, the above term "unit" may also be referred to as "engine." Therefore, reference can be made to the following description.

A preferred embodiment of the present invention is a system for operating audio information within a virtualized three dimensional environment. The present invention includes an environment input engine, an audio input engine, a rendering engine, an environmental operations engine, a digital audio workstation (DAW) engine, an encoding engine, a user interface (UI) engine, and a database. The term "engine" is used herein to refer to an assembly that is grouped based on functionality. It is an object of the present invention to provide a digital audio workstation that enables a sound engineer to manipulate the position, propagation, and intensity of sound within a virtual environment. To this end, the present invention is software for processing a pre-built virtual reality environment. That is, the present invention reads various VR formats and enables a user to become immersed in a 3D environment through a connected VR headset.

In a preferred method of use, the invention is a program into which a user loads a VR environment, a movie, or the like. To this end, the environment input engine processes the 3D or VR data loaded into the system. In a preferred embodiment of the invention, the task of the environment input engine is to read 3D environments in various formats. The user loads the audio file into the audio input engine. The audio input engine processes all audio files loaded into the system of the present invention. The 3D environment loaded into the environment input engine is processed and then passed to the rendering engine.

In a preferred embodiment of the invention, the rendering engine uses the data processed by the environment input engine to build a 3D environment. These virtualized environments are delivered to one or more VR headsets. It is an object of the present invention to provide a rendering engine that generates a 3D control panel with which the user can interact while immersed in the VR environment. That is, in addition to the virtual environment, the rendering engine generates a virtual console that is used as a user interface. Commands entered into the virtual interface are passed to the environment operations engine and the DAW engine.

In a preferred embodiment of the invention, the environmental operations engine enables the user to navigate within the VR environment. It is an object of the present invention to provide an environmental operations engine that cooperates with a DAW engine to enable a user to locate a sound generation source at any location in a virtualized environment. That is, when the user moves within the 3D environment, the user can designate an object in the environment as a sound generation source. The user can assign sound files to any part of the virtual environment, such as objects, people, animals, open spaces, landscapes, and the like.

In a preferred embodiment of the invention, the DAW engine acts as a mixing and operating system capable of processing audio tracks from multiple objects within the VR environment. In addition to mixing audio tracks associated with multiple objects, the DAW engine and the environmental operations engine model changes in sound propagation as an object moves within a 3D or VR environment. That is, the system of the present invention models the propagation of sound in the 3D environment such that a change in the position of an object relative to the user ideally affects the user's perception of the sound scene within the environment. Each track attached to the VR environment is assigned a specific tag that represents attributes such as exact location, time of occurrence, associated object, and the like. The 3D environment with the attached audio tracks is then passed to the encoding engine.

In an additional embodiment of the invention, the object designated as the sound generation source is indicated by a visual marker. These visual markers display information about the current track, enabling the user to track the motion of the object in the VR environment. In an additional embodiment, the system of the present invention is capable of modeling sound propagation profiles for environments containing multiple users. In this embodiment, ambient sound is projected to each user as a function of that user's location within the 3D environment. Thus, the system of the present invention creates an ideal audio profile for multiple users within a single 3D environment.

In a preferred embodiment of the invention, the encoding engine formats the audio tracks associated with the processed 3D environment. It is an object of the present invention to provide an encoding engine for constructing an audio file containing processed audio information generated by a DAW engine and an environmental operations engine. The audio files built by the encoding engine are encoded into an industry standard format. In a preferred embodiment, the task of the UI engine is to interpret user input. To this end, the system of the present invention interacts with various forms of user input systems to enable a user to operate a virtual console generated by a rendering engine. The audio files generated by the system of the present invention are stored in a database. In addition, users can incorporate audio files saved in the database into audio files that are being built within the 3D environment. That is, the user can load the saved file and use the DAW engine to manipulate the file. In an additional embodiment, the user can upload the audio file to a remote computer, data center, or the like.

Various embodiments and implementations of the invention have been described above. However, the spirit and scope of the present invention are not limited thereto. Those skilled in the art will be able to make further applications in accordance with the teachings of the present invention, and such applications are within the scope of the present invention.

Claims

1. An audio editing method for use in a three-dimensional (3D) environment, comprising:

processing loaded 3D data;

processing loaded audio material;

constructing a 3D environment using the processed 3D data;

positioning a sound generation source of the audio material at an object in the 3D environment; and

editing the sound produced by the object in the 3D environment.
2. The audio editing method according to claim 1, wherein constructing the 3D environment using the processed 3D data further comprises:

constructing a virtual console in the constructed 3D environment, such that the user controls the objects and sounds in the 3D environment by operating the virtual console.
3. The audio editing method according to claim 1, wherein positioning the sound generation source of the audio material at an object in the 3D environment further comprises:

designating, while the user is moving in the 3D environment, an object in the 3D environment as the sound generation source.
4. The audio editing method according to claim 1, wherein editing the sound produced by the object in the 3D environment further comprises:

presenting the sound produced by the object in the 3D environment in the form of audio tracks; and

mixing and formatting the audio tracks to create a new audio file.
5. The audio editing method according to claim 4, wherein presenting the sound produced by the object in the 3D environment in the form of audio tracks further comprises:

when the object moves in the 3D environment, modeling the changes in position and propagation of the sound due to the object's movement and reflecting them in the audio track.
6. The audio editing method according to claim 4, wherein the object that produces the sound is indicated by a visual marker that displays information about the current audio track so that the user can track the motion of the object in the 3D environment.
7. The audio editing method according to claim 1, further comprising:

modeling sound propagation conditions to construct a multi-user environment in which ambient sound is projected to each user as a function of that user's position in the 3D environment.
8. The audio editing method according to claim 4, wherein the new audio file conforms to an industry standard format.
9. The audio editing method according to claim 4, further comprising:

saving the new audio file in a database or uploading it to a remote computer or data center.
10. An audio editing system for use in a three-dimensional (3D) environment, comprising:

an environment input unit for processing loaded 3D data;

an audio input unit for processing loaded audio material;

a rendering unit for constructing a 3D environment using the processed 3D data;

an environment operating unit for positioning a sound generation source of the audio material at an object in the 3D environment; and

a digital audio workstation unit for editing the sound produced by the object in the 3D environment.
11. The audio editing system according to claim 10, wherein the rendering unit constructs a virtual console in the constructed 3D environment, such that a user controls the operation of the environment operating unit and the digital audio workstation unit by operating the virtual console.
12. The audio editing system according to claim 10, wherein the environment operating unit is further configured to enable the user to move in the 3D environment, and to designate an object in the 3D environment as a sound generation source while the user moves in the 3D environment.
13. The audio editing system according to claim 10, wherein the digital audio workstation unit is further configured to present sounds produced by objects in the 3D environment in the form of audio tracks, and to mix and format the audio tracks to create a new audio file.
14. The audio editing system according to claim 13, wherein, when the object moves in the 3D environment, the digital audio workstation unit models the changes in sound generation position and propagation due to the object's movement and reflects them in the audio track.
15. The audio editing system according to claim 13, wherein the object that produces the sound is indicated by a visual marker that displays information about the current audio track so that the user can track the motion of the object in the 3D environment.
16. The audio editing system according to claim 10, wherein the environment operating unit further models sound propagation conditions to construct a multi-user environment in which ambient sound is projected to each user as a function of that user's position in the 3D environment.
17. The audio editing system according to claim 13, wherein the new audio file conforms to an industry standard format.
18. The audio editing system according to claim 13, wherein the digital audio workstation unit further saves the new audio file in a database or uploads it to a remote computer or data center.