WO2023212879A1 - Method and apparatus for generating object audio data, electronic device, and storage medium - Google Patents

Method and apparatus for generating object audio data, electronic device, and storage medium

Info

Publication number
WO2023212879A1
Authority
WO
WIPO (PCT)
Prior art keywords
sound
information
data
recording terminal
current location
Prior art date
Application number
PCT/CN2022/091051
Other languages
English (en)
Chinese (zh)
Inventor
史润宇
易鑫林
张墉
刘晗宇
吕柱良
吕雪洋
Original Assignee
北京小米移动软件有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 北京小米移动软件有限公司
Priority to PCT/CN2022/091051
Priority to CN202280001279.8A (published as CN117355894A)
Publication of WO2023212879A1

Classifications

    • G - PHYSICS
    • G10 - MUSICAL INSTRUMENTS; ACOUSTICS
    • G10L - SPEECH ANALYSIS TECHNIQUES OR SPEECH SYNTHESIS; SPEECH RECOGNITION; SPEECH OR VOICE PROCESSING TECHNIQUES; SPEECH OR AUDIO CODING OR DECODING
    • G10L21/00 - Speech or voice signal processing techniques to produce another audible or non-audible signal, e.g. visual or tactile, in order to modify its quality or its intelligibility
    • G10L21/02 - Speech enhancement, e.g. noise reduction or echo cancellation
    • G10L21/0272 - Voice signal separating
    • G10L21/028 - Voice signal separating using properties of sound source
    • G - PHYSICS
    • G11 - INFORMATION STORAGE
    • G11B - INFORMATION STORAGE BASED ON RELATIVE MOVEMENT BETWEEN RECORD CARRIER AND TRANSDUCER
    • G11B27/00 - Editing; Indexing; Addressing; Timing or synchronising; Monitoring; Measuring tape travel
    • G11B27/02 - Editing, e.g. varying the order of information signals recorded on, or reproduced from, record carriers
    • G11B27/031 - Electronic editing of digitised analogue information signals, e.g. audio or video signals

Definitions

  • the present disclosure relates to the field of communication technology, and in particular, to a method, device, electronic device and storage medium for generating object audio data.
  • MPEG-H 3D Audio, the next-generation audio codec standard of MPEG (Moving Picture Experts Group), is the ISO/IEC 23008-3 international standard.
  • Object Audio is a new audio format that can mark the direction of a sound, so that whether headphones or speakers are used, and regardless of the number of speakers, the listener can hear the sound coming from a specific direction.
  • in the related art, object audio data is generated by pre-recording monophonic audio and combining it in post-production with position information prepared in advance.
  • using this method requires dedicated production equipment in the later stage, and there is still a lack of a method for recording object audio data of a sound object in real time.
  • Embodiments of the present disclosure provide a method, device, electronic device, and storage medium for generating object audio data, which can accurately obtain the location information of each sound object in real time, and record and generate object audio data in real time.
  • embodiments of the present disclosure provide a method for generating object audio data.
  • the method includes: obtaining sound data of at least one sound object; obtaining current location information of the at least one sound object; and synthesizing the sound data and the current location information of the at least one sound object to generate object audio data.
  • sound data of at least one sound object is obtained; current position information of at least one sound object is obtained; sound data of at least one sound object and current position information are synthesized to generate object audio data.
  • the position information of each sound object can be accurately obtained in real time, and the object audio data can be recorded and generated in real time.
  • obtaining the current location information of the at least one sound object includes: obtaining the current location information of at least one recording terminal that records the sound data of the at least one sound object.
  • before synthesizing the sound data of the at least one sound object and the current location information, the method further includes: synchronizing the sound data of the at least one sound object with the current location information.
  • obtaining the current location information of at least one recording terminal that records the sound data of the at least one sound object includes: obtaining the current location information of the at least one recording terminal in a one-way transceiver mode, a two-way transceiver mode, or a hybrid transceiver mode.
  • obtaining the location information of the at least one recording terminal using the hybrid transceiver method includes: obtaining first positioning reference information using the one-way transceiver method; obtaining second positioning reference information using the two-way transceiver method; and determining the current location information of the at least one recording terminal according to the first positioning reference information and the second positioning reference information.
  • the first positioning reference information is one of angle information and distance information
  • the second positioning reference information is the other one of the angle information and the distance information
  • obtaining the current location information of the at least one recording terminal in the one-way transceiver mode includes: receiving a first positioning signal sent by the at least one recording terminal in a broadcast manner, and generating the current location information of the at least one recording terminal based on the first positioning signal.
  • obtaining the location information of the at least one recording terminal in the two-way transceiver mode includes: receiving a positioning start signal sent by the at least one recording terminal in a broadcast manner; sending a response signal to the at least one recording terminal; receiving a second positioning signal sent by the at least one recording terminal; and generating the current location information of the at least one recording terminal according to the second positioning signal.
  • each recording terminal corresponds to a sound object, and the position of the recording terminal moves along with the sound source of the sound object.
  • the method further includes: obtaining initial position information of the at least one sound object.
  • synthesizing the sound data and current location information of at least one sound object to generate object audio data includes: obtaining audio parameters and using the audio parameters as header information of the object audio data; and, at each sampling moment, saving the sound data of each sound object as the object audio signal and the current position information as the object audio auxiliary data to generate the object audio data.
  • the method further includes: saving the sound data and the current location information in frame units.
  • embodiments of the present disclosure provide a device for generating object audio data.
  • the device for generating object audio data includes: a data acquisition unit configured to acquire sound data of at least one sound object; an information acquisition unit configured to acquire the current position information of the at least one sound object; and a data generation unit configured to synthesize the sound data and the current position information of the at least one sound object to generate object audio data.
  • the information acquisition unit is specifically configured to: acquire the current location information of at least one recording terminal that records the sound data of the at least one sound object.
  • the device further includes: a synchronization processing unit configured to synchronize the sound data of the at least one sound object and the current location information.
  • the information acquisition unit is specifically configured to acquire the current location information of the at least one recording terminal in a one-way transceiver mode, a two-way transceiver mode, or a hybrid transceiver mode.
  • the information acquisition unit includes: a first information acquisition module configured to acquire first positioning reference information in the one-way transceiver mode; a second information acquisition module configured to acquire second positioning reference information in the two-way transceiver mode; and a first current information acquisition module configured to determine the current position information of the at least one recording terminal according to the first positioning reference information and the second positioning reference information.
  • the first positioning reference information is one of angle information and distance information
  • the second positioning reference information is the other one of the angle information and the distance information
  • the information acquisition unit includes: a second current information acquisition module configured to receive a first positioning signal sent by the at least one recording terminal in a broadcast manner, and to generate the current location information of the at least one recording terminal based on the first positioning signal.
  • the information acquisition unit includes: a signal receiving module configured to receive a positioning start signal sent by the at least one recording terminal in a broadcast manner; a signal sending module configured to send a response signal to the at least one recording terminal; and a third current information acquisition module configured to receive the second positioning signal sent by the at least one recording terminal, and to generate the current location information of the at least one recording terminal according to the second positioning signal.
  • each recording terminal corresponds to a sound object, and the position of the recording terminal moves along with the sound source of the sound object.
  • the device further includes: an initial position acquisition unit configured to acquire initial position information of the at least one sound object.
  • the data generation unit includes: a parameter acquisition module configured to acquire audio parameters and use the audio parameters as header file information of the object audio data; an audio data generation module configured to At each sampling moment, the sound data of each sound object is saved as an object audio signal, and the current position information is saved as object audio auxiliary data to generate the object audio data.
  • the data generation unit further includes: a processing module configured to save the sound data and the current location information in frame units.
  • embodiments of the present disclosure provide an electronic device, which includes: at least one processor; and a memory communicatively connected to the at least one processor; wherein the memory stores instructions executable by the at least one processor, and the instructions are executed by the at least one processor so that the at least one processor can execute the method described in the first aspect.
  • embodiments of the present disclosure provide a non-transitory computer-readable storage medium storing computer instructions, wherein the computer instructions are used to cause the computer to execute the method described in the first aspect.
  • embodiments of the present disclosure provide a computer program product, including computer instructions, characterized in that, when executed by a processor, the computer instructions implement the method described in the first aspect.
  • Figure 1 is a flow chart of a method for generating object audio data provided by an embodiment of the present disclosure
  • Figure 2 is a flow chart of another method for generating object audio data provided by an embodiment of the present disclosure
  • Figure 3 is a flow chart of yet another method for generating object audio data provided by an embodiment of the present disclosure
  • Figure 4 is a flow chart of yet another method for generating object audio data provided by an embodiment of the present disclosure
  • Figure 5 is a flow chart of yet another method for generating object audio data provided by an embodiment of the present disclosure
  • Figure 6 is a flow chart of yet another method for generating object audio data provided by an embodiment of the present disclosure.
  • Figure 7 is a structural diagram of a device for generating object audio data provided by an embodiment of the present disclosure.
  • Figure 8 is a structural diagram of another device for generating object audio data provided by an embodiment of the present disclosure.
  • Figure 9 is a structural diagram of an information acquisition unit in the device for generating object audio data provided by an embodiment of the present disclosure.
  • Figure 10 is a structural diagram of another information acquisition unit in the device for generating object audio data provided by an embodiment of the present disclosure.
  • Figure 11 is a structural diagram of yet another information acquisition unit in the device for generating object audio data provided by an embodiment of the present disclosure
  • Figure 12 is a structural diagram of yet another device for generating object audio data provided by an embodiment of the present disclosure.
  • Figure 13 is a structural diagram of a data generation unit in the device for generating object audio data provided by an embodiment of the present disclosure
  • FIG. 14 is a structural diagram of an electronic device according to an embodiment of the present disclosure.
  • in the present disclosure, “at least one” can also be described as one or more, and “a plurality” can be two, three, four or more; the present disclosure does not limit this.
  • when technical features are distinguished by “first”, “second”, “third”, “A”, “B”, “C”, “D”, etc., the technical features so described are in no particular order.
  • each table in this disclosure can be configured or predefined.
  • the values of the information in each table are only examples and can be configured as other values, which is not limited by this disclosure.
  • it is not necessarily required to configure all the correspondences shown in each table.
  • the corresponding relationships shown in some rows may not be configured.
  • appropriate adjustments, such as splitting or merging, can be made based on the above tables.
  • the names of the parameters shown in the titles of the above tables may also be other names understandable by the communication device, and the values or expressions of the parameters may also be other values or expressions understandable by the communication device.
  • other data structures can also be used, such as arrays, queues, containers, stacks, linear lists, pointers, linked lists, trees, graphs, structures, classes, heaps, hash tables, and so on.
  • in the related art, the object audio data acquisition method can neither directly record object audio data nor obtain the real position information of sound objects.
  • Embodiments of the present disclosure provide a method, device, electronic device and storage medium for generating object audio data, to accurately obtain the location information of each sound object in real time and to record and generate object audio data in real time, thereby solving the problems in the related art.
  • the method for generating object audio data in the embodiment of the present disclosure can be executed by the device for generating object audio data according to the embodiment of the present disclosure.
  • the device for generating object audio data can be implemented by software and/or hardware.
  • the device for generating object audio data may be configured in an electronic device, and the electronic device can install and run a program for generating object audio data.
  • electronic devices may include but are not limited to smartphones, tablet computers and other hardware devices with various operating systems.
  • position information refers to the position information of each microphone or sound object relative to the listener when the listener (Audience) is taken as the origin.
  • the position information can be expressed in a rectangular coordinate system (x, y, z) or a spherical coordinate system (θ, γ, r). They can be converted by the following formula (1): x = r·cosγ·cosθ, y = r·cosγ·sinθ, z = r·sinγ; conversely r = √(x²+y²+z²), θ = arctan(y/x), γ = arctan(z/√(x²+y²)).  (1)
  • x, y, and z respectively represent the position coordinates of the microphone or sound object on the x-axis (front-back direction), y-axis (left-right direction), and z-axis (up-down direction) of the rectangular coordinate system.
  • ⁇ , ⁇ , and r respectively represent the horizontal angle of the microphone or sound object on the spherical coordinate system (the angle between the mapping of the microphone or sound object and the origin on the horizontal plane and the x-axis); the vertical angle (the angle between the microphone or sound object and the origin) The angle between the line connecting the object and the origin and the horizontal plane); and the straight-line distance of the microphone or sound object from the origin.
  • the aforementioned position information is position information in a three-dimensional coordinate system. In a two-dimensional coordinate system, the position information can be expressed in the rectangular coordinate system (x, y) or in the polar coordinate system (θ, r). They can be converted by the following formula (2): x = r·cosθ, y = r·sinθ; conversely r = √(x²+y²), θ = arctan(y/x).  (2)
  • the position information will be expressed in a spherical coordinate system (polar coordinate system).
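The conversions of formulas (1) and (2) can be sketched in Python (function names are illustrative, not part of the patent):

```python
import math

def spherical_to_rect(theta, gamma, r):
    """Spherical (theta: horizontal angle, gamma: vertical angle, r: distance)
    to rectangular (x: front-back, y: left-right, z: up-down), formula (1)."""
    x = r * math.cos(gamma) * math.cos(theta)
    y = r * math.cos(gamma) * math.sin(theta)
    z = r * math.sin(gamma)
    return x, y, z

def rect_to_spherical(x, y, z):
    """Rectangular to spherical; angles in radians."""
    r = math.sqrt(x * x + y * y + z * z)
    theta = math.atan2(y, x)                 # horizontal angle
    gamma = math.atan2(z, math.hypot(x, y))  # vertical angle
    return theta, gamma, r

def polar_to_rect_2d(theta, r):
    """2-D polar to rectangular, formula (2)."""
    return r * math.cos(theta), r * math.sin(theta)
```

Using `atan2` instead of `arctan(y/x)` avoids the quadrant ambiguity and division by zero when x = 0.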
  • object audio generally refers to various sound formats that can describe sound objects (Audio Object). Point sound objects containing position information, or surface sound objects whose center position can be roughly determined, can be used as audio objects.
  • Object Audio generally consists of two parts: the sound signal itself (Audio Data) and the accompanying position information (Object Audio Metadata). The sound signal itself can be regarded as a mono audio signal, which can be in an uncompressed format such as PCM (Pulse-Code Modulation) or DSD (Direct Stream Digital), or in a compressed format such as MP3 (MPEG-1 or MPEG-2 Audio Layer III), AAC (Advanced Audio Coding), or Dolby Digital.
  • the accompanying position information is the position information described above, given at each time t.
  • the format can pair the sound signal and position information of each audio object separately; or it can combine the sound signals of all objects together and the position information together, with correspondence information added to indicate which sound signal corresponds to which position information.
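The two layouts above might be represented as follows (a sketch; the field names are assumptions, the patent does not define a concrete structure):

```python
from dataclasses import dataclass
from typing import List, Tuple

Position = Tuple[float, float, float]  # (theta, gamma, r) in spherical coordinates

@dataclass
class ObjectAudio:
    """Per-object pairing: one audio object's mono signal together with its
    own time-varying position metadata."""
    signal: List[float]        # mono samples (Audio Data)
    positions: List[Position]  # position at each time t (Object Audio Metadata)

@dataclass
class CombinedObjectAudio:
    """Combined layout: all signals together, all position tracks together,
    plus correspondence info mapping each signal to its position track."""
    signals: List[List[float]]
    position_tracks: List[List[Position]]
    correspondence: List[int]  # signals[i] matches position_tracks[correspondence[i]]
```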
  • FIG. 1 is a flow chart of a method for generating object audio data provided by an embodiment of the present disclosure.
  • the method may include but is not limited to the following steps:
  • S1 Obtain sound data of at least one sound object.
  • the sound signal of the sound object can be recorded through a sound collection device to obtain the sound data of the sound object.
  • the at least one sound object can include one or more sound objects. When there is one sound object, its sound signal is recorded by one sound collection device; when there are multiple sound objects, their sound signals are recorded by multiple sound collection devices.
  • the sound collection device may be a device capable of collecting sound information, such as a microphone, which is not specifically limited in the embodiments of the present disclosure.
  • obtaining the current location information of the sound object may be to obtain the current location information of the sound object while obtaining the sound data of the sound object, so as to obtain the sound data and location information of the sound object in real time.
  • the sound data of the sound object is obtained, and the sound data of the sound object can be obtained through one or more sound collection devices.
  • when the current position information of the sound object is obtained and the position of the sound object relative to the sound collection device is fixed, the current position information of the sound collection device can be obtained, and the current position information of the sound object can be determined based on the relative position relationship between the sound collection device and the sound object.
  • the sound object moves, and the sound collection device moves with the movement of the sound object, so that the position information of each sound object can be obtained in real time.
  • the current location information of the sound collection device can be obtained through an ultrasonic positioning method: the sound collection device is provided with an ultrasonic transceiver, and by collecting and processing the ultrasonic signals, the current location information of the sound collection device can be obtained.
  • other methods may be used, and the embodiments of the present disclosure do not specifically limit this.
  • S3 Synthesize the sound data of at least one sound object and the current location information to generate object audio data.
  • after the sound data and current location information of the at least one sound object are obtained in real time, the sound data and current location information of the sound object are synthesized to generate object audio data.
  • the sound data of the sound object and the current location information are synthesized.
  • the sound data and the current location information can be combined according to time and saved in a specific file storage format to generate the object audio data.
  • sound data of at least one sound object is obtained; current position information of at least one sound object is obtained; sound data and current position information of at least one sound object are synthesized to generate object audio data.
  • the position information of each sound object can be accurately obtained in real time, and the object audio data can be recorded and generated in real time.
  • the method may include but is not limited to the following steps:
  • obtaining the current location information of the sound object may be to obtain the current location information of the sound object while obtaining the sound data of the sound object, so as to obtain the sound data and location information of the sound object in real time.
  • obtaining the current location information of at least one sound object includes: obtaining the current location information of at least one recording terminal that records the sound data of the at least one sound object.
  • the sound signal of the sound object can be recorded through the recording terminal, and the sound data of the sound object can be obtained.
  • when there is one sound object, its sound data can be recorded through one recording terminal; when there are multiple sound objects, their sound data can be recorded through multiple recording terminals.
  • the recording terminal includes a microphone, and the sound data of the sound object can be recorded through the microphone in the recording terminal.
  • the current location information of at least one sound object is obtained.
  • the current location information of the recording terminal that records the sound data of the sound object can be obtained.
  • the current location information of the recording terminal that records one or more sound objects can be obtained.
  • that is, the current location information of the recording terminal that records the sound data of the sound object can be obtained.
  • obtaining the current location information of at least one recording terminal that records sound data of at least one sound object includes: obtaining the current location information of at least one recording terminal in a one-way transceiver mode, a two-way transceiver mode, or a hybrid transceiver mode.
  • the current location information of at least one recording terminal that records the sound data of at least one sound object is obtained: when there is one sound object, the current location information of the recording terminal that records its sound data is obtained; when there are multiple sound objects, the current location information of the recording terminals that record the sound data of each sound object is obtained.
  • the current location information of the recording terminal can be obtained through a one-way transceiver method, or the current location information of at least one recording terminal can be obtained through a two-way transceiver method, or the current location information of the recording terminal can be obtained through a hybrid transceiver method.
  • when the current location information of the recording terminal is obtained through a hybrid transceiver method, it is obtained by combining a one-way transceiver method and a two-way transceiver method.
  • obtaining the location information of at least one recording terminal in a hybrid transceiver mode includes: obtaining first positioning reference information in a one-way transceiver mode; obtaining second positioning reference information in a two-way transceiver mode; and determining the current location information of the at least one recording terminal according to the first positioning reference information and the second positioning reference information.
  • the location information of the recording terminal is obtained through a hybrid transceiver method.
  • the first positioning reference information can be obtained through a one-way transceiver method, and the second positioning reference information can be obtained through a two-way transceiver method; the current location information of the recording terminal is then determined according to the first positioning reference information and the second positioning reference information.
  • the first positioning reference information and the second positioning reference information are different.
  • the first positioning reference information is one of angle information and distance information
  • the second positioning reference information is the other one of angle information and distance information
  • the location information of the recording terminal is obtained through a hybrid transceiver method
  • the angle information can be obtained through a one-way transceiver method
  • the distance information can be obtained through a two-way transceiver method
  • the current location information of the recording terminal is determined based on the angle information and distance information.
  • the location information of the recording terminal is obtained through a hybrid transceiver method
  • the distance information can be obtained through a one-way transceiver method
  • the angle information can be obtained through a two-way transceiver method
  • the current location information of the recording terminal is determined based on the distance information and angle information.
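A minimal sketch of this final combination step (how the angle and distance themselves are measured is covered by the one-way and two-way modes; the function name is illustrative):

```python
import math

def hybrid_position(angle, distance):
    """Combine the angle obtained in one transceiver mode with the distance
    obtained in the other into 2-D rectangular coordinates (x, y) relative
    to the receiving device."""
    return distance * math.cos(angle), distance * math.sin(angle)
```

With a vertical angle from a second measurement, the same idea extends to the three-dimensional spherical coordinates (θ, γ, r) described earlier.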
  • the first positioning reference information and the second positioning reference information can be obtained through sound waves or ultrasonic waves, or through electromagnetic wave signals such as UWB (Ultra Wide Band), WiFi, or BT (Bluetooth).
  • UWB: Ultra Wide Band; WiFi: Wireless Fidelity.
  • obtaining the current location information of at least one recording terminal in a one-way transceiver mode includes: receiving a first positioning signal sent by the at least one recording terminal in a broadcast manner, and generating the current location information of the at least one recording terminal based on the first positioning signal.
  • the current location information of the recording terminal is obtained through a one-way transceiver method, by receiving the first positioning signal sent by the recording terminal in a broadcast manner, and generating the current location information of the recording terminal based on the first positioning signal.
  • the current location information of the recording terminal can be obtained through the TDOA (time difference of arrival) method.
  • the first positioning signal sent by the recording terminal in the broadcast mode can be a sound wave or an ultrasonic wave, or it can also be an electromagnetic wave signal such as UWB (Ultra Wide Band), WiFi or BT.
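As an illustration of the TDOA idea (a sketch under assumptions the patent does not spell out: a 2-D plane, known receiver positions, and acoustic propagation; `numpy` is used for the least-squares solve):

```python
import numpy as np

C = 343.0  # speed of sound in m/s, for acoustic/ultrasonic positioning signals

def tdoa_locate(anchors, tdoas):
    """Locate a broadcasting terminal from time differences of arrival (TDOA).

    anchors: (n, 2) known receiver positions, n >= 4.
    tdoas:   (n-1,) arrival-time differences of anchors 1..n-1 relative to
             anchor 0, in seconds.
    Solves the linearised hyperbolic equations
        2*(a0 - ai).p - 2*(C*dt_i)*d0 = (C*dt_i)^2 - |ai|^2 + |a0|^2
    by least squares; unknowns are the 2-D position p and the distance d0
    from the terminal to anchor 0.
    """
    a = np.asarray(anchors, dtype=float)
    d = C * np.asarray(tdoas, dtype=float)    # range differences d_i - d_0
    A = np.hstack([2.0 * (a[0] - a[1:]),      # coefficients of (x, y)
                   -2.0 * d[:, None]])        # coefficient of d0
    b = d**2 - np.sum(a[1:]**2, axis=1) + np.sum(a[0]**2)
    sol, *_ = np.linalg.lstsq(A, b, rcond=None)
    return sol[:2]                            # estimated (x, y)
```

With noise-free measurements and receivers in general position, the true source location is recovered exactly; with noisy timestamps the least-squares fit gives an estimate.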
  • obtaining the location information of at least one recording terminal in a two-way transceiver mode includes: receiving a positioning start signal sent by at least one recording terminal in a broadcast mode; sending a response signal to the at least one recording terminal; receiving a second positioning signal sent by the at least one recording terminal; and generating the current position information of the at least one recording terminal according to the second positioning signal.
  • the current location information of the recording terminal is obtained through a two-way transceiver method by receiving the positioning start signal broadcast by the recording terminal, sending a response signal to the recording terminal, receiving the second positioning signal sent by the recording terminal, and generating the current location information of the recording terminal according to the second positioning signal.
  • the location information of at least one recording terminal can be obtained through the TOF (time of flight) method.
  • the positioning start signal sent by the recording terminal in a broadcast manner can be a sound wave or an ultrasonic wave, or it can also be an electromagnetic wave signal such as UWB (Ultra Wide Band) or WiFi or BT.
  • the second positioning signal sent by the recording terminal can be a sound wave or an ultrasonic wave, or it can also be an electromagnetic wave signal such as UWB (Ultra Wide Band) or WiFi or BT.
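The distance computation in the two-way (TOF) exchange can be sketched as follows (the timestamps and fixed reply delay are assumptions for illustration; with electromagnetic signals such as UWB, the speed of light would replace the speed of sound):

```python
SPEED_OF_SOUND = 343.0  # m/s, for acoustic/ultrasonic positioning signals

def tof_distance(t_response_sent, t_second_received, terminal_reply_delay):
    """Two-way time-of-flight ranging following the flow above: the device
    sends its response at t_response_sent; the recording terminal answers
    with the second positioning signal after a known terminal_reply_delay;
    the device receives it at t_second_received. The one-way flight time is
    half the round trip minus the reply delay."""
    flight = (t_second_received - t_response_sent - terminal_reply_delay) / 2.0
    return SPEED_OF_SOUND * flight
```

Because only differences of the device's own timestamps appear, the device and terminal clocks need not be synchronized; only the terminal's reply delay must be known.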
  • each recording terminal corresponds to a sound object, and the position of the recording terminal moves along with the sound source of the sound object.
  • each recording terminal corresponds to a sound object; the sound data of a sound object is recorded through one or more corresponding recording terminals.
  • obtaining the current position information of at least one sound object includes: obtaining at least one recording of the sound data of the at least one sound object.
  • the recording terminal corresponds to a sound object, and the positions of the recording terminal and the sound source of the sound object are relatively fixed. When the sound source of the sound object moves, the recording terminal moves along with the movement of the sound source of the sound object.
  • initial position information of at least one sound object is obtained.
  • the initial position information of the sound object and the current position information of the sound object are obtained, and the sound data of the sound object is obtained, thereby obtaining the sound data and position information of the sound object in real time.
  • the sound data, initial position information and current position information of the sound object are obtained, and the sound data and position information of the sound object can be obtained in real time.
  • S23 Synchronize the sound data and current location information of at least one sound object.
  • the sound data and the current location information of the sound object are obtained, the sound data and the current location information of the sound object are synchronized.
  • the sound data and the current location information can be synchronized according to time.
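One simple way to realize this time-based synchronization, sketched under the assumption that position updates arrive less often than audio samples, is to hold the most recent position for each sampling time (the function and data layout below are illustrative, not the disclosed design):

```python
import bisect


def synchronize_positions(sample_times, position_updates):
    """For each audio sampling time, select the latest position update
    whose timestamp is <= that time (hold-last-value alignment).

    position_updates: list of (timestamp, position), sorted by time.
    Returns one position per sampling time.
    """
    update_times = [t for t, _ in position_updates]
    aligned = []
    for t in sample_times:
        i = bisect.bisect_right(update_times, t) - 1
        aligned.append(position_updates[max(i, 0)][1])
    return aligned


# Two position updates, four sampling instants:
positions = [(0.0, (0.0, 0.0, 0.0)), (0.5, (1.0, 0.0, 0.0))]
synced = synchronize_positions([0.0, 0.25, 0.5, 0.75], positions)
```

Hold-last-value keeps the position stream and the audio stream the same length, which is what the later per-sample packing steps assume.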
  • S24 Synthesize the sound data of at least one sound object and the current location information to generate object audio data.
  • the sound data and current location information of at least one sound object are obtained and the sound data and location information of the sound object are obtained in real time
  • the sound data and current location information of the sound object are synthesized to generate object audio data.
  • synthesizing the sound data of the at least one sound object and the current position information to generate the object audio data includes: obtaining audio parameters and using the audio parameters as header file information of the object audio data; and, at each sampling moment, saving the sound data of each sound object as the object audio signal and the current position information as the object audio auxiliary data, to generate the object audio data.
  • the generated object audio data can be stored in multiple storage formats, such as a first format saved as a file, a second format that can be played in real time, etc.
  • the sound data of at least one sound object is combined into one piece of audio information, which can be stored in raw-pcm format or uncompressed wav format (in this case a single sound object is regarded as one channel of the wav file), or encoded into various compression formats.
  • the current position information of at least one sound object will also be combined and saved as object audio auxiliary data (Object Audio metadata).
  • in the second format (low-delay mode), a certain length of time is taken as one frame; inside each frame, the data is saved in the same format as the file packing mode, and the sound data at that time is concatenated with the current position information to become the object audio data of the frame.
  • the target audio data of each frame is sent to the playback device or saved in chronological order.
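A minimal packing sketch of this low-delay mode might look as follows. The binary layout (field order, sizes, little-endian byte order) is an assumption made for illustration, not the format defined by the disclosure:

```python
import struct


def pack_header(sampling_rate, bit_depth, n_objects):
    # Hypothetical header layout: uint32 rate, uint16 depth, uint16 N_obj
    return struct.pack("<IHH", sampling_rate, bit_depth, n_objects)


def pack_frame(samples, positions):
    """One frame = object audio auxiliary data (positions as x, y, z
    floats, length-prefixed) concatenated with 16-bit audio samples."""
    aux = b"".join(struct.pack("<fff", *p) for p in positions)
    audio = b"".join(struct.pack("<h", s) for s in samples)
    return struct.pack("<I", len(aux)) + aux + audio


# Header once, then one frame per time slice, in chronological order:
stream = pack_header(48000, 16, 2)
stream += pack_frame([100, -100, 200, -200],
                     [(0.0, 0.0, 0.0), (1.0, 2.0, 3.0)])
```

Because each frame is self-contained, it can be sent to the playback device as soon as it is assembled, which is the source of the low delay.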
  • the audio parameters include the sampling rate, the bit width (bit depth), the number of sound objects N_obj (number of objects), etc.
  • with the audio parameters as the object audio data header file information, for each sampling moment the sound data of each sound object is saved as the object audio signal and the current position information is saved as the object audio auxiliary data, to generate the object audio data.
  • [S51] obtain the number of sound objects N_obj, together with the synchronized current position information and sound data of the at least one sound object.
  • [S53a] record the basic audio parameters, such as the sampling rate, the bit width (bit depth), and the number of sound objects N_obj (number of objects), as header file information into the target audio file.
  • the sound data of each sound object occupies a length of wBitsPerSample bits.
  • at each subsequent sampling time t, the sound data of the sound object sampled at time t is obtained and recorded, in the natural order of the sound sources, after the object audio signal obtained at time t-1; the sound data of each sound object occupies a length of wBitsPerSample bits.
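The per-sample interleaving described above can be sketched like this for the 16-bit case (wBitsPerSample = 16); the byte layout is illustrative only:

```python
import struct


def interleave_objects(object_samples):
    """object_samples: one equal-length sample list per sound object.
    At each sampling instant, the N objects' samples are written one
    after another in the natural order of the sound sources, each
    occupying wBitsPerSample = 16 bits here."""
    n_samples = len(object_samples[0])
    out = bytearray()
    for t in range(n_samples):
        for samples in object_samples:
            out += struct.pack("<h", samples[t])
    return bytes(out)


# Two objects, three sampling instants -> 3 * 2 * 2 bytes = 12 bytes
data = interleave_objects([[1, 2, 3], [10, 20, 30]])
```

This is the same channel-interleaved ordering an uncompressed wav file uses, which is why each sound object can be treated as one wav channel.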
  • each parameter is: iSampleOffset: the serial number of the sampling point;
  • Object_index: the serial number of the currently recorded audio source;
  • Object_Radius: the radius r of the currently recorded audio source.
  • at subsequent sampling points, determine whether the position of at least one sound object has changed; if so, save the sound source whose position has changed at that sampling point (see Table 2 above for the storage format).
  • a certain time interval can be specified, such as N sampling points being judged and saved at one time to save storage space.
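One hypothetical way to implement this change check (the function name and the epsilon threshold are assumptions, not part of the disclosure):

```python
def changed_object_indices(previous, current, epsilon=1e-6):
    """Return the indices of sound objects whose position moved by
    more than epsilon on any axis since the last saved sampling
    point; only these objects need new auxiliary-data entries,
    which saves storage space."""
    changed = []
    for index, (prev_pos, curr_pos) in enumerate(zip(previous, current)):
        if any(abs(a - b) > epsilon for a, b in zip(prev_pos, curr_pos)):
            changed.append(index)
    return changed


# Only the second object moved between the two checks:
moved = changed_object_indices([(0.0, 0.0, 0.0), (1.0, 1.0, 1.0)],
                               [(0.0, 0.0, 0.0), (1.0, 1.0, 2.0)])
```

Running this check only every N sampling points, as the text suggests, trades positional resolution for a smaller auxiliary-data stream.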
  • the audio parameters as the header file information of the object audio data, the sound data of the sound object as the object audio signal, and the current position information as the object audio auxiliary data are spliced together to generate complete object audio data.
  • the number of sound objects N_obj and the synchronized current position information of at least one sound object are obtained.
  • the position information of the sound objects at all sampling points contained in the current frame is recorded. At each subsequent sampling time t, the sound data of the sound object sampled at time t is obtained and recorded, in the natural order of the sound sources, after the object audio signal obtained at time t-1; the sound data of each sound object occupies a length of wBitsPerSample bits.
  • the audio parameters as the header file information of the object audio data, the sound data of the sound object as the object audio signal, and the current position information as the object audio auxiliary data are spliced together to generate complete object audio data.
  • the method further includes: saving the sound data and current location information in frame units.
  • the header file information is first recorded or transmitted.
  • the recorded sound data of the sound object and the recorded object audio auxiliary data are spliced to become the object audio information of the frame.
  • the object audio data of each frame is spliced in chronological order and saved, or directly transmitted after obtaining one frame of object audio data each time to achieve low delay transmission.
  • the combined object audio data is saved in memory or on disk as needed, transferred to the playback device, or encoded into the MPEG-H 3D Audio format, the Dolby Atmos format, or another supported Object Audio encoding format and then saved or transmitted.
  • positioning technology can be used to obtain the current location information of each sound object in real time and accurately.
  • the object audio data can be recorded and generated in real time.
  • an exemplary embodiment of the present disclosure is provided below.
  • the sound data of the sound object is obtained through a recording terminal.
  • each sound object collects sound data through a recording terminal, and multiple recording terminals obtain the sound data of multiple sound objects.
  • the sound data of the sound objects is sent to the recording module to obtain the sound data of at least one sound object.
  • the recording terminal can send a positioning signal, which is received by several receiving ends (antennas or microphones) in the positioning module.
  • Figure 4 shows the sound signal being transmitted to the recording module in a wired manner, but it can also be transmitted wirelessly (WiFi, BT, etc.); the receiving end in the positioning module receives the positioning signal sent by the recording terminal and obtains the current location information of the sound object.
  • FIG. 4 only shows the situation of obtaining the current location information of the sound object using the one-way transceiver method.
  • the current location information of the sound object can also be obtained using the two-way transceiver method, or the hybrid transceiver method.
  • the recording terminal can also send positioning start signals as needed, and can also receive response signals returned by the positioning module.
  • in addition to receiving the positioning signal and the positioning start signal sent by the recording terminal, the positioning module can also send a response signal.
  • the sound signal of the corresponding sound object is first recorded through each recording device, and a ranging signal is emitted.
  • the process of combining the sound information (sound data) and position information (current position information) of each sound object to generate a complete object audio signal (object audio data) may specifically include:
  • [S302] according to the position information of the positioning module, determine the coordinate origin of the position information, and assign an initial position to each sound object.
  • [S303] Demodulate the positioning signals received at each receiving device (antenna or microphone) of the positioning module and extract positioning features for subsequent positioning of each recording terminal through the features.
  • S305 to S306 determine, from the received positioning characteristics, whether there is a positioning signal or a positioning start signal from a certain sound object. If so, the information is obtained and different positioning solutions are used according to the positioning method: for example, the TDOA (time difference of arrival) method is used to obtain the position information of the sound object in the one-way transceiver mode, and the TOF (time of flight) method or a UWB indoor positioning solution is used in the two-way transceiver mode. If the two-way sending and receiving method is adopted, the positioning module must conduct two-way data communication with each recording terminal.
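The TDOA step can be sketched as converting arrival-time differences at the receiving ends into range differences relative to a reference receiver; a hyperbolic solver (not shown) would then intersect the resulting curves to obtain the position. The speed and timestamps below are illustrative assumptions:

```python
SPEED_OF_SOUND = 343.0  # m/s, assuming acoustic positioning signals


def tdoa_range_differences(arrival_times, reference_index=0):
    """Convert the arrival time at each receiver into a range
    difference (meters) relative to the reference receiver. A TDOA
    solver then locates the recording terminal from these
    differences without needing the emission time."""
    t_ref = arrival_times[reference_index]
    return [(t - t_ref) * SPEED_OF_SOUND for t in arrival_times]


# Signal reaches receivers 0, 1, 2 at 0 ms, 1 ms, 2 ms respectively:
diffs = tdoa_range_differences([0.0, 0.001, 0.002])
```

Note the contrast with TOF: TDOA only needs receiver-side timestamps (one-way mode), while TOF needs round-trip timing (two-way mode).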
  • the synchronization module obtains the sound information (sound data) of the sound object from the recording module and the position information (current position information) of the sound object from the positioning module, synchronizes them according to time, and sends the synchronized sound information (sound data) and location information (current location information) to the combination module.
  • the combination module obtains the synchronized position information (current position information) and sound information (sound data) of each sound object from the synchronization module, and combines them to form a complete object audio signal.
  • the sound information (sound data) of each sound object is combined into one multi-object audio information, which can be saved in raw-pcm format or uncompressed wav format (in this case a single object is seen as one channel of the wav file), or encoded into various compression formats.
  • the sound object position information of each object will also be combined together and saved as object audio auxiliary data (Object Audio metadata).
  • a certain time length is specified as one frame. Within each frame, the data is saved in the same format as the file packing mode, and the sound information and audio auxiliary data at that time are connected together to become the object audio information of the frame. The audio information of each frame is then sent to the playback device or saved in chronological order.
  • FIG. 7 is a structural diagram of a device for generating object audio data provided by an embodiment of the present disclosure.
  • the object audio data generating device 1 includes: a data acquisition unit 11 , an information acquisition unit 12 and a data generation unit 13 .
  • the data acquisition unit 11 is configured to acquire sound data of at least one sound object.
  • the information acquisition unit 12 is configured to acquire current location information of at least one sound object.
  • the data generating unit 13 is configured to synthesize the sound data of at least one sound object and the current location information to generate object audio data.
  • the information acquisition unit 12 is specifically configured to: acquire the current location information of at least one recording terminal that records the sound data of at least one sound object.
  • the object audio data generating device 1 also includes: a synchronization processing unit 14 configured to synchronize the sound data of at least one sound object and the current location information.
  • the information acquisition unit 12 is specifically configured to acquire the current location information of at least one recording terminal in a one-way transceiver mode, a two-way transceiver mode, or a hybrid transceiver mode.
  • the information acquisition unit 12 includes: a first information acquisition module 121, a second information acquisition module 122, and a first current information acquisition module 123.
  • the first information acquisition module 121 is configured to acquire the first positioning reference information in a one-way sending and receiving manner.
  • the second information acquisition module 122 is configured to acquire the second positioning reference information in a bidirectional sending and receiving manner.
  • the first current information acquisition module 123 is configured to determine the current location information of at least one recording terminal based on the first positioning reference information and the second positioning reference information.
  • the first positioning reference information is one of angle information and distance information
  • the second positioning reference information is the other one of angle information and distance information
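Under the assumption of a 2-D setup, combining the two positioning references could look like a simple polar-to-Cartesian conversion. This is a sketch, not the disclosed algorithm; the function name and the degree-based azimuth convention are assumptions:

```python
import math


def position_from_angle_and_distance(azimuth_degrees, distance):
    """Hybrid-mode sketch: the angle information obtained in one-way
    mode plus the distance information obtained in two-way mode give
    a 2-D position relative to the positioning module's origin."""
    azimuth = math.radians(azimuth_degrees)
    return (distance * math.cos(azimuth), distance * math.sin(azimuth))


# 90 degrees at 2 m -> straight along the y-axis:
x, y = position_from_angle_and_distance(90.0, 2.0)
```

A full 3-D implementation would add an elevation angle, but the principle of fusing one angular and one radial measurement is the same.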
  • the information acquisition unit 12 includes: a second current information acquisition module 124, configured to receive a first positioning signal sent by at least one recording terminal in a broadcast manner, and according to the first positioning The signal generates current location information of at least one recording terminal.
  • the information acquisition unit 12 includes: a signal receiving module 125, a signal sending module 126 and a third current information acquiring module 127.
  • the signal receiving module 125 is configured to receive a positioning start signal sent by at least one recording terminal in a broadcast manner.
  • the signal sending module 126 is configured to send a response signal to at least one recording terminal.
  • the third current information acquisition module 127 is configured to receive a second positioning signal sent by at least one recording terminal, and generate current location information of at least one recording terminal according to the second positioning signal.
  • each recording terminal corresponds to a sound object, and the position of the recording terminal moves along with the sound source of the sound object.
  • the object audio data generating device 1 further includes: an initial position acquisition unit 15 configured to acquire initial position information of at least one sound object.
  • the data generation unit 13 includes: a parameter acquisition module 131 and an audio data generation module 132.
  • the parameter acquisition module 131 is configured to acquire audio parameters and use the audio parameters as header file information of the object audio data.
  • the audio data generation module 132 is configured to, at each sampling moment, save the sound data of each sound object as the object audio signal, and save the current position information as the object audio auxiliary data, to generate the object audio data.
  • the data generation unit 13 also includes: a processing module 133.
  • the processing module 133 is configured to save the sound data and current location information in frame units.
  • the object audio data generation device provided by the embodiments of the present disclosure can perform the object audio data generation method described in some of the above embodiments; its beneficial effects are the same as those of the object audio data generation method described above and will not be repeated here.
  • FIG. 14 is a structural diagram of an electronic device 100 used for a method of generating object audio data according to an exemplary embodiment.
  • the electronic device 100 may be a mobile phone, a computer, a digital broadcast terminal, a messaging device, a game console, a tablet device, a medical device, a fitness device, a personal digital assistant, or the like.
  • the electronic device 100 may include one or more of the following components: a processing component 101 , a memory 102 , a power supply component 103 , a multimedia component 104 , an audio component 105 , an input/output (I/O) interface 106 , and a sensor. component 107, and communications component 108.
  • the processing component 101 generally controls the overall operations of the electronic device 100, such as operations associated with display, phone calls, data communications, camera operations, and recording operations.
  • the processing component 101 may include one or more processors 1011 to execute instructions to complete all or part of the steps of the above method. Additionally, processing component 101 may include one or more modules that facilitate interaction between processing component 101 and other components. For example, processing component 101 may include a multimedia module to facilitate interaction between multimedia component 104 and processing component 101 .
  • Memory 102 is configured to store various types of data to support operations at electronic device 100 . Examples of such data include instructions for any application or method operating on the electronic device 100, contact data, phonebook data, messages, pictures, videos, etc.
  • the memory 102 can be implemented by any type of volatile or non-volatile storage device or a combination thereof, such as SRAM (Static Random-Access Memory), EEPROM (Electrically Erasable Programmable Read-Only Memory), EPROM (Erasable Programmable Read-Only Memory), PROM (Programmable Read-Only Memory), ROM (Read-Only Memory), magnetic memory, flash memory, a magnetic disk, or an optical disk.
  • Power supply component 103 provides power to various components of electronic device 100 .
  • Power supply components 103 may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power to electronic device 100 .
  • Multimedia component 104 includes a touch-sensitive display screen that provides an output interface between the electronic device 100 and the user.
  • the touch display screen may include LCD (Liquid Crystal Display) and TP (Touch Panel).
  • the touch panel includes one or more touch sensors to sense touches, swipes, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide action.
  • multimedia component 104 includes a front-facing camera and/or a rear-facing camera. When the electronic device 100 is in an operating mode, such as a shooting mode or a video mode, the front camera and/or the rear camera may receive external multimedia data.
  • Each front-facing camera and rear-facing camera can be a fixed optical lens system or have a focal length and optical zoom capabilities.
  • Audio component 105 is configured to output and/or input audio signals.
  • the audio component 105 includes a MIC (Microphone), and when the electronic device 100 is in an operating mode, such as a call mode, a recording mode, and a voice recognition mode, the microphone is configured to receive an external audio signal.
  • the received audio signals may be further stored in memory 102 or sent via communications component 108 .
  • audio component 105 also includes a speaker for outputting audio signals.
  • the I/O interface 106 provides an interface between the processing component 101 and a peripheral interface module.
  • the peripheral interface module may be a keyboard, a click wheel, a button, etc. These buttons may include, but are not limited to: Home button, Volume buttons, Start button, and Lock button.
  • Sensor component 107 includes one or more sensors for providing various aspects of status assessment for electronic device 100 .
  • the sensor component 107 can detect the open/closed state of the electronic device 100 and the relative positioning of components, such as the display and keypad of the electronic device 100; the sensor component 107 can also detect a change in position of the electronic device 100 or a component of the electronic device 100, the presence or absence of user contact with the electronic device 100, the orientation or acceleration/deceleration of the electronic device 100, and a change in the temperature of the electronic device 100.
  • Sensor assembly 107 may include a proximity sensor configured to detect the presence of nearby objects without any physical contact.
  • the sensor component 107 may also include a light sensor, such as a CMOS (Complementary Metal Oxide Semiconductor) or a CCD (Charge-coupled Device) image sensor for use in imaging applications.
  • the sensor component 107 may also include an acceleration sensor, a gyroscope sensor, a magnetic sensor, a pressure sensor, or a temperature sensor.
  • Communication component 108 is configured to facilitate wired or wireless communication between electronic device 100 and other devices.
  • the electronic device 100 may access a wireless network based on a communication standard, such as WiFi, 2G or 3G, or a combination thereof.
  • the communication component 108 receives broadcast signals or broadcast related information from an external broadcast management system via a broadcast channel.
  • the communication component 108 also includes an NFC (Near Field Communication) module to facilitate short-range communication.
  • the NFC module can be implemented based on RFID (Radio Frequency Identification) technology, IrDA (Infrared Data Association) technology, UWB (Ultra Wide Band) technology, BT (Bluetooth) technology, and other technologies.
  • the electronic device 100 may be implemented by one or more ASICs (Application Specific Integrated Circuits), DSPs (Digital Signal Processors), DSPDs (digital signal processing devices), PLDs (Programmable Logic Devices), FPGAs (Field Programmable Gate Arrays), controllers, microcontrollers, microprocessors, or other electronic components, for performing the object audio data generation method described above.
  • the electronic device 100 provided by the embodiments of the present disclosure can perform the object audio data generation method as described in some of the above embodiments, and its beneficial effects are the same as those of the object audio data generation method described above, which will not be described again here.
  • the present disclosure also proposes a storage medium.
  • the electronic device when the instructions in the storage medium are executed by the processor of the electronic device, the electronic device can execute the method of generating the object audio data as described above.
  • the storage medium can be a ROM (Read-Only Memory), a RAM (Random Access Memory), a CD-ROM (Compact Disc Read-Only Memory), a magnetic tape, a floppy disk, an optical data storage device, etc.
  • the present disclosure also provides a computer program product.
  • when the computer program is executed by a processor of the electronic device, the electronic device can perform the object audio data generation method described above.

Landscapes

  • Engineering & Computer Science (AREA)
  • Multimedia (AREA)
  • Computational Linguistics (AREA)
  • Quality & Reliability (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Human Computer Interaction (AREA)
  • Physics & Mathematics (AREA)
  • Acoustics & Sound (AREA)
  • Stereophonic System (AREA)

Abstract

Embodiments of the present disclosure disclose a method and apparatus for generating object audio data, an electronic device, and a storage medium. The method comprises the following steps: obtaining sound data of at least one sound object; obtaining current location information of the at least one sound object; and synthesizing the sound data and current location information of the at least one sound object to generate object audio data. The location information of each sound object can thus be obtained accurately and in real time, so that object audio data can be recorded and generated in real time.
PCT/CN2022/091051 2022-05-05 2022-05-05 Method and apparatus for generating object audio data, electronic device, and storage medium WO2023212879A1 (fr)

Priority Applications (2)

Application Number Priority Date Filing Date Title
PCT/CN2022/091051 WO2023212879A1 (fr) Method and apparatus for generating object audio data, electronic device, and storage medium
CN202280001279.8A CN117355894A (zh) Method and apparatus for generating object audio data, electronic device, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
PCT/CN2022/091051 WO2023212879A1 (fr) Method and apparatus for generating object audio data, electronic device, and storage medium

Publications (1)

Publication Number Publication Date
WO2023212879A1 true WO2023212879A1 (fr) 2023-11-09

Family

ID=88646110

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2022/091051 WO2023212879A1 (fr) 2022-05-05 2022-05-05 Procédé et appareil de génération de données audio d'objet, dispositif électronique, et support de stockage

Country Status (2)

Country Link
CN (1) CN117355894A (fr)
WO (1) WO2023212879A1 (fr)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102968991A (zh) * 2012-11-29 2013-03-13 华为技术有限公司 一种语音会议纪要的分类方法、设备和系统
US20130101122A1 (en) * 2008-12-02 2013-04-25 Electronics And Telecommunications Research Institute Apparatus for generating and playing object based audio contents
CN105070304A (zh) * 2015-08-11 2015-11-18 小米科技有限责任公司 实现对象音频录音的方法及装置、电子设备
CN107968974A (zh) * 2017-12-07 2018-04-27 北京小米移动软件有限公司 麦克风控制方法、系统、麦克风及存储介质
CN110320498A (zh) * 2018-03-29 2019-10-11 Cae有限公司 用于确定麦克风的位置的方法和系统
CN111312295A (zh) * 2018-12-12 2020-06-19 深圳市冠旭电子股份有限公司 一种全息声音的记录方法、装置及录音设备
CN114333853A (zh) * 2020-09-25 2022-04-12 华为技术有限公司 一种音频数据的处理方法、设备和系统

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20130101122A1 (en) * 2008-12-02 2013-04-25 Electronics And Telecommunications Research Institute Apparatus for generating and playing object based audio contents
CN102968991A (zh) * 2012-11-29 2013-03-13 华为技术有限公司 一种语音会议纪要的分类方法、设备和系统
CN105070304A (zh) * 2015-08-11 2015-11-18 小米科技有限责任公司 实现对象音频录音的方法及装置、电子设备
CN107968974A (zh) * 2017-12-07 2018-04-27 北京小米移动软件有限公司 麦克风控制方法、系统、麦克风及存储介质
CN110320498A (zh) * 2018-03-29 2019-10-11 Cae有限公司 用于确定麦克风的位置的方法和系统
CN111312295A (zh) * 2018-12-12 2020-06-19 深圳市冠旭电子股份有限公司 一种全息声音的记录方法、装置及录音设备
CN114333853A (zh) * 2020-09-25 2022-04-12 华为技术有限公司 一种音频数据的处理方法、设备和系统

Also Published As

Publication number Publication date
CN117355894A (zh) 2024-01-05

Similar Documents

Publication Publication Date Title
KR101770295B1 (ko) Object audio recording method and apparatus, electronic device, program, and recording medium
JP6626440B2 (ja) Method and apparatus for playing multimedia files
US8606183B2 Method and apparatus for remote controlling bluetooth device
US10051368B2 Mobile apparatus and control method thereof
CN113890932A (zh) Audio control method and system, and electronic device
KR102538775B1 (ko) Audio playback method and audio playback apparatus, electronic device, and storage medium
WO2020108178A1 (fr) Recording sound effect processing method and mobile terminal
CN109121047B (zh) Stereo implementation method for a dual-screen terminal, terminal, and computer-readable storage medium
WO2022048599A1 (fr) Sound player position adjustment method, and audio rendering method and apparatus
WO2022068613A1 (fr) Audio processing method and electronic device
CN105451056A (zh) Audio and video synchronization method and apparatus
WO2023151526A1 (fr) Audio acquisition method and apparatus, electronic device, and peripheral component
CN113921002A (zh) Device control method and related apparatus
CN115167802A (zh) Audio switching playback method and electronic device
WO2023216119A1 (fr) Audio signal encoding method and apparatus, electronic device, and storage medium
CN104599691A (zh) Audio playback method and apparatus
WO2023212879A1 (fr) Method and apparatus for generating object audio data, electronic device, and storage medium
CN114598984B (zh) Stereo synthesis method and system
US20240144948A1 Sound signal processing method and electronic device
CN114079691B (zh) Device identification method and related apparatus
CN108111920A (zh) Video information processing method and apparatus
CN115550559A (zh) Video picture display method, apparatus, device, and storage medium
CN109712629B (zh) Audio file synthesis method and apparatus
WO2024027315A1 (fr) Audio processing method and apparatus, electronic device, storage medium, and program product
EP4167580A1 (fr) Audio control method, system, and electronic device

Legal Events

Date Code Title Description
WWE Wipo information: entry into national phase

Ref document number: 202280001279.8

Country of ref document: CN

121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 22940575

Country of ref document: EP

Kind code of ref document: A1